humancode.us

All about libraries

January 3, 2024

After writing the post about mergeable libraries, I thought I should write a post about how libraries work in general, and capture some recommendations on how to use them.

Let’s start at the beginning, with what happens when a source file is compiled.

A compiler creates object files from source files

A compiler takes a translation unit (a source file with any imported headers), and produces a compiled object file, containing the code and data defined in the translation unit.

Exported symbols

An object file may export symbols. Exported symbols are strings that identify data structures within the object file, which may be function implementations, constants, initialized data, whatever.

// sound.c

static int tweetCount = 0;

extern void tweet(void) {
    tweetCount++;
}

The file above compiles into an object file sound.o, which exports the symbol _tweet for the implementation of the tweet() function[1][2].

Some symbols are purely internal to the translation unit. tweetCount, for example, is not exported.

In addition to open/public (exported) and private/fileprivate (not exported), Swift symbols have an access control option, package, which makes the symbol visible only within a Swift package. As far as libraries are concerned, package symbols are exported.

Unresolved symbols

An object file may also have unresolved symbols. These are symbols that the object file needs, but that must be provided by something else.

// motion.c

extern void step(void);

void walk(void) {
    step();
}

The file above compiles into an object file motion.o which exports the symbol _walk for the walk() function, and has an unresolved symbol _step which must be provided by some other object file.

Static libraries collect object files

A static library is nothing more than an ordered collection of object files. We can archive sound.o and motion.o together into a static library libactions.a using an archiver.

% nm -g libactions.a 

sound.o:
0000000000000000 T _tweet

motion.o:
                 U _step
0000000000000000 T _walk

The contents of that static library are simply the two object files. Their exported symbols and unresolved symbols remain the same.

Linking resolves unresolved symbols

Linking is the build step that turns object files and libraries into an executable (app). During linking, all unresolved symbols must be resolved.

Let’s try to compile and link this file into an executable:

// main.c

extern void tweet(void);

int main(int argc, char **argv) {
    tweet();
    return 0;
}

Compiling this file yields an object file main.o which exports the symbol _main (used as the designated start point of a C program), and has one unresolved symbol: _tweet. Linking this object file into an executable will fail, because the symbol _tweet remains unresolved.

Static libraries provide needed symbols

We can use the prebuilt libactions.a library to provide the implementation for _tweet that the executable needs. To do this, we change our linker invocation to link the library (by adding a -lactions option).

The static linker copies object files into the executable

To find the missing _tweet symbol, the linker goes through each library on its options list, and examines each object file in order. When it finds an object file that exports the unresolved symbol, it copies the contents of the object file into the executable, and resolves the symbol reference by patching the code in main() to call tweet() directly.

Note that the entire contents of the object file is copied, including other code and data found in that object file, even if they are unrelated to the symbol that was needed.

Note that the second object file, motion.o, was never copied into the executable, so the linker doesn’t care that the unresolved symbol _step is needed by that file. The executable links successfully without needing to resolve that symbol.
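To make the search concrete, here is a toy model of it in C. This is only a sketch of the behavior described above, not how Apple’s linker is implemented: the ObjectFile struct, MAX_SYMS, link_all() and link_example() are all names invented for this illustration, and the object files are described by hand rather than read from a real archive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

// A toy object file: what it exports, what it still needs, and whether the
// linker has copied it into the executable. (Hypothetical model type.)
typedef struct {
    const char *name;
    const char *exports[MAX_SYMS];    // NULL-terminated list
    const char *undefined[MAX_SYMS];  // NULL-terminated list
    bool loaded;
} ObjectFile;

// True if the NULL-terminated list contains the symbol.
static bool contains(const char *const *list, const char *symbol) {
    for (size_t i = 0; i < MAX_SYMS && list[i] != NULL; i++) {
        if (strcmp(list[i], symbol) == 0) return true;
    }
    return false;
}

// True if some already-loaded object file exports the symbol.
static bool is_defined(const ObjectFile *objs, size_t n, const char *symbol) {
    for (size_t i = 0; i < n; i++) {
        if (objs[i].loaded && contains(objs[i].exports, symbol)) return true;
    }
    return false;
}

// For every unresolved symbol in a loaded object file, load the first
// not-yet-loaded object file (in order) that exports it. Repeat until nothing
// new is loaded, then report whether every needed symbol was resolved.
bool link_all(ObjectFile *objs, size_t n) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (!objs[i].loaded) continue;
            for (size_t u = 0; u < MAX_SYMS && objs[i].undefined[u] != NULL; u++) {
                const char *symbol = objs[i].undefined[u];
                if (is_defined(objs, n, symbol)) continue;
                for (size_t j = 0; j < n; j++) {
                    if (!objs[j].loaded && contains(objs[j].exports, symbol)) {
                        objs[j].loaded = true;  // the whole object file comes along
                        progress = true;
                        break;                  // stop at the first match
                    }
                }
            }
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (!objs[i].loaded) continue;
        for (size_t u = 0; u < MAX_SYMS && objs[i].undefined[u] != NULL; u++) {
            if (!is_defined(objs, n, objs[i].undefined[u])) return false;
        }
    }
    return true;
}

// The example from this post: main.o is linked directly, and libactions.a
// contributes sound.o and motion.o, in that order.
bool link_example(ObjectFile objs[3]) {
    objs[0] = (ObjectFile){ "main.o",   {"_main"},  {"_tweet"}, true  };
    objs[1] = (ObjectFile){ "sound.o",  {"_tweet"}, {NULL},     false };
    objs[2] = (ObjectFile){ "motion.o", {"_walk"},  {"_step"},  false };
    return link_all(objs, 3);
}
```

In this model, link_example() succeeds with sound.o marked as loaded and motion.o left alone, so the unresolved _step in motion.o never comes into play.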

The static linker stops looking once it finds what it needs

A symbol may be exported by multiple object files in a library, or by multiple object files in multiple libraries, but the static linker stops looking when it finds the first instance of the symbol. The order in which static libraries appear on the linker options list matters.

You can force the static linker to copy all object files from every dependent static library by passing the -all_load option to the linker. This is usually not a good idea.
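The first-match rule is easy to picture as a lookup over an ordered list. The sketch below is purely illustrative: the Export struct and first_provider() are invented here, and libbirds.a and libsocial.a are hypothetical libraries that both happen to export _tweet.

```c
#include <stddef.h>
#include <string.h>

// One exported symbol and the library it lives in. The array order stands in
// for the order of libraries on the linker's options list.
typedef struct {
    const char *library;
    const char *symbol;
} Export;

// Returns the first library, in list order, that exports the symbol,
// or NULL if none does. Later libraries are never consulted.
const char *first_provider(const Export *exports, size_t count, const char *symbol) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(exports[i].symbol, symbol) == 0) return exports[i].library;
    }
    return NULL;
}
```

Given a list where libbirds.a comes before libsocial.a, the lookup for _tweet picks libbirds.a; swap the two entries and it picks libsocial.a instead, with no warning that a second definition exists.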

Objective-C requires special handling

Due to the dynamic nature of Objective-C, you must copy all Objective-C symbols to the executable even if they are not used to resolve an unresolved symbol. Otherwise, surprising behavior may occur. In particular, Objective-C categories may be unexpectedly dropped from your executable.

You can ensure that Objective-C symbols are always copied by using the -ObjC linker option.

A static library may use another static library

Since a static library is just a collection of object files, it can use symbols exported by other static libraries. Any library (including a dynamic library, described below) can resolve an unresolved symbol needed by a static library when an executable is built.

When building a static library, it is possible to tell the archiver to link another static library. This causes the archiver to copy all of the object files found in the other library into the one being built. Then, the executable needs only link one library to get object files from both.

Recap on static libraries

Let’s review what we learned about static libraries:

  1. A static library is a collection of object files.
  2. The static linker resolves unresolved symbols when building an executable.
  3. The static linker searches for unresolved symbols within the object files archived in static libraries.
  4. The static linker copies whole object files from static libraries into the executable.
  5. The static linker stops looking once it finds the symbol it needs.
  6. The order in which static libraries appear on the command line matters when the same symbol occurs more than once.

A dynamic library allows symbols to be resolved when the executable runs

Unlike static libraries, which resolve symbols when an executable is linked, dynamic libraries allow unresolved symbols to be resolved when an executable runs. Instead of copying object files into the executable, the linker notes in the executable the path[3] to the dynamic library where each unresolved symbol should be found. Each time the executable runs, a dynamic loader executing in its process looks up the noted symbols.

A dynamic library has to be present both during build time (for the linker to note where symbols should be found) and at run time (for the loader to actually resolve the symbols by loading the libraries).
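The loader normally does all of this for you before main() runs, but the same kind of by-name lookup can be done by hand with the POSIX dlopen and dlsym calls. Here is a minimal sketch, assuming a POSIX system; parse_with_runtime_lookup() is a name made up for this example, and atoi is used only because it is sure to be present in the C library. On some older Linux systems you may also need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

// Finds the C library's atoi by name at run time and calls it.
// Returns -1 if the handle or the symbol cannot be obtained.
int parse_with_runtime_lookup(const char *digits) {
    // Passing NULL asks for a handle that covers the running program and the
    // libraries already loaded into it.
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) return -1;

    // dlsym takes the C-level name, without the leading underscore.
    int (*parse)(const char *) = (int (*)(const char *))dlsym(handle, "atoi");
    int result = (parse != NULL) ? parse(digits) : -1;

    dlclose(handle);
    return result;
}
```

Calling parse_with_runtime_lookup("42") goes through the same steps the loader performs for each noted symbol: find the library, find the symbol in it, and only then call through the resulting address.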

Because dynamic libraries must be loaded and examined when an app is launched, an executable that uses dynamic libraries will experience increased startup times. Mergeable libraries provide a way to enjoy the benefits of dynamic libraries while reclaiming much of the performance of static libraries.

Dynamic linking is especially useful for operating system libraries. An OS can update its dynamic libraries without requiring installed apps to be rebuilt. After an OS update, apps will automatically load the new system libraries the next time they run.

A dynamic library may be shared by many executables

A dynamic library contains code that can be efficiently shared by many executables that load it. The operating system may map a single copy of the library into the memory of multiple processes.

Sharing code this way allows a developer to reduce the total size of an app bundle. If an iOS app and an included extension link the same dynamic library, then embedding the dynamic library in the app bundle (as a framework) allows a single copy of the library to be used by both executables.

On Apple platforms, a framework is a way to bundle a library (static or dynamic) along with the resources (e.g. localized strings, images, colors) that the framework needs. A framework is a kind of bundle, which is just a standard directory structure for packaging these kinds of things and their metadata. An iOS or Mac app is another kind of bundle, for instance. Xcode has good support for dealing with frameworks when building and distributing apps.

On Apple platforms, an embedded dynamic framework inside an app bundle allows convenient resource lookup using the Bundle API. Resources can also be bundled with a dynamic library for binary distribution using dynamic xcframeworks.

A dynamic library is loaded in its entirety

Unlike a static library that allows individual object files to be copied into an executable, a dynamic library must be loaded into a running process in its entirety. All symbols exported by the dynamic library are made available in the executable whether or not they are needed.

That means that symbol collisions may occur when a symbol appears more than once[4]. This is especially common with Objective-C implementations: a class implementation that appears in two dynamic libraries, for instance, will result in an arbitrary one being used.

A dynamic library may use other dynamic libraries

A dynamic library is free to use symbols exported by other dynamic libraries. These uses will be recorded as unresolved symbols that will be resolved at load time. In fact, in a dynamic library, all of its unresolved symbols must be resolved at load time by loading other dynamic libraries.

A dynamic library must link another dynamic library to use it. Linking a dynamic library doesn’t copy any code, unlike linking a static library.

It is possible for a set of dynamic libraries to form a circular link dependency. While this is not a problem at run time, it does cause difficulty at build time: which library do you build first? There are ways to work around this issue, but it’s much better to break such cycles.

A dynamic library can link static libraries. The way a dynamic library uses a static library is similar to the way an executable does: object files are copied from the static library to the dynamic library to resolve unresolved symbols. Symbols exported by the copied object files are reexported by the dynamic library.

Because symbols satisfied by static libraries are resolved when a dynamic library is built, two dynamic libraries linking a common static library and loaded by an executable will load two copies of the common static library into the process. Since these two copies don’t know about each other, this may result in duplication of singleton global variables. This may lead to bugs that are hard to diagnose.
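The duplication is easier to see when it is written out by hand. The sketch below imitates, inside a single C file, what happens when the same static library code is copied into two dynamic libraries. Everything in it is hypothetical: the DEFINE_COUNTER_COPY macro stands in for the linker’s copying, and the libA and libB prefixes stand in for the two dynamic libraries.

```c
// Stand-in for code from the common static library: a "singleton" counter.
// Each use of the macro stamps out a separate private copy, much as linking
// the static library into each dynamic library would.
#define DEFINE_COUNTER_COPY(prefix)                            \
    static int prefix##_count = 0;                             \
    int prefix##_increment(void) { return ++prefix##_count; }

DEFINE_COUNTER_COPY(libA)  // the copy that ends up inside the first dynamic library
DEFINE_COUNTER_COPY(libB)  // the copy that ends up inside the second dynamic library
```

Code that goes through libA_increment() and code that goes through libB_increment() each see their own count. Neither copy is wrong on its own, which is part of what makes the resulting bugs hard to track down.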

Recap on dynamic libraries

Let’s review what we learned about dynamic libraries:

  1. An executable may load a dynamic library to resolve symbols at load time.
  2. A dynamic library will be loaded in its entirety.
  3. A dynamic library may link other dynamic libraries to resolve its own symbols.
  4. A dynamic library must be present both at build time (in the SDK) and at run time.

Recommendations

That’s a short summary of libraries and what they do. Here are some recommendations I would make, especially when building iOS apps:

Keep the library dependency graph acyclic. Although it’s possible to resolve cyclic dependencies, it’s best to avoid them altogether. Cycles can usually be broken by creating another library that two or more libraries depend on.

Avoid symbol duplication. Each symbol should appear only once in the link dependency graph.

The executable should explicitly link all libraries. This is normal practice; don’t try to hide second- or third-level dependencies from the executable[5].

Always use the -ObjC option. This ensures that Objective-C behavior is respected, and does nothing if Objective-C code or data is absent.

Consider explicitly controlling exported symbols. Reducing the number of exported symbols makes your libraries faster and safer to use. Explore compiler options such as -fvisibility=hidden or linker options like -exported_symbols_list to do this.
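As a small illustration of the compiler-side approach, here is a sketch using the visibility attribute supported by Clang and GCC. The file name and the ACTIONS_EXPORT and ACTIONS_HIDDEN macros are invented for this example, as are recordTweet() and tweetTotal(); only tweet() and tweetCount come from the earlier sound.c.

```c
// actions.c (hypothetical file name)

// Convenience macros wrapping the Clang/GCC visibility attribute.
#define ACTIONS_EXPORT __attribute__((visibility("default")))
#define ACTIONS_HIDDEN __attribute__((visibility("hidden")))

// Internal linkage: visible only inside this translation unit.
static int tweetCount = 0;

// Callable from other object files linked into the same library, but not
// exported from the final dynamic library.
ACTIONS_HIDDEN void recordTweet(void) {
    tweetCount++;
}

// Part of the library's public interface.
ACTIONS_EXPORT void tweet(void) {
    recordTweet();
}

// Also public: reports how many times tweet() has been called.
ACTIONS_EXPORT int tweetTotal(void) {
    return tweetCount;
}
```

If you compile with -fvisibility=hidden, the ACTIONS_HIDDEN marking becomes the default and only the functions marked ACTIONS_EXPORT need an annotation.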

Use static libraries for best run-time performance. Nothing beats the performance of static libraries.

Use dynamic libraries for best build experience. Dynamic libraries yield faster build times as there is no need to copy library contents each time an executable is built.

Use dynamic libraries to share code. On Apple platforms, using embedded dynamic libraries (as frameworks) to share code between executables in a single app bundle reduces the download size of the app.

Use dynamic frameworks to bundle resources with code. On Apple platforms, dynamic frameworks are a natural fit for bundling resources with your code.

Use mergeable libraries to improve dynamic library performance. Using mergeable libraries allows the use of dynamic library semantics while enjoying most of the performance benefits of static libraries.

Static libraries should not link other static libraries. While static libraries may use other static libraries, they should not link them, in order to avoid symbol duplication[6].

Dynamic libraries should not link static libraries. Due to the risk of loss of global uniqueness and symbol collisions at load-time (especially Objective-C symbols), dynamic libraries should only link other dynamic libraries. An exception can be made if you are absolutely sure that any linked static libraries will only appear once in the link dependency graph.

  1. By convention, C functions are exported with a leading underscore. 

  2. By default, C symbols are exported unless marked static. You can flip this behavior by setting GCC_SYMBOLS_PRIVATE_EXTERN = YES, and using visibility attributes to export symbols. 

  3. There are ways to capture relative paths when embedding dynamic frameworks within an app bundle. For system libraries, absolute paths are always used. 

  4. Technically, this is not true. Apple’s dynamic linkers use two-level namespaces to distinguish between identical symbols that come from more than one library. In practice, this technique is difficult to use outside manual dlopen and dlsym, as C doesn’t natively support this distinction. 

  5. Swift Package Manager automates this to some degree: Static libraries created by Swift packages include a magic section that automatically adds more link options to an executable that links it, automatically linking dependent libraries. 

  6. Static frameworks seem to avoid this problem as the build system appears to refuse to copy object files from dependent static frameworks, even when they are linked. Nonetheless, not linking at all remains the better option.