Unless you’ve written C in the past, your closest encounter with the preprocessor and its code inclusion is probably the #import statement. But before #import, there was #include.
The preprocessor command #include is a clever little trick. It basically tells the preprocessor to treat the contents of the included file as if the entire file actually appeared at the point of the #include. That explanation may seem a little confusing, so let’s just look at an example. Let’s say we have a file, IncludeMe.h:
    // IncludeMe.h
    #define kMyConstantNumber 42
    #define kMyConstantBoolean true
Now, we write a little C program that uses the constants defined in the IncludeMe header file:
    // MyProgram.c
    #include <stdio.h>
    #include <stdbool.h>   // defines true
    #include "IncludeMe.h"

    int main(void) {
        printf("The constant is %d and the boolean is %d\n", kMyConstantNumber, kMyConstantBoolean);
    }
How does MyProgram.c know what the values of kMyConstantNumber and kMyConstantBoolean are in order to print them out from printf()? Well, what’s really happening is the preprocessor is going in and injecting the contents of IncludeMe.h into MyProgram.c. So, what the compiler actually sees when it’s compiling your program is the following:
    // MyProgram.c
    // (contents of the system headers omitted here)
    #define kMyConstantNumber 42
    #define kMyConstantBoolean true

    int main(void) {
        printf("The constant is %d and the boolean is %d\n", kMyConstantNumber, kMyConstantBoolean);
    }
Sure, this is a trivial example, but it should make it pretty obvious exactly what the preprocessor is doing. Actually, the above example isn’t entirely true. To see what the preprocessed file really looks like, fire up Xcode and create a new C/C++ file. In Terminal, use the cd command to navigate to the directory containing that file. Then type gcc -E filename.c and observe all of the code that gets spit out to the terminal window.
By default, all new C files have stdio.h included. This is the header file for the basic IO functions available to C programs (such as the printf() function you saw above). The preprocessor sees that your C file includes stdio.h, so it pulls in all of the code from stdio.h and makes it available to your C program. If you scroll aaaall the way to the bottom of the output, you’ll see the code you actually wrote. #include placed all of the code from stdio.h in your file as if you had copy/pasted it there yourself.
As a side note, notice that we wrapped our included header file in double quotes (" "). This tells the preprocessor to look for IncludeMe.h in the same directory as MyProgram.c first. However, stdio.h is wrapped in angle brackets (< >). This tells the preprocessor to look for the header file in the directories that hold the system headers.
So, this preprocessor #include directive is great. It lets you modularize your code, include system headers, and reuse code. But what happens when you have a pair of files that look like this:
    // FirstFile.h
    #include "SecondFile.h"
    /* Some code */

    // SecondFile.h
    #include "FirstFile.h"
    /* Some other code */
Well, the preprocessor first goes out and sees that FirstFile.h wants to include SecondFile.h inside of it. But when it goes to do that, it sees that SecondFile.h also tries to include FirstFile.h, which includes SecondFile.h, which includes FirstFile.h, which includes... OK, you get the picture. This is called a recursive include.
#include vs. #import
The recursive include is the problem that Objective-C tried to solve with the introduction of the #import directive. With #import, a file is guarded against recursive includes: the preprocessor first checks whether the file has already been included. If it has not, the file is included; otherwise it is skipped. Traditional C headers achieve the same thing with header guards:
    #ifndef MyFile_h
    #define MyFile_h
    // Some code
    #endif
The two essentially do the same thing. However, Objective-C classes and frameworks should use #import and not #include.
Back in November 2012, Doug Gregor of Apple gave a presentation at the LLVM Developers Meeting requesting that modules, a solution to the problems inherent to preprocessor #imports and #includes, be introduced. Modules, Gregor argued, solve two problems the current preprocessor implementation faces: fragility and scalability.
With regard to fragility, a simple example exposes how much the ordering of preprocessor #includes and #imports matters. Let’s say we have the following Objective-C file:
    // MyFile.h
    #define strong @"this won't work"
    #import <UIKit/UIKit.h>

    @interface MyFile : NSObject
    @property (nonatomic, strong) NSArray *anArray;
    @end
What happens after the preprocessor is done doing its work? Your header file now looks like this:
    // MyFile.h
    #define strong @"this won't work"
    // UIKit imports

    @interface MyFile : NSObject
    @property (nonatomic, @"this won't work") NSArray *anArray;
    @end
Notice that we’ve overridden the definition of the strong keyword with something the compiler doesn’t know how to handle.
The other issue, scalability, should be apparent from the above description of #include. #include and #import are both textual inclusions: a glorified copy/paste transaction. The contents of the included file are simply pasted inline where the #include or #import statement was placed. Furthermore, any files included in that file also have their contents pasted into the original file, and so on until the entire #include/#import tree is traversed. This results in compile time that is multiplicative in the number of source files and headers.
Now, you would think that for as long as C and Objective-C have been around, someone somewhere would have tried to tackle this problem. And you’d be right. Pre-compiled headers (.pch) have been in use for years to combat the scalability issue. You add your #include/#import statements to the .pch file, those headers are compiled into a single on-disk representation, and that representation is included in every source file in your project. However, even .pch files come with their own set of problems:
- Most developers don’t maintain their .pch files, which soon become unruly and unmanageable
- It’s difficult for developers new to a project to understand how files are related if everything is in the .pch file
- Sometimes you just don’t want a file included everywhere
Modules break away from this textual inclusion model and instead act as an encapsulation of what a framework is, as opposed to just shoving the headers into your source files. Think of it as making an API of the framework available to your source file. With modules, a framework is compiled once into a serialized representation that can be imported efficiently whenever the library is used. Additionally, a module import ignores preprocessor state within the source file, meaning that you can’t override the definition of a keyword in a module just because you #define something with the same name prior to the import.
But Apple, in typical fashion, has taken modules even further. Think about it: if you import the MapKit framework, why should you also have to tell Xcode to link against the MapKit framework? With modules, you get support for autolinking frameworks by default!
And how do we get all of this header-caching-auto-linking goodness? Say hello to @import. Simply using the @import declaration will kick off the new module parsing and caching. And you can still do selective imports through the use of the dot syntax. So, for instance, to import only the MKMapView classes and none of the other classes in the MapKit module, you simply say:
    @import MapKit.MKMapView;
Well that’s great, but now I have to go through each of my #import statements and replace them with @import? No! Apple is taking care of this for you, as long as you opt in. To opt in, make sure you turn on the Enable Modules (C and Objective-C) build setting. Additionally, you can turn on/off the auto-linking of frameworks here, as well.
And that’s it! How does this work, you ask? Module Maps. Module Maps are a way for modules to, well, map back to their header counterparts. When the compiler reaches an import declaration, a separate compiler instance is spawned and the headers named in the Module Map are parsed. A module file is written, and then that module file is loaded at the import declaration. As before, that module file is cached for later reuse (wherever you may import the same module).
Defining a Module
Defining a module is a relatively easy process. From Apple’s own presentation to the November LLVM Meeting, here is an example of how to define the C stdio library as a module:
The export keyword specifies the module name. The dot (.) indicates a submodule, so, in this case, stdio is a submodule of the std module. The public: keyword denotes access to the API, in other words, which variables, methods, etc. will be publicly available. Anything defined outside of this is private to the module and remains that way. And that’s about it!
NOTE: At this time, modules are only available for Apple’s frameworks and have been implicitly disabled for C++.
All of this @import stuff is really cool, in theory. But how does it stack up in practice? Doug Gregor includes an example in his presentation of the difference in the number of lines of compiled code using the traditional preprocessor include versus the new @import directive. For the ubiquitous “Hello World” C program, the original source code file is a mere 64 lines of code. After the preprocessor has done its include of the stdio.h header, that number jumps to 11,072 lines. That’s 173 times the number of lines in the actual source file. That’s huge!
Now let’s say each of your source files includes the stdio.h header (as almost all C programs do). You’re talking about adding over 11,000 lines of code to each file you add. From a mathematical perspective, that leads to an M x N compile time, if you imagine M source files and N headers. Using modules, the stdio module is parsed only once and then cached, dramatically reducing the number of lines in your processed source code files.
Now, for smaller projects, the difference in compile time is negligible, and you probably won’t notice much of a benefit (aside from the conveniences of things like the auto-linking of frameworks). For larger projects, however, you’re looking at potential compile time improvements of a couple of percentage points or more! Either way, this is a really interesting and welcome addition to the LLVM compiler, and as the community continues to expand on what a module is and does, the benefits of using modules will continue to grow.