The UAPI header file split

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

By Michael Kerrisk
July 25, 2012

Patches that add new software features often gain the biggest headlines in free software projects. However, once a project reaches a certain size, refactoring work that improves the overall maintainability of the code is arguably at least as important. While such work does not improve the lives of users, it certainly improves the lives of developers, by easing later work that does add new features.

With around 15 million lines of code (including 17,161 .c files and 14,222 .h files) in the recent 3.5 release, the Linux kernel falls firmly into the category of projects large enough that periodic refactoring is a necessary and important task. Sometimes, however, the sheer size of the code base means that refactoring becomes a formidable task—one that verges on being impossible if attempted manually. At that point, an enterprising kernel hacker may well turn to writing code that refactors the kernel code. David Howell's UAPI patch series, which has been proposed for inclusion during the last few kernel merge windows, was created using such an approach.

The UAPI patchset was motivated by David's observation that when modifying the kernel code:

I occasionally run into a problem where I can't write an inline function in a header file because I need to access something from another header that includes this one. Due to this, I end up writing it as a #define instead.

He went on to elaborate that this problem of "inclusion recursion" in header files typically occurs with inline functions:

Quite often it's a case of an inline function in header A wanting a struct [or constant or whatever] from header B, but header B already has an inline function that wants a struct from header A.

As is the way of such things, a small itch can lead one to thinking about more general problems, and how to solve them, and David has devised a grand nine-step plan of changes to achieve his goals, of which the current patch set is just the first step. However, this step is, in terms of code churn, a big one.

What David wants to do is to split out the user-space API content of the kernel header files in the include and arch/xxxxxx/include directories, placing that content into corresponding headers created in new uapi/ subdirectories that reside under each of the original directories. As well as being a step toward solving his original problem and performing a number of other useful code cleanups, David notes that disintegrating the header files has many other benefits. It simplifies and reduces the size of the kernel-only headers. More importantly, splitting out the user-space APIs into separate headers has the desirable consequence that it "simplifies the complex interdependencies between headers that are [currently] partly exported to userspace".

There is one other benefit of the UAPI split that may be of particular interest to the wider Linux ecosystem. By placing all of the user-space API-related definitions into files dedicated solely to that task, it becomes easier to track changes to the APIs that the kernel presents to user space. In the first instance, these changes can be discovered by scanning the git logs for changes in files under the uapi/ subdirectories. Easing the task of tracking user-space APIs would help many other parts of the ecosystem, for example, C library maintainers, scripting language projects that maintain language bindings for the user-space API, testing projects such as LTP, documentation projects such as man-pages, and perhaps even LWN editors preparing summaries of changes in the merge window that occurs at the start of each kernel release cycle.

The task of disintegrating each of the header files into two pieces is in principle straightforward. In the general case, each header file has the following form:

    /* Header comments (copyright, etc.) */

    #ifndef _XXXXXX_H     /* Guard macro preventing double inclusion */
    #define _XXXXXX_H

    [User-space definitions]

    #ifdef __KERNEL__

    [Kernel-space definitions]

    #endif /* __KERNEL__ */

    [User-space definitions]
  
    #endif /* End prevent double inclusion */

Each of the above parts may or may not be present in individual header files, and there may be multiple blocks governed by #ifdef __KERNEL__ preprocessor directives.

The part of this file that is of most interest is the code that falls inside the outermost #ifndef block that prevents double inclusion of the header file. Everything inside that block that is not nested within a block governed by a #ifdef __KERNEL__ block should move to the corresponding uapi/ header file. The content inside the #ifdef __KERNEL__ block remains in the original header file, but the #ifdef __KERNEL__ and its accompanying #endif are removed.

A copy of the header comments remains in the original header file, and is duplicated in the new uapi/ header file. In addition, a #include directive needs to be added to the original header file so that it includes the new uapi/ header file, and of course a suitable git commit message needs to be supplied for the change.

The goal is to modify the original header file to look like this:

    /* Header comments (copyright, etc.) */

    #ifndef _XXXXXX_H     /* Guard macro preventing double inclusion */
    #define _XXXXXX_H

    #include <include/uapi/path/to/header.h>

    [Kernel-space definitions]

    #endif /* End prevent double inclusion */

The corresponding uapi/ header file will look like this:

    /* Header comments (copyright, etc.) */

    #ifndef _UAPI__XXXXXX_H     /* Guard macro preventing double inclusion */
    #define _UAPI__XXXXXX_H

    [User-space definitions]

    #endif /* End prevent double inclusion */

Of course, there are various details to handle in order to correctly automate this task. First of all, sometimes the script should produce only one result file. If there is no #ifdef __KERNEL__ block in the original header, the original header file is in effect renamed to the uapi/ file. Where the header file is disintegrated into two files, there are many other details that need to be handled. For example, if there are #include directives that are retained at the top of the original header file, then the #include for the generated uapi/ file should be placed after those #include directives (in case the included uapi/ file has dependencies on them). Furthermore, there may be pieces of the original header that are explicitly not intended for kernel space (i.e., they are for user-space only)—for example, pieces governed by #ifndef __KERNEL__. Those pieces should migrate to the uapi/ file, retaining the guarding #ifndef __KERNEL__.

David's scripts handle all of the aforementioned details, and many others as well, including making corresponding changes to .c source files and various kernel build files. Naturally, no scripting can correctly handle all possible cases in human-generated files, so part of the current patch set includes pre-patches that add markers to "coach" the scripts to do the right thing in those cases.

Writing scripts to automate this sort of task becomes a major programming project in its own right, and the shell and Perl scripts (.tar.xz archive) to accomplish the task run total more than 1800 lines. (Using scripting to generate the patch set has the notable benefit that the patch set can be automatically refreshed as the relevant kernel source files are changed by other kernel developers. Given that the UAPI patches touch a large number of files, this is an important consideration.)

Knowing the size of those scripts, and the effort that must have been required to write them, gives us a clue that the scale of the actual changes to the kernel code must be large. And indeed they are. In its current incarnation, the UAPI patch series consists of 74 commits, of which 65 are scripted (the scripted changes produce commits to the kernel source tree on a per-directory basis). Altogether, the patches touch more than 3500 files, and the diff of the changes amounts to over 300,000 lines.

The scale of these changes brings David to his next problem: how to get the changes accepted by Linus. The problem is that it's impossible to manually review source code changes of this magnitude. Even a partial review would require considerable effort, and would not provide ironclad guarantees about the remaining unreviewed changes. In the absence of such reviews, when Linus received David's request to pull these patches in the 3.5 merge window, he employed a time-honored strategy: the request was ignored.

Although David first started working on these changes around a year ago, Linus has not to date directly commented on them. However, back in January Linus accepted some preparatory patches for the UAPI work, which suggests that he's at least aware of the proposal and possibly willing to entertain it. Other kernel developers have expressed support for the UAPI split (1 and 2). However, probably because of the magnitude of the changes, getting actual reviews and Acked-by: tags has to date proved to be a challenge. Given the impossibility of a complete manual review of the changes, the best hope would seem to be to have other developers review the conceptual approach employed by David's scripts, possibly review the scripts themselves, perform a review of a sample of the changed kernel source files, and perform kernel builds on as many different architectures as possible. (Aspiring kernel hackers might note that much of the review task on this quite important piece of kernel work does not require deep understanding of the workings of the kernel.)

Getting sufficient review of any set of kernel patches, let alone a set this large, is a perennial difficulty. Things at least took a step forward with David's request to Linus to have the patches accepted for the currently open 3.6 merge window, when Arnd Bergmann provided his Acked-by: for the entire patch series. Whether that will prove enough, or whether Linus will want to see formal agreement from additional developers before accepting the patches is an open question. If it proves insufficient for this merge window, then perhaps a rethink will be required next time around about how to have such a large change accepted into the mainline kernel.

Index entries for this article
Kernel	Include files
Kernel	User-space API

(Log in to post comments)

Checking same object files produced

Posted Jul 26, 2012 10:59 UTC (Thu) by epa (subscriber, #39769) [Link]

I imagine that rearranging some header files will not change the generated code. So the correctness check can be making sure that the resulting kernel image is byte-for-byte identical (for various sets of config flags). The review process then isn't to check correctness of each line of patch, but to approve the general idea.

Checking same object files produced

Posted Jul 26, 2012 11:39 UTC (Thu) by NRArnot (subscriber, #3033) [Link]

That's assuming the compiler is perfectly well-behaved. It might be that the different ordering of #define statements changes the internal state of the compiler, leading to a change in the ordering of entities which are present in the generated code. The binary code would then be functionally identical but not byte-for-byte identical.

Checking same object files produced

Posted Jul 26, 2012 12:28 UTC (Thu) by ms (subscriber, #41272) [Link]

Take the generated assembly before and after and prove it's equivalent? Yes, I'll fetch my coat...

Checking same object files produced

Posted Jul 27, 2012 11:17 UTC (Fri) by nix (subscriber, #2304) [Link]

Given that the only #defines this code should be altering are those related to multiple-inclusion guards, I think we could describe any such ordering changes as a compiler bug. That is not to say that no such bugs exist, but it is likely that they'd also lead to inconsistent code generation when faced with identical output, and cause failures in GCC bootstrap comparison. (This has happened before, e.g. with optimizations depending on the ordering of pieces of compiler state in hash tables -- the infamous 'hashing by
address' bug class -- and that sort of thing.)

The UAPI header file split

Posted Jul 26, 2012 12:09 UTC (Thu) by krisis (guest, #70792) [Link]

I took a quick look at this patch set, and I've found some weirdness that I'm not sure what to make of.

The UAPI files seem to include the kernel headers.
For example: the include/uapi/linux/eventpoll.h file includes <linux/fcntl.h>.

The include stack would then look like this for a kernel compile:

linux/eventpoll.h
uapi/linux/eventpoll.h
linux/fcntl.h
uapi/linux/fcntl.h
...
etc.

This seems unnecessary and would make it possible to depend on declarations that are implicitly defined through the kernel versions of the header.

Unless I'm mistaken. Which is a very likely possibility.

The UAPI header file split

Posted Jul 26, 2012 15:19 UTC (Thu) by dhowells (subscriber, #55933) [Link]

> The UAPI files seem to include the kernel headers.

Yes. That is so - *if* you're using them for a kernel build. If you're using them for a userspace build, they only see each other.

It actually mirrors the way things work now. Just think of the way such kernel header files are usually constructed: a bunch of #includes that get put into the userspace header, then some userspace defs, then the __KERNEL__ guard, then the kernel space stuff.

With what I've constructed, it ends up being pretty much the same - except that the UAPI part of it is in a separate file.

> ...
> This seems unnecessary

Unfortunately, if I use uapi/ in the UAPI header files' #includes they will need processing to remove it on installation. The alternative would require the KAPI headers have different names in some way (either named something different or with different directory components) - they would have to be distinguished by cpp, right?

> and would make it possible to depend on declarations that are implicitly
> defined through the kernel versions of the header.

I'm not sure what you're getting at. #including a kernel header after the split must give you at least what you got from it before the split, even if some of those bits are obtained indirectly. I don't think there are any cases where you get anything more than you got before the split.

The UAPI header file split

Posted Jul 26, 2012 20:03 UTC (Thu) by krisis (guest, #70792) [Link]

What I was getting at, was that you could use this refactoring to get rid of some implicit includes and reduce the compile time.

After thinking about it, I've realized that it would be a massive undertaking, added to the already massive undertaking that this patch series is.

The UAPI header file split

Posted Jul 26, 2012 20:30 UTC (Thu) by dhowells (subscriber, #55933) [Link]

This patch series is just step 1...

The UAPI header file split

Posted Jul 27, 2012 4:32 UTC (Fri) by dlang (guest, #313) [Link]

sounds like a good topic for discussion at the kernel summit.

The UAPI header file split

Posted Jul 28, 2012 17:39 UTC (Sat) by ranmachan (guest, #21283) [Link]

Typo: "[Users-pace definitions]" slowed down for users who can't keep up with Kernels-pace. :)

The UAPI header file split

Posted Jul 29, 2012 16:40 UTC (Sun) by jake (editor, #205) [Link]

fixed now ...

jake

The UAPI header file split

Posted Aug 3, 2012 13:15 UTC (Fri) by marcH (subscriber, #57642) [Link]

> Sometimes, however, the sheer size of the code base means that refactoring becomes a formidable task—one that verges on being impossible if attempted manually. At that point, an enterprising kernel hacker may well turn to writing code that refactors the [kernel] code.

For Java such code has been written already: (at least) Eclipse does this on a routine basis.

Of course this is a much easier problem since Java has no pre-processor. I'm still impressed seeing it in action.

The UAPI header file split

Posted Mar 4, 2019 14:08 UTC (Mon) by anuj1560 (guest, #130768) [Link]

In file uapi/linux/i2c.h , it includes linux/types.h. Is this right ? I think it should include uapi/linux/types.h.