Rocky Linux Series #4: Build Dependencies (or: "It's Complicated")

[Image: Dependency debugging, illustrated]

Date: 2021-10-15

Today we'll go over RPM dependencies, and why they may not be as simple as they seem.


RPM Specs: Quick Review

RPMs have Requires: and Provides: listings, which indicate the dependencies they need and the capabilities they provide. These are part of the .spec file, the thing that defines the RPM package and how to build it.

Every DNF/yum repository maintains one gigantic collective index of these Requires and Provides, covering every package it contains. The index is searchable, so when DNF comes across a Requires that is not currently satisfied on your system, it can check the index and find which package(s) Provide it.

Quick Example:

Let's look at the DNF package manager itself - what does that package require? We can find out with repoquery, like this:

skip@skip-rpi:~$ repoquery --requires dnf
Last metadata expiration check: 0:00:26 ago on Thu 14 Oct 2021 08:56:13 PM EDT.
/bin/sh
python3-dnf = 4.4.2-11.el8

So one of the requires is /bin/sh, a basic shell. Let's see what package Provides that:

skip@skip-rpi:~$ dnf whatprovides '/bin/sh'
Last metadata expiration check: 0:02:53 ago on Thu 14 Oct 2021 08:56:13 PM EDT.
bash-4.4.20-1.el8_4.aarch64 : The GNU Bourne Again shell
Repo        : baseos
Matched from:
Provide    : /bin/sh

So we see that the bash package from Rocky's BaseOS repository provides the /bin/sh shell that DNF requires. Pretty simple takeaway: RPMs have things they require and provide, and DNF (or the older Yum) is very good at tracking down and matching these.
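
To make the matching idea concrete, here is a minimal Python sketch of a Provides index. It is only a toy model, not how DNF is implemented: the PACKAGES table is hand-written and heavily simplified (no versions, no architectures, and only the entries from the example above), and the function names are my own.

```python
from collections import defaultdict

# Hypothetical, hand-written metadata. Real repositories hold thousands of
# entries, with versions and architectures attached.
PACKAGES = {
    "dnf": {"provides": ["dnf"], "requires": ["/bin/sh", "python3-dnf"]},
    "bash": {"provides": ["bash", "/bin/sh"], "requires": []},
    "python3-dnf": {"provides": ["python3-dnf"], "requires": []},
}


def build_provides_index(packages):
    """Map each provided capability to the set of packages that provide it."""
    index = defaultdict(set)
    for name, meta in packages.items():
        for capability in meta["provides"]:
            index[capability].add(name)
    return index


def what_provides(index, capability):
    """Return a sorted list of the packages providing a capability."""
    return sorted(index.get(capability, ()))


index = build_provides_index(PACKAGES)
for requirement in PACKAGES["dnf"]["requires"]:
    print(requirement, "->", what_provides(index, requirement))
```

The point is just the shape of the lookup: every Requires is a key, and the index answers "who Provides this?" without scanning every package each time.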



Build Time Requirements

In addition to Requires entries, source RPM .spec files have another kind of entry, called BuildRequires:. These are the dependencies that are needed to build the RPM, but not to install or run it. This makes them distinct from the Requires entries, but they often overlap.

A simple example of this: Many packages contain software written in C, and require the gcc package to compile their source. But obviously they don't need gcc in order to install or run on your system! Gcc would therefore be listed as a BuildRequires in that source RPM's .spec file, not as a Requires.

Gathering and keeping track of these BuildRequires is hugely important when building a distribution, as they tend to be more numerous and more complicated than the simple install-time requirements of each package.
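
As a rough illustration of the two kinds of entries, here is a small Python sketch that pulls BuildRequires and Requires out of a spec fragment. The "hello" package and its dependencies are made up for this example, and the parsing is deliberately naive: it does not expand macros or handle conditionals the way real RPM tooling does.

```python
# Hypothetical spec fragment for an imaginary "hello" package.
SPEC_FRAGMENT = """\
Name:           hello
Version:        1.0
BuildRequires:  gcc
BuildRequires:  make, openssl-devel
Requires:       openssl-libs
"""


def collect_tags(spec_text, tag):
    """Collect the values of every line that starts with `tag:`."""
    values = []
    prefix = tag.lower() + ":"
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix):
            rest = stripped[len(prefix):]
            # One tag line may list several comma-separated dependencies.
            values.extend(item.strip() for item in rest.split(",") if item.strip())
    return values


print("Needed to build:", collect_tags(SPEC_FRAGMENT, "BuildRequires"))
print("Needed to run:  ", collect_tags(SPEC_FRAGMENT, "Requires"))
```

Notice that gcc and make only show up in the build-time list, which is exactly the distinction described above.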



How do we start? (Hint: Look to your bootstraps!)

If we're going to build an entire RHEL 8's worth of RPM packages (i.e. several thousand), we must have some kind of a base to start with. Even the simplest, most "basic" packages have plenty of BuildRequires, and we must satisfy them to even begin building everything.

In early development, the Rocky Linux team used CentOS 8.4 as the base for building our packages. Once we had built a full set of RPMs using the CentOS repositories to satisfy requirements, we could put the newly produced packages in a repository of our own. Any further builds can then point to the new Rocky repository instead of CentOS - now we have become self-building!

I've heard different names for this process, most often called Repository or Distribution Bootstrapping. We started with nothing, and used an outside source (in this case CentOS 8) to get us going!
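
The two passes can be sketched in Python as a toy model. Everything here is an assumption made for illustration: the three sources, their simplified BuildRequires, and the build_all helper, which only checks dependencies and pretends the build succeeded.

```python
# Hypothetical sources: what each needs in order to build, and which binary
# packages it produces. Note that gcc needs a gcc to build itself.
SOURCES = {
    "gcc": {"build_requires": {"gcc", "glibc-devel"}, "produces": {"gcc"}},
    "glibc": {"build_requires": {"gcc"}, "produces": {"glibc", "glibc-devel"}},
    "bash": {"build_requires": {"gcc", "glibc-devel"}, "produces": {"bash"}},
}


def build_all(sources, repo):
    """Pretend to build every source against `repo`; return what was produced."""
    produced = set()
    for name, meta in sources.items():
        missing = meta["build_requires"] - repo
        if missing:
            raise RuntimeError(f"{name}: unmet BuildRequires {sorted(missing)}")
        produced |= meta["produces"]
    return produced


# Pass 1: borrow an outside repository to satisfy BuildRequires.
external_repo = {"gcc", "glibc-devel"}
own_repo = build_all(SOURCES, external_repo)

# Pass 2: rebuild using only what we produced ourselves.
rebuilt = build_all(SOURCES, own_repo)
print("Self-building:", rebuilt == own_repo)
```

Starting from an empty repository, the very first build would fail with unmet BuildRequires, which is why an outside base is needed to get going.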



*-devel and "Hidden" Dependencies

There's something about building RHEL in particular that many people don't realize: Not all of these build dependencies are present in the RHEL repositories! Let me explain:

Packages often require *-devel packages for libraries at build-time to compile successfully. These -devel packages can contain header files, or other relevant information that tells a piece of code how to build against a required library. For example, the popular bind package (a DNS server) requires openssl-devel to compile successfully, because BIND must communicate with the openssl library for its cryptographic functions.

These -devel packages are generated along with their matching main package during an RPM build. When you build the "openssl" package, openssl-devel is also produced alongside it. Unfortunately, many of these -devel packages are not available in the Red Hat repositories. You cannot build all of the packages in RHEL 8 using only the RHEL 8 (or CentOS 8) repos. You must go out and produce those missing -devel packages by compiling the appropriate RPMs yourself!

Complicating matters further, some packages depend on packages not even present in RHEL 8. Going back to our bind example, one of its BuildRequires is the package kyua. Kyua is NOT available in any RHEL package repository. Kyua in turn depends on the lutok package, which itself depends on atf. This chain of dependencies needs to be built and available before we can produce bind!
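
Here is a small Python sketch of walking that chain. The dependency table contains only the edges mentioned in this article (real packages have many more BuildRequires), and build_chain is my own helper name, not part of any Rocky tooling.

```python
# Only the edges named in the text; real packages have many more.
BUILD_DEPS = {
    "bind": ["openssl-devel", "kyua"],
    "kyua": ["lutok"],
    "lutok": ["atf"],
    "atf": [],
    "openssl-devel": [],
}


def build_chain(target, deps):
    """Return packages ordered so each one comes after its dependencies."""
    order, seen = [], set()

    def visit(pkg, stack):
        if pkg in stack:
            raise ValueError("dependency cycle: " + " -> ".join(stack + [pkg]))
        if pkg in seen:
            return
        for dep in deps.get(pkg, []):
            visit(dep, stack + [pkg])
        seen.add(pkg)
        order.append(pkg)

    visit(target, [])
    return order


print(build_chain("bind", BUILD_DEPS))
# ['openssl-devel', 'atf', 'lutok', 'kyua', 'bind']
```

This is a depth-first walk: atf has to exist before lutok, lutok before kyua, and all of them before bind can be built.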

Fortunately, all of these packages are available as Red Hat sources from https://git.centos.org. All the sources are well-maintained, and are easy to clone via Git (see previous article for more info about code storage). One of the early tasks in the primordial stages of the Rocky Linux build process was gathering all of these dependencies and figuring out the entire list of sources that need to be imported. As we saw above, it's not as simple as just "import all the packages that are in RHEL": many more than that are needed to do all the builds. You can see some of the dev team's early work on this documented in the Rocky Linux wiki, like on this page.



Let's talk briefly about: MODULES

Modules are groups of related packages that get built together and often with special build-time options.

For example, the mariadb module is for building MariaDB 10.3, 10.5, and all their related packages (as of this writing). The specification for building a module is written in YAML, and lives under the modules/ subdirectory in GitLab. Example: the YAML to build MariaDB 10.5 is located here.

In the linked YAML, we see that there is a "build order" in which the packages need to be built. Packages later in the order will have BuildRequires on packages earlier in the build order, so the earlier packages must be built first. There is also a "macros" section, which defines special macros (variables) applied just to packages within this module. Other options are possible in this YAML file too, such as a module that depends upon a particular version of another module being enabled!
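
To show how a build order turns into actual batches, here is a minimal Python sketch. The COMPONENTS dict is a hand-written stand-in for the components part of a module YAML; the component names and numbers are illustrative and not copied from the real MariaDB module, and treating a missing buildorder as 0 is an assumption on my part.

```python
from itertools import groupby

# Hypothetical components, shaped like the "components" part of a module YAML.
COMPONENTS = {
    "mariadb": {"buildorder": 2},
    "galera": {"buildorder": 1},
    "Judy": {"buildorder": 0},
    "asio": {"buildorder": 0},
}


def rank(item):
    """Sort key: build order first (assumed default 0), then name."""
    name, meta = item
    return (meta.get("buildorder", 0), name)


def build_batches(components):
    """Group component names into batches, lowest build order first."""
    ranked = sorted(components.items(), key=rank)
    return [
        [name for name, _ in group]
        for _, group in groupby(ranked, key=lambda item: rank(item)[0])
    ]


for number, batch in enumerate(build_batches(COMPONENTS)):
    print("batch", number, batch)
```

Packages that share a build order number end up in the same batch, and each batch has to finish before the next one starts.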

Now that we see how they work, it's apparent that building modules by hand is quite a chore. Each package in the module needs to be built in the correct order, and you'd need a unique Mock config file for each package in the module, all containing custom options. And after each individual RPM build, you would have to commit that package to a local repository, so it could be available as a BuildRequires for the next packages in the module's build order.
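
To give a feel for the chore, here is a Python sketch that only prints the commands a by-hand module build might involve; it does not run anything. The directory layout, config file names, and batch contents are all hypothetical, and each Mock config is assumed to already point at the local repository so later builds can see earlier results.

```python
import shlex


def plan_module_build(batches, repo_dir, config_dir="configs", srpm_dir="srpms"):
    """Return the commands (as argument lists) for a by-hand module build."""
    commands = []
    for batch in batches:
        for pkg in batch:
            # Build one source RPM with its own (hypothetical) Mock config.
            commands.append([
                "mock", "-r", f"{config_dir}/{pkg}.cfg",
                "--resultdir", repo_dir,
                "--rebuild", f"{srpm_dir}/{pkg}.src.rpm",
            ])
            # Refresh the local repo metadata so the next build can use it.
            commands.append(["createrepo_c", "--update", repo_dir])
    return commands


# Hypothetical batches, in build order.
plan = plan_module_build([["Judy"], ["galera"], ["mariadb"]], "/tmp/module-repo")
for command in plan:
    print(shlex.join(command))
```

Even for three packages that is six commands plus three hand-maintained config files, and real modules can contain far more.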

Fortunately, we have tools to automate this tedious work: MBS (Module Build Service, the official tool) and Rocky's own Ansible-based "Lazy Builder" (very unofficial, used for local builds). These tools are designed to read the YAML file for the module and execute these steps automatically. Much less tedious when our tools do this for us!

A module comparison: I like to think of the module YAML files as analogous to the .spec files of individual source RPMs. As each spec file is a kind of "recipe" for building an RPM package, each YAML is a recipe for building a modular group of packages (like our MariaDB example).



Conclusion

I hope you learned something about package (and module!) building from this article. My goal is to introduce the basic concepts, but also illustrate some of the complexity in doing this for an entire Linux distribution.

In my next article, I'm planning something special: a package building lab! We'll walk through grabbing the source for a couple of Rocky packages, and building them locally, from the ground up. With specific, step-by-step instructions. And commentary by me along the way, of course! Stay tuned!


Thanks for reading,

-Skip