Rocky Linux Series #3: Source Control

Making soap (or Rocky Linux)

Date: 2021-10-06

(I'm back with more tech coverage! I know it's been a while)

In this article I'll cover the Rocky Linux source code: where it's stored, how it's imported from RHEL, and how patches/debrands are performed.


Quick Review: RPM Sources

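RPM packages are compiled from source RPMs ("SRPM" files). These SRPMs generally contain 3 things: a .spec file with the build instructions, the upstream source archive (usually a tarball), and any patches that get applied on top of it.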

Rocky Linux (along with Fedora, RHEL, and CentOS) uses the Git version control system to store these package sources.


"Ok, Enough small talk. Point me to the sources!"

(TLDR: https://git.rockylinux.org/staging/rpms/ )

Rocky Linux package sources are stored in our Gitlab, under staging/rpms/ (the URL linked above). Each repository there represents exactly 1 source package, so there is a 1:1 mapping of Git repositories to SRPM files.

For example, the source repository for the default Bash shell is here: https://git.rockylinux.org/staging/rpms/bash. Its contents are compiled into an SRPM, like this one for bash: https://download.rockylinux.org/pub/rocky/8.4/BaseOS/source/tree/Packages/bash-4.4.20-1.el8_4.src.rpm.

It's just that simple. All source packages are created from the code and specs that live in those Git repos.


"Hey, wait! You only have patches and specs in those repos, not the actual source. What are you HIDING!"

Shenanigans!

I promise that they're available! Just not stored in Git. Have a read:

Let's take a closer look at that bash source repo mentioned earlier. Notice that there is a file in there called .bash.metadata. Have a look inside. You'll see the name of a .tar.gz file and a SHA-1 hash. Source files are named according to their hash, and served out of a simple web server (AWS S3, in Rocky Linux's case).

To continue our bash example, we see that the file is named bash-4.4.tar.gz and the sha1sum is: 8de012df1e4f3e91f571c3eb8ec45b43d7c747eb. Therefore, to get the tarball, simply download: https://rocky-linux-sources-staging.a1.rockylinux.org/8de012df1e4f3e91f571c3eb8ec45b43d7c747eb . The downloaded file can then be renamed to the proper name listed in the .bash.metadata file, in this case bash-4.4.tar.gz. BAM! You've got the source code.
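Here is a minimal Python sketch of that lookup. It assumes each line of the metadata file has the form "<sha1> <path>" (the SOURCES/ prefix in the sample is my assumption), and the helper names are made up for illustration. It only builds the download URL and does not touch the network.

```python
from typing import NamedTuple

# Base URL taken from the bash example above; the "<base>/<sha1>" layout follows it.
ROCKY_SOURCES_BASE = "https://rocky-linux-sources-staging.a1.rockylinux.org"


class SourceEntry(NamedTuple):
    sha1: str
    filename: str


def parse_metadata(text: str) -> list[SourceEntry]:
    """Parse lines of the assumed form '<sha1> <path/to/file>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha1, path = line.split(maxsplit=1)
        # Keep only the last path component: that is the name to save the file under.
        entries.append(SourceEntry(sha1, path.rsplit("/", 1)[-1]))
    return entries


def rocky_source_url(entry: SourceEntry) -> str:
    """Sources are served under their hash, not their file name."""
    return f"{ROCKY_SOURCES_BASE}/{entry.sha1}"


# Hypothetical sample line, modeled on the bash example.
sample = "8de012df1e4f3e91f571c3eb8ec45b43d7c747eb SOURCES/bash-4.4.tar.gz\n"
for entry in parse_metadata(sample):
    print(rocky_source_url(entry), "->", entry.filename)
```

Download the printed URL with whatever tool you like, then save it under the printed file name.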

Rocky Linux gets its RHEL sources from git.centos.org, and it works exactly the same way. The Git repository is located here: https://git.centos.org/rpms/bash/tree/c8, and that very same source tarball is located here: https://git.centos.org/sources/bash/c8/.
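If you want to convince yourself that both sides serve the same tarball, something like the following Python sketch would do. The function names are hypothetical, and the CentOS layout "<base>/<package>/<branch>/<sha1>" is an assumption extrapolated from the folder URL above.

```python
import hashlib

CENTOS_SOURCES_BASE = "https://git.centos.org/sources"


def centos_source_url(package: str, branch: str, sha1: str) -> str:
    # Assumed layout: <base>/<package>/<branch>/<sha1>
    return f"{CENTOS_SOURCES_BASE}/{package}/{branch}/{sha1}"


def sha1_matches(data: bytes, expected_sha1: str) -> bool:
    """Check downloaded bytes against the hash recorded in the metadata file."""
    return hashlib.sha1(data).hexdigest() == expected_sha1.lower()


print(centos_source_url("bash", "c8", "8de012df1e4f3e91f571c3eb8ec45b43d7c747eb"))
```

Run sha1_matches() on the bytes fetched from either server: if both pass against the same hash, the files are identical.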

Key Takeaways:


Side Note: "Why not use git-lfs? Why not just import source tar contents directly into Git?"

Now don't start THAT again!

These different approaches were debated, and considered, and debated again among the dev team. It was argued over so much that it's become a running joke in Rocky Release Engineering. The short answer is that all of these approaches could probably work, but each presents its own set of drawbacks. For example, Git Large File Storage ("git-lfs") for the archives would work well, but makes a possible future migration to another Git system significantly more complicated.

The decision to use dist-git (the method outlined above) was influenced by the fact that our upstream RHEL/Fedora systems also store their sources in this way, and it seems to work well enough.


Imports: Where do these sources come from, exactly?

Long story short: The Red Hat sources are stored at: https://git.centos.org/.

Many people get confused when they hear this. Doesn't Red Hat distribute source RPMs, and don't they also contain the source code? Yes, but those source RPMs themselves are a kind of binary product. And they are produced from source repositories like this one: https://git.centos.org/rpms/bash/tree/c8.

When the RHEL team decides to release an update to a package (like bash), it pushes the update out to its DNF repos and, at the same time, commits the sources to the CentOS Git repository. This way, sources are always in sync with the released RPMs in RHEL.

Once an update comes in to a package on git.centos.org, Rocky Linux has an automated process that imports it into our own Git. The source tarball (tar.gz file) also gets pulled from the CentOS sources web folder into our own storage. This automatic import tool was developed early in the Rocky project, and was mentioned in a previous article: Srpmproc.


Branches

Branches in CentOS's (and Rocky's) Git don't work like those of a traditional software project. Branches are maintained separately, based on the separate major releases of the distro. In CentOS, for example, there is a branch for CentOS 4, one for CentOS 5, one for CentOS 6, etc. There is no "main" or "master" branch - each branch stands on its own based on its major version.

Generally, Rocky will only import the branches relevant to RHEL 8, which is of course what we want to (re)build.

Simple Example: Bash (CentOS Bash and Rocky Bash)

Looking at the CentOS source, we see that this package has several Git branches to it.

Rocky's convention is to take the "c" (CentOS) and replace it with an "r" (Rocky). So the "c8" branch of bash in CentOS becomes the "r8" branch in Rocky. Simple, right?


Slightly More Complicated: Modules! (CentOS Nginx and Rocky Nginx)

It's an ongoing theme that modular stream RPMs complicate everything! Looking at these CentOS branches in the link, why do they have those funny "-stream" names in them!? Remember, modular streams are a way to package multiple major versions of the same software. RHEL 8 (and Rocky 8) carry Nginx 1.14, 1.16, AND 1.18. And you can flip between them on your installed machines via dnf.

But if we're going to carry 3 different versions, we need a Git branch for each one! That's where the -stream-##.## nomenclature comes in. It indicates which version of RHEL/CentOS we're on, and which major version of a package the branch belongs to.

So, again looking at the branches:

As we see, different branches must be maintained for RHEL (c8-stream-*) and CentOS Stream (c8s-stream-*). The names are a little confusing at first, because "CentOS Stream" the distro has nothing to do with the package also being a "modular stream" package.
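To make the naming scheme concrete, here is a small Python sketch that pulls a branch name apart. The regular expression is my own reading of the c8 / c8-stream-* / c8s-stream-* patterns described above, not something taken from the real tooling.

```python
import re
from typing import NamedTuple, Optional

# c<major>, optional "s" for CentOS Stream, optional "-stream-<version>" for modules.
BRANCH_RE = re.compile(
    r"^c(?P<major>\d+)(?P<centos_stream>s)?(?:-stream-(?P<module>.+))?$"
)


class BranchInfo(NamedTuple):
    major: int                    # distro major version, e.g. 8
    centos_stream: bool           # True for CentOS Stream branches (c8s...)
    module_stream: Optional[str]  # e.g. "1.14" for modular packages, else None


def parse_branch(name: str) -> Optional[BranchInfo]:
    m = BRANCH_RE.match(name)
    if m is None:
        return None  # not a branch name this sketch understands
    return BranchInfo(int(m["major"]), m["centos_stream"] is not None, m["module"])


for name in ("c8", "c8-stream-1.14", "c8s-stream-1.18"):
    print(name, "->", parse_branch(name))
```

Note how the "s" right after the version number and the "-stream-" in the middle are parsed as two unrelated things, which is exactly the source of the confusion.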

Just as in the bash source, Rocky simply imports the branches from git.centos.org and renames the branches to begin with "r". So c8-stream-1.14 becomes r8-stream-1.14, etc. We thought it was logical enough ;-) .
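The rename rule is simple enough to fit in a one-liner. This Python sketch assumes that only names starting with "c" followed by a digit get rewritten. The function name is made up, and the real import tool may handle edge cases differently.

```python
import re


def to_rocky_branch(centos_branch: str) -> str:
    # Swap the leading "c" for "r" only when a version digit follows it;
    # anything else is returned untouched.
    return re.sub(r"^c(?=\d)", "r", centos_branch)


for name in ("c8", "c8-stream-1.14", "c8-stream-1.16"):
    print(name, "->", to_rocky_branch(name))
```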


Debranding and Patching

Most packages that Rocky imports are taken as-is from git.centos.org, with absolutely no modification. Some packages, however, must have parts modified due to trademark issues. We are not allowed to redistribute trademarked images, text, or other media from other companies or entities without their permission!

We accomplish this, of course, automatically via srpmproc. It's easiest to illustrate this with a walk-through of how one of these packages is imported. Let's say we want to import nginx (all 3 versions) into Rocky. (Nginx is a package which requires debranding on import.) These are the steps taken:

  1. Srpmproc (running locally) searches https://git.centos.org/rpms/nginx/branches and identifies which branches need to be imported. It clones the project and saves those branches locally.

  2. It then checks to see if the "nginx" project exists under the Rocky Linux "patch" git group: https://git.rockylinux.org/staging/patch/

  3. Ah ha! Nginx does exist under there: https://git.rockylinux.org/staging/patch/nginx/ . Srpmproc clones this project and reads the special patching/config instructions.

  4. Patch instructions are read, and the sources for Nginx are patched by srpmproc locally, before final landing in git.rockylinux.org.

  5. The final patched sources are pushed to the Nginx package repo: https://git.rockylinux.org/staging/rpms/nginx
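The steps above can be sketched in Python as a dry-run plan. This is not how srpmproc is actually written (the class and function names are invented for illustration); it just strings together the three URLs from the walk-through and shows where the patch repository check fits in. Nothing is cloned or pushed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportLocations:
    """The three places touched when importing one package (URLs from the steps above)."""
    package: str

    @property
    def upstream(self) -> str:
        return f"https://git.centos.org/rpms/{self.package}"

    @property
    def patch_repo(self) -> str:
        return f"https://git.rockylinux.org/staging/patch/{self.package}"

    @property
    def destination(self) -> str:
        return f"https://git.rockylinux.org/staging/rpms/{self.package}"


def plan_import(package: str, has_patch_repo: bool) -> list[str]:
    """Return the ordered steps as plain strings; purely descriptive."""
    loc = ImportLocations(package)
    steps = [f"clone {loc.upstream}"]
    if has_patch_repo:
        # Only packages with a matching patch repository get modified.
        steps.append(f"clone {loc.patch_repo}")
        steps.append("apply patch instructions locally")
    steps.append(f"push to {loc.destination}")
    return steps


for step in plan_import("nginx", has_patch_repo=True):
    print(step)
```

Calling plan_import("bash", has_patch_repo=False) skips the two patch steps, which mirrors the "taken as-is" case.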

The key is that patch/ folder in Rocky's Gitlab. If a package has a repository with a matching name under https://git.rockylinux.org/staging/patch/, the configuration in that patch repository will be applied. The patch repository has branches just like the package repo, so we can make sure each version gets a proper patch (e.g., c8-stream-1.14 would get the c8-stream-1.14 patch branch). We can also just have a "main" branch in the patch/nginx/ repository which applies that same patch to all nginx branches.
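As a rough Python illustration of that matching rule: given the branches present in a patch repository, pick out the ones relevant to a given package branch. The function is hypothetical, and how srpmproc orders or combines a "main" patch with a branch-specific one is not covered here, so this sketch makes no claim about that.

```python
def applicable_patch_branches(package_branch: str, patch_branches: list[str]) -> list[str]:
    """Return patch branches that apply: the same-named branch and/or "main"."""
    return [b for b in patch_branches if b == "main" or b == package_branch]


# Hypothetical patch repository layout.
available = ["main", "c8-stream-1.14", "c8-stream-1.16"]
print(applicable_patch_branches("c8-stream-1.14", available))
print(applicable_patch_branches("c8-stream-1.18", available))
```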

The patching process is quite powerful. New patches can be inserted and RPM .spec files can be automatically manipulated upon source import. The patch configuration is written against a proto3 (Protocol Buffers) schema, which is a widely used way to define structured data.

More in-depth documentation about the process is available on the Wiki: https://wiki.rockylinux.org/en/team/development/debranding/how-to.


"Rocky Originals" and Git Structure

The vast majority of the Rocky Linux packages are imported from Red Hat, but not quite all. We have a need to host our own original packages, such as the rocky-release package, or the rocky-logos-httpd package.

We keep these things separate in Git, just like we keep our Rocky-specific patches separate. We strive to keep our upstream absolutely pure! Our whole goal is to get as close as possible to RHEL, and it should be well-documented in Git whenever we deviate from their sources in any way.

Here is a brief synopsis of our Gitlab locations:


Hey - What about Github!

This is a bit confusing, but bear with me! Rocky Linux has a self-hosted Gitlab instance that I've been linking to here (https://git.rockylinux.org). BUT, it also has a popular public Github account: https://github.com/rocky-linux/.

In broad strokes, the Gitlab instance is used to host the Rocky Linux distro itself. Packages, modules, patches, all things that ultimately go into Rocky Linux packages and get released to end-users.

Github is used to host everything else. Examples include the public website code, testing tools, documentation, and developer tools (like srpmproc). There are a ton of extra pieces around the project that are not in Rocky Linux proper, but still must be built and maintained. This "division of the Gits" is not a hard and fast rule, but grew organically near the project's beginning. It's worked out pretty well so far. Part of it is practical as well: Github doesn't appreciate it when a single account creates ~3100 new repositories, which is what we'd need to host all our source packages!


Closing

I hope you learned something about how the Rocky (and RHEL!) sources are hosted, and why the source code is put where it is.

My next article is going to be all about our favorite friend in the RPM world: dependencies! It sounds simple enough, but we'll learn why it's not always so straightforward, especially with a massive enterprise-grade distro to compile. I've got some painful recent memories surrounding this topic, so I'm of course looking forward to it :-) .

Thanks for reading,

-Skip