Peter Kolloch - Blog - Nix: Authenticated Fetches from GitLab

In our quest to establish Nix for distributing developer toolchains, we also depend on private, company-specific tools. We do not want to nixify every build but simply fetch the existing build artifacts using GitLab authentication.

This might not be exactly what you need, but I assume it is incredibly common to want to fetch artifacts from some authenticated API.

I discussed this with John Ericson and he gave me some pointers to current work in progress or proposals that might be related. Thank you!

Since I am not very familiar with core Nix development, the impact of these issues is not always easy for me to understand. Therefore, I thought it might also help others if I published the notes I took while reading through the suggested material.

Nix Store ACLs

Reference: RFC 0143

Implement a way to only allow user access to a store path if they provide proof that they have all the necessary sources available, or had the access permission explicitly granted to them.

Well written!

Obviously, this is related to stores and is superficially almost the opposite of what we want: it is about restricting access to parts of a store, not about being more permissive.

But could we not require substitution of all private artifacts, i.e. ensure that all private artifacts are available in our shared Nix store (cache)? The fetches would just be "Fixed Output Derivations", meaning that they already specify the hash of the fetched artifact, and before attempting the fetch, our Nix client would check whether it is already available in the cache.

This would not even require this RFC but simply a private Nix store, in our case via AWS S3.
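To make the substitution idea concrete, here is a toy model in Python. It is not how Nix is implemented: the names `FixedOutput` and `realise` and the dict-based cache are hypothetical, and the cache is keyed by the hash alone for simplicity. It only shows the ordering: look up the declared hash in the shared cache first, and run the fetcher only on a miss, verifying the result against the declared hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FixedOutput:
    """Toy stand-in for a fixed-output derivation (hypothetical type)."""
    name: str
    sha256: str                 # hash declared up front
    fetch: Callable[[], bytes]  # "builder" that may need credentials


def realise(fod: FixedOutput, cache: Dict[str, bytes]) -> bytes:
    """Return the artifact, preferring the shared cache over fetching."""
    # 1. Substitution: the declared hash serves as the cache key.
    if fod.sha256 in cache:
        return cache[fod.sha256]
    # 2. Cache miss: run the fetcher and verify its promise.
    data = fod.fetch()
    actual = hashlib.sha256(data).hexdigest()
    if actual != fod.sha256:
        raise ValueError(f"hash mismatch for {fod.name}: got {actual}")
    cache[fod.sha256] = data
    return data


def must_not_run() -> bytes:
    """Fetcher that fails loudly if it is ever executed."""
    raise RuntimeError("fetcher ran although the artifact was cached")


artifact = b"private tool v1"
digest = hashlib.sha256(artifact).hexdigest()
shared_cache = {digest: artifact}  # pre-populated, e.g. by someone with access

tool = FixedOutput("private-tool", digest, must_not_run)
result = realise(tool, shared_cache)  # served from the cache, no fetch
```

With a pre-populated cache, the fetcher is never called, so nobody running the build needs GitLab credentials.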

I have some concerns:

The fetching "builder code" would be misleading since the intention is that it would never be executed. I vaguely remember there is a construct in Nixpkgs that models "you need to download this independently and upload it to your store", so that could be remedied.
Someone or something would still need to get access to the artifact and upload it to the shared store. That might be relatively easy to achieve (e.g. by putting the upload into the CI/CD pipeline of the binary) or difficult, depending, as so often, on the context. For us, it would be acceptable, even though I'd like to keep the CI/CD pipelines untouched for now so that the Nix-related changes stay in a few isolated repos.
I conceptualize the AWS S3 store that we set up as a "cache" that does not need to be backed up etc. The requirement of always having all private binaries available there would invalidate that.

Where would the RFC be of use for us?

The RFC itself mentions that it is probably not a good idea to use this for secrets. So we should not use it to, e.g., store GitLab credentials in a store path in a protected manner. But I'll think more about that.

What the RFC does enable is access to some binaries without making their full build recipes and source available. That could be nice but is generally not our problem: The build of the binaries is not nixified anyway and thus Nix doesn't see the source. Furthermore, all our employees can see pretty much all our source code and we like it that way.

In theory, the RFC could be a solution for bootstrapping: In our case, even for authenticating, we need certain authentication helpers. With the RFC, we could expose the parts of the store that are strictly needed for bootstrapping publicly and all the rest only privately.

But all in all, it looks quite complicated to pull off and is not a good match for our needs, I think.

Builtin fetching should be represented by derivations

Reference: Nix GitHub Issue #9077

Currently, there are two fundamentally different ways of "fetching":

"built-in fetchers" (libfetchers): These are functions of "Nix the language" and are executed directly while evaluating the build tree.
"fixed-output derivation fetchers": Nix allows them to access the network by virtue of an easily verifiable promise. FODs need to specify the hash of their output upfront.

This has some indirect consequences, which is why people, depending on their needs, prefer one over the other.

"Built-in" fetchers run wherever the Nix expressions are evaluated - in the environment of the user that invoked the build. That has an immediate application for us: If the user is authenticated, the fetcher also has access to the associated credentials!

Yet, it also requires that we have an appropriate fetcher directly built into Nix, which makes this weirdly inflexible: The built-in fetchers do not, e.g., allow setting an authentication header from an environment variable. That makes it impossible to fetch job artifacts via the GitLab API, which does not allow basic authentication. Basic authentication would be supported with credentials from the ~~insanely limited, not even supporting passwords with spaces~~ time-proven netrc mechanism.
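For illustration, this is roughly what a more flexible fetcher would have to do, sketched in Python rather than in Nix. The host, project ID, branch, job name, and the `GITLAB_TOKEN` variable name are placeholders; the `PRIVATE-TOKEN` header and the job-artifacts download endpoint come from the GitLab REST API. The function names are hypothetical.

```python
import os
import urllib.parse
import urllib.request


def artifact_request(host: str, project_id: int, ref: str, job: str,
                     token_var: str = "GITLAB_TOKEN") -> urllib.request.Request:
    """Build (but do not send) a request for the latest artifacts of a job."""
    token = os.environ.get(token_var)
    if not token:
        raise RuntimeError(f"{token_var} is not set")
    # Percent-encode the ref so special characters do not break the path.
    encoded_ref = urllib.parse.quote(ref, safe="")
    query = urllib.parse.urlencode({"job": job})
    url = (
        f"https://{host}/api/v4/projects/{project_id}"
        f"/jobs/artifacts/{encoded_ref}/download?{query}"
    )
    # The token travels as a header read from the environment,
    # so it is part of neither the URL nor the function arguments.
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


def download(request: urllib.request.Request) -> bytes:
    """Send the request and return the artifact archive as bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Calling `download(artifact_request("gitlab.example.com", 42, "main", "build"))` with `GITLAB_TOKEN` set in the environment would then return the archive, assuming such a project and job exist.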

In addition to this inflexibility, which comes from the fetchers being part of the Nix code itself (live with what we Elders have foreseen, Young Jedi), there are also performance issues, e.g. built-in fetchers are only executed one by one and not in parallel with derivations. The issue strives to fix that and more (I believe). That is really cool!

It is not clear to me, though, whether this would mean that the built-in fetchers would still be executed by the build client, i.e. the user that called nix build. Let's assume that in good faith for now; otherwise this would make authentication more difficult, not easier. (Note from future Peter: This actually gets addressed in the next section!)

So in a future world where Nix contained a built-in fetcher suitable for GitLab, this could mitigate some of the downsides of built-in fetchers.

How would it work then with remote builders?

For the first time:

nix build gets invoked.
The fetch from GitLab runs in the same process, or at least the same environment, as the nix build process, so it has access to all user authentication.
The result is uploaded to the builders from the local machine...
The build continues.
One way or another, the result of the fetch ends up in our shared Nix store/substituter.

Afterwards:

nix build gets invoked.
The result gets substituted which is also fast on remote builders.
Build continues...

Not too bad, but it actually hardly depends on issue #9077 at all. It mostly depends on supporting GitLab by having more flexible built-in fetchers. Potentially, I could do that with a Nix plugin or patch.

What is really cool about this solution is that the credentials never leave the client. Which makes it both simple and secure. But implementing that issue in a way that would allow executing these fetches in parallel but on the client is not really trivial.

BTW, built-in fetchers have other superpowers (e.g. not necessarily requiring an output hash but simply a git hash), and FODs have some other problems (e.g. being substituted unintentionally when the URL but not the hash is changed).

Optional client-side building

Reference: Nix GitHub Issue #9344

For fixed output derivations that need authentication, it is probably better to run them as the current user in order to give them secrets, especially ephemeral secrets (like expiring tokens) that might require some humans in the loop (various 2fa schemes) and are cumbersome to store in the store.

Builtin fetching should be represented by derivations #9077: once the above is sorted out, we should do this too. Currently the main reason fetching is not done with derivations is authentication. This provides a proper solution. All fetching done as client-side derivations nicely meets in the middle of the current fixed-output derivations vs libfetchers divide.

Huh! John addressed my previous concerns here without me even realizing. Thanks again!

After this John dives directly into implementation details that are hard to follow for me:

General decoupling. Building shouldn't depend on using the SQL database (and I don't think it currently does). It ought to work with other stores that also provide a bike system view.

I assume that he means "build system view" but I can't follow the relationship to the SQLite database.

Conclusion

It was interesting to learn about the different proposals. Some of them would be awesome improvements; the last one would nearly solve my problem.

In the short term, I see the following possibilities for allowing private fetches from GitLab:

Patching Nix or writing a plugin so that I can use a more flexible fetchurl built-in that supports setting headers from environment variables. That would allow me to pass credentials without having them end up in the store.
Using a custom FOD to fetch from GitLab and have a mechanism to pass the GitLab API token to the builder.

The second one could easily be achieved by means similar to those described here, e.g. by allowing trusted clients to change allowlisted environment variables in the builder. Since that is not possible yet and requires further discussion, I implemented a hacky way to do that in our internal PoC. I might write a separate blog post about that!
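The allowlisting idea can be sketched in a few lines of Python. This is a hypothetical model, not an existing Nix feature and not the PoC mentioned above: `builder_environment`, the `trusted` flag, and the variable names are invented for illustration.

```python
from typing import Dict, Iterable


def builder_environment(client_env: Dict[str, str],
                        allowlist: Iterable[str],
                        trusted: bool) -> Dict[str, str]:
    """Pick the client variables that may be forwarded to the builder."""
    if not trusted:
        # Untrusted clients cannot influence the builder environment.
        return {}
    allowed = set(allowlist)
    return {name: value for name, value in client_env.items() if name in allowed}


client_env = {"GITLAB_TOKEN": "glpat-example", "HOME": "/home/peter"}
forwarded = builder_environment(client_env, ["GITLAB_TOKEN"], trusted=True)
# Only GITLAB_TOKEN is forwarded; HOME stays on the client.
```

The point of the allowlist is that the administrator, not the client, decides which variables can cross into the builder at all.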

If you want to know more about my current journey and context, check out Nix: Distributing private/public binaries.

Noticed a mistake? Want to discuss something with me? Feel free to discuss

Also, feel free to get in touch with me using the information shared on my GitHub profile kolloch or on Twitter @pkolloch.