Nix & Docker: Layer explicitly without duplicate packages!
I love nix2container! It allows you to build Docker images with Nix, declaratively, and it even avoids writing large archive files to disk.
However, there are some pitfalls, especially when you use it as I do: spelling out the contents of each layer explicitly.
Straightforward image with explicit layering
Here is a container with bash and zsh included:
{ #...
  layered = nix2container.buildImage {
    name = "layered";
    layers =
      let layerDefs = [
        {
          deps = with nixpkgs; [ readline ];
        }
        {
          deps = with nixpkgs; [ bashInteractive ];
        }
        {
          deps = with nixpkgs; [ zsh ];
        }
      ];
      in builtins.map nix2container.buildLayer layerDefs;
    config = {
      Env = [
        (let path = with nixpkgs; lib.makeBinPath [ bashInteractive zsh ];
         in "PATH=${path}")
      ];
    };
  };
}
I love this! So straightforward! Close to the on-disk format. I feel like I have control without the fuss.
And it works:
❯ docker run -it layered bash
bash-5.2#
exit
❯ docker run -it layered zsh
34c7a50b621e#
How efficient is that image? Does it contain duplicate files? Let's ask dive:
❯ dive --ci layered
Using default CI config
Image Source: docker://layered
Fetching image... (this can take a while for large images)
Analyzing image...
efficiency: 40.9436 %
wastedBytes: 142845873 bytes (143 MB)
userWastedPercent: 126.2491 %
Inefficient Files:
Count Wasted Space File Path
3 14 MB /nix/store/c0hkzndf6i162jymxmlirn9l6ypv7p3c-glibc-2.38-23/share/i18n/locales/cns11643_stroke
3 10 MB /nix/store/c0hkzndf6i162jymxmlirn9l6ypv7p3c-glibc-2.38-23/share/i18n/locales/iso14651_t1_common
3 6.1 MB /nix/store/c0hkzndf6i162jymxmlirn9l6ypv7p3c-glibc-2.38-23/lib/libc.so.6
[...]
Oh. That is not so great. What happened here?
bashInteractive and zsh share some dependencies. I already tried to factor out readline, which also uses glibc. But still, nix2container repeats all store paths, even if they are present in earlier layers.
Why is that? nix2container.buildLayer builds each layer independently, and nix2container.buildImage then assembles them into one image.
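To make the effect concrete, here is a toy model in plain Nix. It is only a sketch and does not call nix2container: the closures attribute set and its path names are made-up stand-ins for real store paths, and independentLayer is a hypothetical helper that mimics a layer built without knowledge of the others.

```nix
# Toy model only: "closures" maps a package to the (illustrative) store
# paths in its runtime closure. Real closures are larger.
let
  closures = {
    readline = [ "glibc" "ncurses" "readline" ];
    bashInteractive = [ "glibc" "ncurses" "readline" "bash" ];
    zsh = [ "glibc" "ncurses" "zsh" ];
  };

  # Hypothetical stand-in for an independently built layer:
  # it contains the full closure of its dependency.
  independentLayer = dep: closures.${dep};

  layers = map independentLayer [ "readline" "bashInteractive" "zsh" ];

  # Count how often each path occurs across all layers.
  counts = builtins.mapAttrs (_: paths: builtins.length paths)
    (builtins.groupBy (p: p) (builtins.concatLists layers));
in
  counts
# Evaluates to:
# { bash = 1; glibc = 3; ncurses = 3; readline = 2; zsh = 1; }
```

Saved to a file, this can be evaluated with nix-instantiate --eval --strict. In this model glibc ends up in all three layers, which matches the count of 3 that dive reports for the glibc files above.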
Deduplicating common dependencies (manually)
How do we make a layer aware of previous layers? We use the layers attribute of buildLayer:
{ #...
  layeredDeduplicated = nix2container.buildImage {
    name = "layered";
    layers =
      let
        commonLayer = {
          deps = with nixpkgs; [ readline ];
        };
        layerDefs = [
          {
            deps = with nixpkgs; [ bashInteractive ];
            layers = [ (nix2container.buildLayer commonLayer) ];
          }
          {
            deps = with nixpkgs; [ zsh ];
            layers = [ (nix2container.buildLayer commonLayer) ];
          }
        ];
      in builtins.map nix2container.buildLayer layerDefs;
    config = {
      Env = [
        (let path = with nixpkgs; lib.makeBinPath [ bashInteractive zsh ];
         in "PATH=${path}")
      ];
    };
  };
}
Let's have a look at dive's opinion on this:
Image Source: docker://layered
Fetching image... (this can take a while for large images)
Analyzing image...
efficiency: 100.0000 %
wastedBytes: 0 bytes (0 B)
userWastedPercent: 0.0000 %
Inefficient Files:
Count Wasted Space File Path
None
Results:
PASS: highestUserWastedPercent
SKIP: highestWastedBytes: rule disabled
PASS: lowestEfficiency
Result:PASS [Total:3] [Passed:2] [Failed:0] [Warn:0] [Skipped:1]
Dive is super happy! Nice!
Deduplication (automatic)
Wouldn't it be nice if later layers always skipped all store paths already contained in earlier layers?
This being nix, this is easy to generalize:
{ # ...
  # Nest all layers so that prior layers are dependencies of later layers.
  # This way, we should avoid redundant dependencies.
  foldImageLayers = let
    mergeToLayer = priorLayers: component:
      assert builtins.isList priorLayers;
      assert builtins.isAttrs component; let
        layer = nix2container.buildLayer (component
          // {
            layers = priorLayers;
          });
      in
        priorLayers ++ [layer];
  in
    layers: lib.foldl mergeToLayer [] layers;
}
Each layer references all prior layers in its layers attribute. This might become a problem with a large number of layers, but it hasn't been one for me so far. I am sure one can optimize it.
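The fold can be tried out without building any image. The following is a minimal sketch in plain Nix under stated assumptions: buildLayerModel is a hypothetical stand-in for nix2container.buildLayer that simply drops every path already contained in one of the layers passed to it, and the closures are illustrative names rather than real store paths.

```nix
# Toy model only: illustrative closures instead of real store paths.
let
  closures = {
    readline = [ "glibc" "ncurses" "readline" ];
    bashInteractive = [ "glibc" "ncurses" "readline" "bash" ];
    zsh = [ "glibc" "ncurses" "zsh" ];
  };

  # Hypothetical stand-in for nix2container.buildLayer: keep only the
  # paths of the closure that no layer in "layers" already contains.
  buildLayerModel = { dep, layers ? [ ] }:
    let seen = builtins.concatLists layers;
    in builtins.filter (p: !(builtins.elem p seen)) closures.${dep};

  # Same shape as mergeToLayer: build the next layer with all prior
  # layers as its "layers" argument, then append it to the list.
  mergeToLayer = priorLayers: component:
    priorLayers
    ++ [ (buildLayerModel (component // { layers = priorLayers; })) ];
in
  builtins.foldl' mergeToLayer [ ] [
    { dep = "readline"; }
    { dep = "bashInteractive"; }
    { dep = "zsh"; }
  ]
# Evaluates to:
# [ [ "glibc" "ncurses" "readline" ] [ "bash" ] [ "zsh" ] ]
```

Evaluated with nix-instantiate --eval --strict, every path shows up in exactly one layer. The model also shows where the cost comes from: layer n is handed all n-1 prior layers, so the amount of work grows with the square of the number of layers.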
Putting foldImageLayers to good use:
{ # ...
  layeredDeduplicatedAutomatic = nix2container.buildImage {
    name = "layeredAutomatic";
    layers =
      let layerDefs = [
        { deps = with nixpkgs; [ readline ]; }
        { deps = with nixpkgs; [ bashInteractive ]; }
        { deps = with nixpkgs; [ zsh ]; }
      ];
      in foldImageLayers layerDefs;
    config = {
      Env = [
        (let path = with nixpkgs; lib.makeBinPath [ bashInteractive zsh ];
         in "PATH=${path}")
      ];
    };
  };
}
This again results in a Docker image without any redundant store paths! Dive is happy and so are we: we explicitly define the contents of each layer but automatically exclude all duplicate store paths!
The end
Note that nix2container supports layering your dependencies automatically using the algorithm described here. Check out the maxLayers setting.
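A minimal sketch of what that could look like, assuming the same nix2container and nixpkgs bindings as in the examples above. The attribute name layeredMaxLayers and the value 100 are arbitrary choices for illustration. It relies on the store paths referenced from config being pulled into the image, and on maxLayers applying to the image's own layers rather than to layers passed via the layers attribute; both are worth double-checking against the nix2container README for your version.

```nix
{ # ...
  # Hypothetical attribute name; no explicit layers are defined here.
  layeredMaxLayers = nix2container.buildImage {
    name = "layeredMaxLayers";
    # Let nix2container split the image's store paths into up to this
    # many layers. 100 is an arbitrary example value.
    maxLayers = 100;
    config = {
      Env = [
        (let path = with nixpkgs; lib.makeBinPath [ bashInteractive zsh ];
         in "PATH=${path}")
      ];
    };
  };
}
```

The trade-off compared to explicit layering is that you no longer decide which store path lands in which layer.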
I like to have more control, but maybe fully automatic is best? I'm curious to hear your thoughts on automatic vs. manual control in Docker container optimization. What has been your approach?
Feel free to get in touch with me using the contact information on my GitHub profile: kolloch
References
I am certainly not the first one to stumble upon this. Initially, the discussion in issue #41 of nix2container pointed me in the right direction!