Document the Nix Archive format

This is adopted from Eelco's PhD thesis.
This commit is contained in:
John Ericson 2024-04-10 15:21:22 -04:00
parent a268c0de71
commit 3e5797e97f
2 changed files with 43 additions and 0 deletions

View file

@ -110,6 +110,7 @@
- [Derivation](protocols/json/derivation.md)
- [Serving Tarball Flakes](protocols/tarball-fetcher.md)
- [Store Path Specification](protocols/store-path.md)
- [Nix Archive (NAR) Format](protocols/nix-archive.md)
- [Derivation "ATerm" file format](protocols/derivation-aterm.md)
- [Glossary](glossary.md)
- [Contributing](contributing/index.md)

View file

@ -0,0 +1,42 @@
# Nix Archive (NAR) format
This is the complete specification of the Nix Archive format.
The Nix Archive format closely follows the abstract specification of a [file system object] tree,
because it is designed to serialize exactly that data structure.
[file system object]: @docroot@/store/file-system-object.md
The format of this specification is close to [Extended BackusNaur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), with the exception of the `str(..)` function / parameterized rule, which length-prefixes and pads strings.
This makes the resulting binary format easier to parse.
Regular users do *not* need to know this information.
But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful.
```ebnf
nar = str("nix-archive-1"), nar-obj;
nar-obj = str("("), nar-obj-inner, str(")");
nar-obj-inner
= str("type"), str("regular") regular
| str("type"), str("symlink") symlink
| str("type"), str("directory") directory
;
regular = [ str("executable"), str("") ], str("contents"), str(contents);
symlink = str("target"), str(target);
(* side condition: directory entries must be ordered by their names *)
directory = str("type"), str("directory") { directory-entry };
directory-entry = str("entry"), str("("), str("name"), str(name), str("node"), nar-obj, str(")");
```
The `str` function / parameterized rule is defined as follows:
- `str(s)` = `int(|s|), pad(s);`
- `int(n)` = the 64-bit little endian representation of the number `n`
- `pad(s)` = the byte sequence `s`, padded with 0s to a multiple of 8 byte