Merge pull request #10479 from obsidiansystems/ca-fso-docs

Document file system object content addressing
Robert Hensing 2024-05-15 22:52:53 +02:00 committed by GitHub
commit 303268bb71
33 changed files with 228 additions and 68 deletions

View file

@ -18,6 +18,7 @@
- [Uninstalling Nix](installation/uninstall.md) - [Uninstalling Nix](installation/uninstall.md)
- [Nix Store](store/index.md) - [Nix Store](store/index.md)
- [File System Object](store/file-system-object.md) - [File System Object](store/file-system-object.md)
- [Content-Addressing File System Objects](store/file-system-object/content-address.md)
- [Store Object](store/store-object.md) - [Store Object](store/store-object.md)
- [Store Path](store/store-path.md) - [Store Path](store/store-path.md)
- [Store Types](store/types/index.md) - [Store Types](store/types/index.md)

View file

@ -74,4 +74,4 @@ $ nix-collect-garbage -d
``` ```
[profiles]: @docroot@/command-ref/files/profiles.md [profiles]: @docroot@/command-ref/files/profiles.md
[store objects]: @docroot@/glossary.md#gloss-store-object [store objects]: @docroot@/store/store-object.md

View file

@ -49,7 +49,7 @@ Periodically deleting old generations is important to make garbage collection
effective. effective.
This is because profiles are also garbage collection roots — any [store object] reachable from a profile is "alive" and ineligible for deletion. This is because profiles are also garbage collection roots — any [store object] reachable from a profile is "alive" and ineligible for deletion.
[store object]: @docroot@/glossary.md#gloss-store-object [store object]: @docroot@/store/store-object.md
{{#include ./opt-common.md}} {{#include ./opt-common.md}}

View file

@ -17,7 +17,7 @@
The `--install` operation creates a new user environment. The `--install` operation creates a new user environment.
It is based on the current generation of the active [profile](@docroot@/command-ref/files/profiles.md), to which a set of [store paths] described by *args* is added. It is based on the current generation of the active [profile](@docroot@/command-ref/files/profiles.md), to which a set of [store paths] described by *args* is added.
[store paths]: @docroot@/glossary.md#gloss-store-path [store paths]: @docroot@/store/store-path.md
The arguments *args* map to store paths in a number of possible ways: The arguments *args* map to store paths in a number of possible ways:

View file

@ -20,16 +20,21 @@ an example.
The hash is computed over a *serialisation* of each path: a dump of The hash is computed over a *serialisation* of each path: a dump of
the file system tree rooted at the path. This allows directories and the file system tree rooted at the path. This allows directories and
symlinks to be hashed as well as regular files. The dump is in the symlinks to be hashed as well as regular files. The dump is in the
*NAR format* produced by [`nix-store *[Nix Archive (NAR)][Nix Archive] format* produced by [`nix-store
--dump`](@docroot@/command-ref/nix-store/dump.md). Thus, `nix-hash path` --dump`](@docroot@/command-ref/nix-store/dump.md). Thus, `nix-hash path`
yields the same cryptographic hash as `nix-store --dump path | yields the same cryptographic hash as `nix-store --dump path |
md5sum`. md5sum`.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
# Options # Options
- `--flat`\ - `--flat`\
Print the cryptographic hash of the contents of each regular file Print the cryptographic hash of the contents of each regular file *path*.
*path*. That is, do not compute the hash over the dump of *path*. That is, instead of computing
the hash of the [Nix Archive (NAR)](@docroot@/store/file-system-object/content-address.md#serial-nix-archive) of *path*,
just [directly hash](@docroot@/store/file-system-object/content-address.md#serial-flat) *path* as is.
This requires *path* to resolve to a regular file rather than a directory.
The result is identical to that produced by the GNU commands The result is identical to that produced by the GNU commands
`md5sum` and `sha1sum`. `md5sum` and `sha1sum`.

View file

@ -1,6 +1,6 @@
# Name # Name
`nix-store --dump` - write a single path to a Nix Archive `nix-store --dump` - write a single path to a [Nix Archive]
## Synopsis ## Synopsis
@ -8,7 +8,7 @@
## Description ## Description
The operation `--dump` produces a NAR (Nix ARchive) file containing the The operation `--dump` produces a [NAR (Nix ARchive)][Nix Archive] file containing the
contents of the file system tree rooted at *path*. The archive is contents of the file system tree rooted at *path*. The archive is
written to standard output. written to standard output.
@ -33,6 +33,8 @@ but not other types of files (such as device nodes).
A Nix archive can be unpacked using `nix-store A Nix archive can be unpacked using `nix-store
--restore`. --restore`.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
{{#include ./opt-common.md}} {{#include ./opt-common.md}}
{{#include ../opt-common.md}} {{#include ../opt-common.md}}

View file

@ -1,6 +1,6 @@
# Name # Name
`nix-store --export` - export store paths to a Nix Archive `nix-store --export` - export store paths to a [Nix Archive]
## Synopsis ## Synopsis
@ -11,7 +11,7 @@
The operation `--export` writes a serialisation of the specified store The operation `--export` writes a serialisation of the specified store
paths to standard output in a format that can be imported into another paths to standard output in a format that can be imported into another
Nix store with `nix-store --import`. This is like `nix-store Nix store with `nix-store --import`. This is like `nix-store
--dump`, except that the NAR archive produced by that command doesn't --dump`, except that the [Nix Archive (NAR)][Nix Archive] produced by that command doesn't
contain the necessary meta-information to allow it to be imported into contain the necessary meta-information to allow it to be imported into
another Nix store (namely, the set of references of the path). another Nix store (namely, the set of references of the path).
@ -19,6 +19,8 @@ This command does not produce a *closure* of the specified paths, so if
a store path references other store paths that are missing in the target a store path references other store paths that are missing in the target
Nix store, the import will fail. Nix store, the import will fail.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
{{#include ./opt-common.md}} {{#include ./opt-common.md}}
{{#include ../opt-common.md}} {{#include ../opt-common.md}}

View file

@ -1,6 +1,8 @@
# Name # Name
`nix-store --import` - import Nix Archive into the store `nix-store --import` - import [Nix Archive] into the store
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
# Synopsis # Synopsis

View file

@ -12,7 +12,7 @@ The operation `--optimise` reduces Nix store disk space usage by finding
identical files in the store and hard-linking them to each other. It identical files in the store and hard-linking them to each other. It
typically reduces the size of the store by something like 25-35%. Only typically reduces the size of the store by something like 25-35%. Only
regular files and symlinks are hard-linked in this manner. Files are regular files and symlinks are hard-linked in this manner. Files are
considered identical when they have the same NAR archive serialisation: considered identical when they have the same [Nix Archive (NAR)][Nix Archive] serialisation:
that is, regular files must have the same contents and permission that is, regular files must have the same contents and permission
(executable or non-executable), and symlinks must have the same (executable or non-executable), and symlinks must have the same
contents. contents.
@ -38,3 +38,4 @@ hashing files in `/nix/store/qhqx7l2f1kmwihc9bnxs7rc159hsxnf3-gcc-4.1.1'
there are 114486 files with equal contents out of 215894 files in total there are 114486 files with equal contents out of 215894 files in total
``` ```
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

View file

@ -25,11 +25,11 @@ Each of *paths* is processed as follows:
If no substitutes are available and no store derivation is given, realisation fails. If no substitutes are available and no store derivation is given, realisation fails.
[store paths]: @docroot@/glossary.md#gloss-store-path [store paths]: @docroot@/store/store-path.md
[valid]: @docroot@/glossary.md#gloss-validity [valid]: @docroot@/glossary.md#gloss-validity
[store derivation]: @docroot@/glossary.md#gloss-store-derivation [store derivation]: @docroot@/glossary.md#gloss-store-derivation
[output paths]: @docroot@/glossary.md#gloss-output-path [output paths]: @docroot@/glossary.md#gloss-output-path
[store objects]: @docroot@/glossary.md#gloss-store-object [store objects]: @docroot@/store/store-object.md
[closure]: @docroot@/glossary.md#gloss-closure [closure]: @docroot@/glossary.md#gloss-closure
[substituters]: @docroot@/command-ref/conf-file.md#conf-substituters [substituters]: @docroot@/command-ref/conf-file.md#conf-substituters
[content-addressed derivations]: @docroot@/contributing/experimental-features.md#xp-feature-ca-derivations [content-addressed derivations]: @docroot@/contributing/experimental-features.md#xp-feature-ca-derivations

View file

@ -8,9 +8,11 @@
## Description ## Description
The operation `--restore` unpacks a NAR archive to *path*, which must The operation `--restore` unpacks a [Nix Archive (NAR)][Nix Archive] to *path*, which must
not already exist. The archive is read from standard input. not already exist. The archive is read from standard input.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
{{#include ./opt-common.md}} {{#include ./opt-common.md}}
{{#include ../opt-common.md}} {{#include ../opt-common.md}}

View file

@ -147,7 +147,7 @@ Please observe these guidelines to ease reviews:
``` ```
A [store object] contains a [file system object] and [references] to other store objects. A [store object] contains a [file system object] and [references] to other store objects.
[store object]: @docroot@/glossary.md#gloss-store-object [store object]: @docroot@/store/store-object.md
[file system object]: @docroot@/architecture/file-system-object.md [file system object]: @docroot@/architecture/file-system-object.md
[references]: @docroot@/glossary.md#gloss-reference [references]: @docroot@/glossary.md#gloss-reference
``` ```

View file

@ -1,5 +1,24 @@
# Glossary # Glossary
- [content address]{#gloss-content-address}
A
[*content address*](https://en.wikipedia.org/wiki/Content-addressable_storage)
is a secure way to reference immutable data.
The reference is calculated directly from the content of the data being referenced, which means the reference is
[*tamper proof*](https://en.wikipedia.org/wiki/Tamperproofing)
--- variations of the data should always calculate to distinct content addresses.
For how Nix uses content addresses, see:
- [Content-Addressing File System Objects](@docroot@/store/file-system-object/content-address.md)
- [content-addressed store object](#gloss-content-addressed-store-object)
- [content-addressed derivation](#gloss-content-addressed-derivation)
Software Heritage's writing on [*Intrinsic and Extrinsic identifiers*](https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers) is also a good introduction to the value of content-addressing over other referencing schemes.
Besides content addressing, the Nix store also uses [input addressing](#gloss-input-addressed-store-object).
- [derivation]{#gloss-derivation} - [derivation]{#gloss-derivation}
A description of a build task. The result of a derivation is a A description of a build task. The result of a derivation is a
@ -266,13 +285,15 @@
See [installables](./command-ref/new-cli/nix.md#installables) for [`nix` commands](./command-ref/new-cli/nix.md) (experimental) for details. See [installables](./command-ref/new-cli/nix.md#installables) for [`nix` commands](./command-ref/new-cli/nix.md) (experimental) for details.
- [NAR]{#gloss-nar} - [Nix Archive (NAR)]{#gloss-nar}
A *N*ix *AR*chive. This is a serialisation of a path in the Nix A *N*ix *AR*chive. This is a serialisation of a path in the Nix
store. It can contain regular files, directories and symbolic store. It can contain regular files, directories and symbolic
links. NARs are generated and unpacked using `nix-store --dump` links. NARs are generated and unpacked using `nix-store --dump`
and `nix-store --restore`. and `nix-store --restore`.
See [Nix Archive](store/file-system-object/content-address.html#serial-nix-archive) for details.
- [`∅`]{#gloss-emtpy-set} - [`∅`]{#gloss-emtpy-set}
The empty set symbol. In the context of profile history, this denotes a package is not present in a particular version of the profile. The empty set symbol. In the context of profile history, this denotes a package is not present in a particular version of the profile.

View file

@ -199,19 +199,23 @@ Derivations can declare some infrequently used optional attributes.
The `outputHashMode` attribute determines how the hash is computed. The `outputHashMode` attribute determines how the hash is computed.
It must be one of the following two values: It must be one of the following two values:
- `"flat"`\ <!-- FIXME link to store object content-addressing not file system object content addressing once we have the page for that. -->
The output must be a non-executable regular file. If it isn't,
the build fails. The hash is simply computed over the contents - `"flat"`
of that file (so it's equal to what Unix commands like
`sha256sum` or `sha1sum` produce). The output must be a non-executable regular file; if it isn't, the build fails.
The hash is
[simply computed over the contents of that file](@docroot@/store/file-system-object/content-address.md#serial-flat)
(so it's equal to what Unix commands like `sha256sum` or `sha1sum` produce).
This is the default. This is the default.
- `"recursive"` or `"nar"`\ - `"recursive"` or `"nar"`
The hash is computed over the [NAR archive](@docroot@/glossary.md#gloss-nar) dump of the output
(i.e., the result of [`nix-store --dump`](@docroot@/command-ref/nix-store/dump.md)). In The hash is computed over the
this case, the output can be anything, including a directory [Nix Archive (NAR)](@docroot@/store/file-system-object/content-address.md#serial-nix-archive)
tree. dump of the output (i.e., the result of [`nix-store --dump`](@docroot@/command-ref/nix-store/dump.md)).
In this case, the output is allowed to be any [file system object], including directories and more.
`"recursive"` is the traditional way of indicating this, `"recursive"` is the traditional way of indicating this,
and is supported since 2005 (virtually the entire history of Nix). and is supported since 2005 (virtually the entire history of Nix).
@ -303,7 +307,7 @@ Derivations can declare some infrequently used optional attributes.
[`disallowedReferences`](#adv-attr-disallowedReferences) and [`disallowedRequisites`](#adv-attr-disallowedRequisites), [`disallowedReferences`](#adv-attr-disallowedReferences) and [`disallowedRequisites`](#adv-attr-disallowedRequisites),
the following attributes are available: the following attributes are available:
- `maxSize` defines the maximum size of the resulting [store object](@docroot@/glossary.md#gloss-store-object). - `maxSize` defines the maximum size of the resulting [store object](@docroot@/store/store-object.md).
- `maxClosureSize` defines the maximum size of the output's closure. - `maxClosureSize` defines the maximum size of the output's closure.
- `ignoreSelfRefs` controls whether self-references should be considered when - `ignoreSelfRefs` controls whether self-references should be considered when
checking for allowed references/requisites. checking for allowed references/requisites.

View file

@ -17,7 +17,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect
A symbolic name for the derivation. A symbolic name for the derivation.
It is added to the [store path] of the corresponding [store derivation] as well as to its [output paths](@docroot@/glossary.md#gloss-output-path). It is added to the [store path] of the corresponding [store derivation] as well as to its [output paths](@docroot@/glossary.md#gloss-output-path).
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
> **Example** > **Example**
> >
@ -141,7 +141,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect
By default, a derivation produces a single output called `out`. By default, a derivation produces a single output called `out`.
However, derivations can produce multiple outputs. However, derivations can produce multiple outputs.
This allows the associated [store objects](@docroot@/glossary.md#gloss-store-object) and their [closures](@docroot@/glossary.md#gloss-closure) to be copied or garbage-collected separately. This allows the associated [store objects](@docroot@/store/store-object.md) and their [closures](@docroot@/glossary.md#gloss-closure) to be copied or garbage-collected separately.
> **Example** > **Example**
> >

View file

@ -2,9 +2,9 @@
The value of a Nix expression can depend on the contents of a [store object]. The value of a Nix expression can depend on the contents of a [store object].
[store object]: @docroot@/glossary.md#gloss-store-object [store object]: @docroot@/store/store-object.md
Passing an expression `expr` that evaluates to a [store path](@docroot@/glossary.md#gloss-store-path) to any built-in function which reads from the filesystem constitutes Import From Derivation (IFD): Passing an expression `expr` that evaluates to a [store path](@docroot@/store/store-path.md) to any built-in function which reads from the filesystem constitutes Import From Derivation (IFD):
- [`import`](./builtins.md#builtins-import)` expr` - [`import`](./builtins.md#builtins-import)` expr`
- [`builtins.readFile`](./builtins.md#builtins-readFile)` expr` - [`builtins.readFile`](./builtins.md#builtins-readFile)` expr`

View file

@ -128,7 +128,7 @@ The result is a string.
> The file or directory at *path* must exist and is copied to the [store]. > The file or directory at *path* must exist and is copied to the [store].
> The path appears in the result as the corresponding [store path]. > The path appears in the result as the corresponding [store path].
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
[store]: @docroot@/glossary.md#gloss-store [store]: @docroot@/glossary.md#gloss-store
[String and path concatenation]: #string-and-path-concatenation [String and path concatenation]: #string-and-path-concatenation

View file

@ -107,9 +107,9 @@ An expression that is interpolated must evaluate to one of the following:
A string interpolates to itself. A string interpolates to itself.
A path in an interpolated expression is first copied into the Nix store, and the resulting string is the [store path] of the newly created [store object](@docroot@/glossary.md#gloss-store-object). A path in an interpolated expression is first copied into the Nix store, and the resulting string is the [store path] of the newly created [store object](@docroot@/store/store-object.md).
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
> **Example** > **Example**
> >

View file

@ -124,7 +124,7 @@
For example, assume you used a file path in an interpolated string during a `nix repl` session. For example, assume you used a file path in an interpolated string during a `nix repl` session.
Later in the same session, after having changed the file contents, evaluating the interpolated string with the file path again might not return a new [store path], since Nix might not re-read the file contents. Use `:r` to reset the repl as needed. Later in the same session, after having changed the file contents, evaluating the interpolated string with the file path again might not return a new [store path], since Nix might not re-read the file contents. Use `:r` to reset the repl as needed.
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
Path literals can also include [string interpolation], besides being [interpolated into other expressions]. Path literals can also include [string interpolation], besides being [interpolated into other expressions].

View file

@ -28,9 +28,9 @@ Info about a [store object].
Content address of this store object's file system object, used to compute its store path. Content address of this store object's file system object, used to compute its store path.
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
[file system object]: @docroot@/store/file-system-object.md [file system object]: @docroot@/store/file-system-object.md
[Nix Archive]: @docroot@/glossary.md#gloss-nar [Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
## Impure fields ## Impure fields

View file

@ -1,9 +1,10 @@
# Nix Archive (NAR) format # Nix Archive (NAR) format
This is the complete specification of the Nix Archive format. This is the complete specification of the [Nix Archive] format.
The Nix Archive format closely follows the abstract specification of a [file system object] tree, The Nix Archive format closely follows the abstract specification of a [file system object] tree,
because it is designed to serialize exactly that data structure. because it is designed to serialize exactly that data structure.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
[file system object]: @docroot@/store/file-system-object.md [file system object]: @docroot@/store/file-system-object.md
The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), with the exception of the `str(..)` function / parameterized rule, which length-prefixes and pads strings. The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), with the exception of the `str(..)` function / parameterized rule, which length-prefixes and pads strings.

View file

@ -1,12 +1,14 @@
# Complete Store Path Calculation # Complete Store Path Calculation
This is the complete specification for how store paths are calculated. This is the complete specification for how [store path]s are calculated.
The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), but must deviate for a few things such as hash functions which we treat as bidirectional for specification purposes. The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), but must deviate for a few things such as hash functions which we treat as bidirectional for specification purposes.
Regular users do *not* need to know this information --- store paths can be treated as black boxes computed from the properties of the store objects they refer to. Regular users do *not* need to know this information --- store paths can be treated as black boxes computed from the properties of the store objects they refer to.
But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful. But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful.
[store path]: @docroot@/store/store-path.md
## Store path proper ## Store path proper
```ebnf ```ebnf
@ -113,7 +115,7 @@ where
Note that `id` = `"out"`, regardless of the name part of the store path. Note that `id` = `"out"`, regardless of the name part of the store path.
Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case. Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case.
[Nix Archive (NAR)]: @docroot@/glossary.md#gloss-NAR [Nix Archive (NAR)]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256 [sha-256]: https://en.m.wikipedia.org/wiki/SHA-256
### Historical Note ### Historical Note

View file

@ -22,7 +22,7 @@ Link: <flakeref>; rel="immutable"
*flakeref* must be a tarball flakeref. It can contain the tarball flake attributes *flakeref* must be a tarball flakeref. It can contain the tarball flake attributes
`narHash`, `rev`, `revCount` and `lastModified`. If `narHash` is included, its `narHash`, `rev`, `revCount` and `lastModified`. If `narHash` is included, its
value must be the NAR hash of the unpacked tarball (as computed via value must be the [NAR hash][Nix Archive] of the unpacked tarball (as computed via
`nix hash path`). Nix checks the contents of the returned tarball `nix hash path`). Nix checks the contents of the returned tarball
against the `narHash` attribute. The `rev` and `revCount` attributes against the `narHash` attribute. The `rev` and `revCount` attributes
are useful when the tarball flake is a mirror of a fetcher type that are useful when the tarball flake is a mirror of a fetcher type that
@ -40,3 +40,5 @@ Link: <https://example.org/hello/442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz
For tarball flakes, the value of the `lastModified` flake attribute is For tarball flakes, the value of the `lastModified` flake attribute is
defined as the timestamp of the newest file inside the tarball. defined as the timestamp of the newest file inside the tarball.
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

View file

@ -0,0 +1,80 @@
# Content-Addressing File System Objects
For many operations, Nix needs to calculate [a content address](@docroot@/glossary.md#gloss-content-address) of [a file system object][file system object].
Usually this is needed as part of content addressing [store objects], since store objects always have a root file system object.
But some command-line utilities also just work on "raw" file system objects, not part of any store object.
Every content addressing scheme Nix uses ultimately involves feeding data into a [hash function](https://en.wikipedia.org/wiki/Hash_function), and getting back an opaque fixed-size digest which is deemed a content address.
The various *methods* of content addressing thus differ in how abstract data (in this case, a file system object and its descendants) is fed into the hash function.
## Serialising File System Objects { #serial }
The simplest method is to serialise the entire file system object tree into a single binary string, and then hash that binary string, yielding the content address.
In this section we describe the currently-supported methods of serialising file system objects.
### Flat { #serial-flat }
A single file object can just be hashed by its contents.
This is not enough information to encode the fact that the file system object is a file,
but if we *already* know that the FSO is a single non-executable file by other means, it is sufficient.
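As an illustration of the flat method, here is a minimal Python sketch, not Nix's actual implementation. The function name `flat_hash` and the default of SHA-256 are assumptions made for this example. The checks model the precondition stated above: the digest only makes sense when the object is already known to be a single non-executable file.

```python
import hashlib
import os
import stat


def flat_hash(path: str, algo: str = "sha256") -> str:
    """Hash a single non-executable regular file by its raw contents.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    st = os.lstat(path)  # lstat: a symlink is reported as a symlink
    if not stat.S_ISREG(st.st_mode):
        raise ValueError(f"{path!r} is not a regular file")
    if st.st_mode & stat.S_IXUSR:
        # The digest cannot record the executable bit, so refuse such files.
        raise ValueError(f"{path!r} is executable")
    digest = hashlib.new(algo)
    with open(path, "rb") as f:
        # Feed the file to the hash function in chunks; nothing else is hashed.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because only the file contents are hashed, the result matches hashing the same bytes directly with the chosen algorithm.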
### Nix Archive (NAR) { #serial-nix-archive }
For the other cases of [file system objects][file system object], especially directories with arbitrary descendants, we need a more complex serialisation format.
Examples of such serialisations are the ZIP and TAR file formats.
However, for our purposes these formats have several problems:
- They do not have a canonical serialisation, meaning that given an FSO, there can
be many different serialisations.
For instance, TAR files can have variable amounts of padding between archive members;
and some archive formats leave the order of directory entries undefined.
This would be bad because we use serialisation to compute cryptographic hashes over file system objects, and for those hashes to be useful as a content address or for integrity checking, uniqueness is crucial.
Otherwise, correct hashes would report false mismatches, and the store would fail to find the content.
- They store more information than we have in our notion of FSOs, such as time stamps.
This can cause FSOs that Nix should consider equal to hash to different values on different machines, just because the dates differ.
- As a practical consideration, the TAR format is the only truly universal format in the Unix environment.
It has many problems, such as an inability to deal with long file names and files larger than 2^33 bytes.
Current implementations such as GNU Tar work around these limitations in various ways.
For these reasons, Nix has its very own archive format—the Nix Archive (NAR) format,
which is carefully designed to avoid the problems described above.
The exact specification of the Nix Archive format is in `protocols/nix-archive.md`.
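To make the canonical-serialisation idea concrete, here is a rough Python sketch of a NAR writer. It is written from the author's reading of the format specification, so treat it as illustrative and not authoritative. The names `nar_serialise`, `_str` and `_node` are hypothetical. The sketch sorts directory entries bytewise, ignores timestamps and ownership, and records only the executable bit.

```python
import os
import stat
import struct


def _str(data: bytes) -> bytes:
    """Length-prefix (64-bit little-endian) and zero-pad to a multiple of 8."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding


def _node(path: bytes) -> bytes:
    """Serialise one file system object (and its descendants)."""
    st = os.lstat(path)
    out = _str(b"(") + _str(b"type")
    if stat.S_ISLNK(st.st_mode):
        out += _str(b"symlink") + _str(b"target") + _str(os.readlink(path))
    elif stat.S_ISREG(st.st_mode):
        out += _str(b"regular")
        if st.st_mode & stat.S_IXUSR:
            out += _str(b"executable") + _str(b"")
        with open(path, "rb") as f:
            out += _str(b"contents") + _str(f.read())
    elif stat.S_ISDIR(st.st_mode):
        out += _str(b"directory")
        # Sorting the names gives every directory exactly one serialisation.
        for name in sorted(os.listdir(path)):
            out += _str(b"entry") + _str(b"(") + _str(b"name") + _str(name)
            out += _str(b"node") + _node(os.path.join(path, name)) + _str(b")")
    else:
        raise ValueError(f"unsupported file type: {path!r}")
    return out + _str(b")")


def nar_serialise(path: str) -> bytes:
    """Return the archive bytes for the tree rooted at `path`."""
    return _str(b"nix-archive-1") + _node(os.fsencode(path))
```

Hashing the returned bytes with a function such as `hashlib.sha256` would then give a digest that depends only on contents, structure and the executable bit.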
## Content addressing File System Objects beyond a single serialisation pass
Serialising the entire tree and then hashing that binary string is not the only option for content addressing, however.
Another technique is that of a [Merkle graph](https://en.wikipedia.org/wiki/Merkle_tree), where previously computed hashes are included in subsequent byte strings to be hashed.
In particular, the Merkle graphs can match the original graph structure of file system objects:
we can first hash (serialised) child file system objects, and then hash parent objects using the hashes of their children in the serialisation (to be hashed) of the parent file system objects.
Currently, one such Merkle DAG content-addressing method is supported.
### Git ([experimental][xp-feature-git-hashing]) { #git }
> **Warning**
>
> This method is part of the [`git-hashing`][xp-feature-git-hashing] experimental feature.
Git's file system model is very close to Nix's, and so Git's content addressing method is a pretty good fit.
Just as with regular Git, files and symlinks are hashed as git "blobs", and directories are hashed as git "trees".
However, one difference between Nix's and Git's file system model needs special treatment.
In Git, plain files, executable files, and symlinks are not differentiated as distinctly addressable objects, but by their context: by the directory entry that refers to them.
That means so long as the root object is a directory, there is no problem:
every non-directory object is owned by a parent directory, and the entry that refers to it provides the missing information.
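The blob and tree scheme can be sketched in Python as follows. This is an illustration of how regular Git hashes objects with SHA-1, not Nix's implementation. The names `git_hash` and `_git_object` are hypothetical. Note that the mode (plain, executable, symlink) is returned separately from the digest and only ends up inside the parent tree entry.

```python
import hashlib
import os
import stat


def _git_object(kind: bytes, body: bytes) -> bytes:
    """Raw SHA-1 of a Git object, hashed as '<kind> <size>\\0<body>'."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def git_hash(path: bytes) -> tuple[bytes, bytes]:
    """Return (mode, raw digest) for the object at `path`."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        # A symlink is a blob whose body is the link target.
        return b"120000", _git_object(b"blob", os.readlink(path))
    if stat.S_ISREG(st.st_mode):
        mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
        with open(path, "rb") as f:
            return mode, _git_object(b"blob", f.read())
    if stat.S_ISDIR(st.st_mode):
        entries = []
        for name in os.listdir(path):
            mode, digest = git_hash(os.path.join(path, name))
            # Git orders tree entries as if directory names ended in '/'.
            key = name + b"/" if mode == b"40000" else name
            entries.append((key, mode + b" " + name + b"\0" + digest))
        body = b"".join(entry for _, entry in sorted(entries))
        # The tree body embeds the children's hashes: this is the Merkle step.
        return b"40000", _git_object(b"tree", body)
    raise ValueError(f"unsupported file type: {path!r}")
```

An executable file and a non-executable file with the same contents get the same blob digest here; only the mode recorded by the parent directory tells them apart.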
However, if the root object is not a directory, then we have no way of knowing which one of an executable file, non-executable file, or symlink it is supposed to be.
In response to this, we have decided to treat a bare file as a non-executable file.
This is similar to what we do with [flat serialisation](#serial-flat), which also lacks this information.
To avoid an address collision, attempts to hash a bare executable file or symlink will result in an error (just as they would for flat serialisation).
Thus, Git can encode some, but not all of Nix's "File System Objects", and this sort of content-addressing is likewise partial.
In the future, we may support a Git-like hash for such file system objects, or we may adopt another Merkle DAG format which is capable of representing all Nix file system objects.
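The rule for root objects described above can be expressed as a small check. This is a hypothetical Python sketch; the function name `check_git_root` is an assumption and does not correspond to any Nix API.

```python
import os
import stat


def check_git_root(path: str) -> None:
    """Raise ValueError if `path` cannot be a root object for Git hashing."""
    st = os.lstat(path)  # lstat so that a symlink is seen as a symlink
    if stat.S_ISDIR(st.st_mode):
        return  # directory entries carry the type of every child
    if stat.S_ISREG(st.st_mode) and not st.st_mode & stat.S_IXUSR:
        return  # a bare file is treated as a non-executable file
    # A bare executable or symlink would collide with a plain file's address.
    raise ValueError(f"{path!r}: root must be a directory or a non-executable file")
```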
[file system object]: ../file-system-object.md
[store objects]: ../store-object.md
[xp-feature-git-hashing]: @docroot@/contributing/experimental-features.md#xp-feature-git-hashing

View file

@ -1,5 +1,11 @@
# Store Path # Store Path
> **Example**
>
> `/nix/store/a040m110amc4h71lds2jmr8qrkj2jhxd-git-2.38.1`
>
> A rendered store path
Nix implements references to [store objects](./index.md#store-object) as *store paths*. Nix implements references to [store objects](./index.md#store-object) as *store paths*.
Think of a store path as an [opaque], [unique identifier]: Think of a store path as an [opaque], [unique identifier]:
@ -37,6 +43,10 @@ A store path is rendered to a file system path as the concatenation of
> store directory digest name > store directory digest name
> ``` > ```
Exactly how the digest is calculated depends on the type of store path.
Store path digests are *supposed* to be opaque, and so for most operations, it is not necessary to know the details.
That said, the manual has a full [specification of store path digests](@docroot@/protocols/store-path.md).
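
As a rough illustration of the rendering described above, the following sketch joins a store directory, a digest, and a name into a file system path, and splits such a path back into its parts. The function names are hypothetical, the digest is treated as an opaque string, and splitting on the first `-` is an assumption based on the example path, in which the digest contains no hyphen.

```python
def render_store_path(store_dir: str, digest: str, name: str) -> str:
    """Render a store path as '<store directory>/<digest>-<name>'."""
    return f"{store_dir}/{digest}-{name}"


def split_store_path(path: str, store_dir: str = "/nix/store"):
    """Split a rendered store path into (digest, name).

    The digest is not interpreted; it is only separated from the name.
    """
    prefix = store_dir + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not inside store directory {store_dir!r}")
    base_name = path[len(prefix):]
    # Assumption: the digest contains no hyphen, so the first '-' ends it.
    digest, separator, name = base_name.partition("-")
    if not separator or not digest or not name:
        raise ValueError(f"{path!r} does not look like '<digest>-<name>'")
    return digest, name
```

For the example at the top of this page, `split_store_path` separates `a040m110amc4h71lds2jmr8qrkj2jhxd` from `git-2.38.1`, and `render_store_path` reassembles the original path.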
## Store Directory ## Store Directory
Every [Nix store](./index.md) has a store directory. Every [Nix store](./index.md) has a store directory.

View file

@ -81,9 +81,15 @@ Args::Flag fileIngestionMethod(FileIngestionMethod * method)
How to compute the hash of the input. How to compute the hash of the input.
One of: One of:
- `nar` (the default): Serialises the input as an archive (following the [_Nix Archive Format_](https://edolstra.github.io/pubs/phd-thesis.pdf#page=101)) and passes that to the hash function. - `nar` (the default):
Serialises the input as a
[Nix Archive](@docroot@/store/file-system-object/content-address.md#serial-nix-archive)
and passes that to the hash function.
- `flat`: Assumes that the input is a single file and directly passes it to the hash function; - `flat`:
Assumes that the input is a single file and
[directly passes](@docroot@/store/file-system-object/content-address.md#serial-flat)
it to the hash function.
)", )",
.labels = {"file-ingestion-method"}, .labels = {"file-ingestion-method"},
.handler = {[method](std::string s) { .handler = {[method](std::string s) {
@ -97,16 +103,24 @@ Args::Flag contentAddressMethod(ContentAddressMethod * method)
return Args::Flag { return Args::Flag {
.longName = "mode", .longName = "mode",
// FIXME indentation carefully made for context, this is messed up. // FIXME indentation carefully made for context, this is messed up.
/* FIXME link to store object content-addressing not file system
object content addressing once we have that page. */
.description = R"( .description = R"(
How to compute the content-address of the store object. How to compute the content-address of the store object.
One of: One of:
- `nar` (the default): Serialises the input as an archive (following the [_Nix Archive Format_](https://edolstra.github.io/pubs/phd-thesis.pdf#page=101)) and passes that to the hash function. - `nar` (the default):
Serialises the input as a
[Nix Archive](@docroot@/store/file-system-object/content-address.md#serial-nix-archive)
and passes that to the hash function.
- `flat`: Assumes that the input is a single file and directly passes it to the hash function; - `flat`:
Assumes that the input is a single file and
[directly passes](@docroot@/store/file-system-object/content-address.md#serial-flat)
it to the hash function.
- `text`: Like `flat`, but used for - `text`: Like `flat`, but used for
[derivations](@docroot@/glossary.md#store-derivation) serialized in store object and [derivations](@docroot@/glossary.md#store-derivation) serialized in store object and
[`builtins.toFile`](@docroot@/language/builtins.html#builtins-toFile). [`builtins.toFile`](@docroot@/language/builtins.html#builtins-toFile).
For advanced use-cases only; For advanced use-cases only;
for regular usage prefer `nar` and `flat`. for regular usage prefer `nar` and `flat`.

View file

@ -4515,7 +4515,7 @@ void EvalState::createBaseEnv()
1683705525 1683705525
``` ```
The [store path](@docroot@/glossary.md#gloss-store-path) of a derivation depending on `currentTime` will differ for each evaluation, unless both evaluate `builtins.currentTime` in the same second. The [store path](@docroot@/store/store-path.md) of a derivation depending on `currentTime` will differ for each evaluation, unless both evaluate `builtins.currentTime` in the same second.
)", )",
.impureOnly = true, .impureOnly = true,
}); });

View file

@ -200,8 +200,8 @@ static RegisterPrimOp primop_fetchTree({
.doc = R"( .doc = R"(
Fetch a file system tree or a plain file using one of the supported backends and return an attribute set with: Fetch a file system tree or a plain file using one of the supported backends and return an attribute set with:
- the resulting fixed-output [store path](@docroot@/glossary.md#gloss-store-path) - the resulting fixed-output [store path](@docroot@/store/store-path.md)
- the corresponding [NAR](@docroot@/glossary.md#gloss-nar) hash - the corresponding [NAR](@docroot@/store/file-system-object/content-address.md#serial-nix-archive) hash
- backend-specific metadata (currently not documented). <!-- TODO: document output attributes --> - backend-specific metadata (currently not documented). <!-- TODO: document output attributes -->
*input* must be an attribute set with the following attributes: *input* must be an attribute set with the following attributes:

View file

@ -910,7 +910,7 @@ public:
"substituters", "substituters",
R"( R"(
A list of [URLs of Nix stores](@docroot@/store/types/index.md#store-url-format) to be used as substituters, separated by whitespace. A list of [URLs of Nix stores](@docroot@/store/types/index.md#store-url-format) to be used as substituters, separated by whitespace.
A substituter is an additional [store](@docroot@/glossary.md#gloss-store) from which Nix can obtain [store objects](@docroot@/glossary.md#gloss-store-object) instead of building them. A substituter is an additional [store](@docroot@/glossary.md#gloss-store) from which Nix can obtain [store objects](@docroot@/store/store-object.md) instead of building them.
Substituters are tried based on their priority value, which each substituter can set independently. Substituters are tried based on their priority value, which each substituter can set independently.
Lower value means higher priority. Lower value means higher priority.

View file

@ -13,7 +13,7 @@ struct Hash;
* \ref StorePath "Store path" is the fundamental reference type of Nix. * \ref StorePath "Store path" is the fundamental reference type of Nix.
* A store path refers to a Store object. * A store path refers to a Store object.
* *
* See glossary.html#gloss-store-path for more information on a * See store/store-path.html for more information on a
* conceptual level. * conceptual level.
*/ */
class StorePath class StorePath

View file

@ -12,16 +12,28 @@ struct SourcePath;
/** /**
* An enumeration of the ways we can serialize file system * An enumeration of the ways we can serialize file system
* objects. * objects.
*
* See `file-system-object/content-address.md#serial` in the manual for
* a user-facing description of this concept, but note that this type is also
* used for storing or sending copies; not just for addressing.
* Note also that there are other content addressing methods that don't
* correspond to a serialisation method.
*/ */
enum struct FileSerialisationMethod : uint8_t { enum struct FileSerialisationMethod : uint8_t {
/** /**
* Flat-file. The contents of a single file exactly. * Flat-file. The contents of a single file exactly.
*
* See `file-system-object/content-address.md#serial-flat` in the
* manual.
*/ */
Flat, Flat,
/** /**
* Nix Archive. Serializes the file-system object in * Nix Archive. Serializes the file-system object in
* Nix Archive format. * Nix Archive format.
*
* See `file-system-object/content-address.md#serial-nix-archive` in
* the manual.
*/ */
Recursive, Recursive,
}; };
@ -81,33 +93,32 @@ HashResult hashPath(
/** /**
* An enumeration of the ways we can ingest file system * An enumeration of the ways we can ingest file system
* objects, producing a hash or digest. * objects, producing a hash or digest.
*
* See `file-system-object/content-address.md` in the manual for a
* user-facing description of this concept.
*/ */
enum struct FileIngestionMethod : uint8_t { enum struct FileIngestionMethod : uint8_t {
/** /**
* Hash `FileSerialisationMethod::Flat` serialisation. * Hash `FileSerialisationMethod::Flat` serialisation.
*
* See `file-system-object/content-address.md#serial-flat` in the
* manual.
*/ */
Flat, Flat,
/** /**
* Hash `FileSerialisationMethod::Git` serialisation. * Hash `FileSerialisationMethod::Recursive` serialisation.
*
* See `file-system-object/content-address.md#serial-nix-archive` in the
* manual.
*/ */
Recursive, Recursive,
/** /**
* Git hashing. In particular files are hashed as git "blobs", and * Git hashing.
* directories are hashed as git "trees".
* *
* Unlike `Flat` and `Recursive`, this is not a hash of a single * See `file-system-object/content-address.md#serial-git` in the
* serialisation but a [Merkle * manual.
* DAG](https://en.wikipedia.org/wiki/Merkle_tree) of multiple
* rounds of serialisation and hashing.
*
* @note Git's data model is slightly different, in that a plain
* file doesn't have an executable bit, directory entries do
* instead. We decide treat a bare file as non-executable by fiat,
* as we do with `FileIngestionMethod::Flat` which also lacks this
* information. Thus, Git can encode some but all of Nix's "File
* System Objects", and this sort of hashing is likewise partial.
*/ */
Git, Git,
}; };

View file

@ -50,7 +50,7 @@ By default, this command only shows top-level derivations, but with
`nix derivation show` outputs a JSON map of [store path]s to derivations in the following format: `nix derivation show` outputs a JSON map of [store path]s to derivations in the following format:
[store path]: @docroot@/glossary.md#gloss-store-path [store path]: @docroot@/store/store-path.md
{{#include ../../protocols/json/derivation.md}} {{#include ../../protocols/json/derivation.md}}

View file

@ -58,7 +58,7 @@ struct AuthorizationSettings : Config {
this, {"root"}, "trusted-users", this, {"root"}, "trusted-users",
R"( R"(
A list of user names, separated by whitespace. A list of user names, separated by whitespace.
These users will have additional rights when connecting to the Nix daemon, such as the ability to specify additional [substituters](#conf-substituters), or to import unsigned [NARs](@docroot@/glossary.md#gloss-nar). These users will have additional rights when connecting to the Nix daemon, such as the ability to specify additional [substituters](#conf-substituters), or to import unsigned realisations or unsigned input-addressed store objects.
You can also specify groups by prefixing names with `@`. You can also specify groups by prefixing names with `@`.
For instance, `@wheel` means all users in the `wheel` group. For instance, `@wheel` means all users in the `wheel` group.