diff --git a/doc/manual/src/SUMMARY.md.in b/doc/manual/src/SUMMARY.md.in index 167f54206..70dea4fbd 100644 --- a/doc/manual/src/SUMMARY.md.in +++ b/doc/manual/src/SUMMARY.md.in @@ -109,6 +109,7 @@ - [Store Object Info](protocols/json/store-object-info.md) - [Derivation](protocols/json/derivation.md) - [Serving Tarball Flakes](protocols/tarball-fetcher.md) + - [Store Path Specification](protocols/store-path.md) - [Derivation "ATerm" file format](protocols/derivation-aterm.md) - [Glossary](glossary.md) - [Contributing](contributing/index.md) diff --git a/doc/manual/src/protocols/store-path.md b/doc/manual/src/protocols/store-path.md new file mode 100644 index 000000000..fcf8038fc --- /dev/null +++ b/doc/manual/src/protocols/store-path.md @@ -0,0 +1,126 @@ +# Complete Store Path Calculation + +This is the complete specification for how store paths are calculated. + +The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), but must deviate for a few things such as hash functions which we treat as bidirectional for specification purposes. + +Regular users do *not* need to know this information --- store paths can be treated as black boxes computed from the properties of the store objects they refer to. +But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful. + +## Store path proper + +```ebnf +store-path = store-dir "/" digest "-" name +``` +where + +- `name` = the name of the store object. 
+ +- `store-dir` = the [store directory](@docroot@/store/store-path.md#store-directory) + +- `digest` = base-32 representation of the first 160 bits of a [SHA-256] hash of `fingerprint` + + This is the hash part of the store name. + +## Fingerprint + +- ```ebnf + fingerprint = type ":sha256:" inner-digest ":" store-dir ":" name + ``` + + Note that it includes the location of the store as well as the name to make sure that changes to either of those are reflected in the hash + (e.g. you won't get `/nix/store/<digest>-name1` and `/nix/store/<digest>-name2`, or `/gnu/store/<digest>-name1`, with equal hash parts). + +- `type` = one of: + + - ```ebnf + | "text" ( ":" store-path )* + ``` + + For encoded derivations written to the store. + The optional trailing store paths are the references of the store object. + + - ```ebnf + | "source" ( ":" store-path )* + ``` + + For paths copied to the store and hashed via a [Nix Archive (NAR)] and [SHA-256][sha-256]. + Just like in the text case, we can have the store objects referenced by their paths. + Additionally, we can have an optional `:self` label to denote self-reference. + + - ```ebnf + | "output:" id + ``` + + For either the outputs built from derivations, + or paths copied to the store that are hashed directly as a single file or via a hash algorithm other than [SHA-256][sha-256]. + (Paths hashed via NAR and SHA-256 use "source" instead; this is only necessary for compatibility.) + + `id` is the name of the output (usually, "out"). + For content-addressed store objects, `id` is always "out". + +- `inner-digest` = base-16 representation of a SHA-256 hash of `inner-fingerprint` + +## Inner fingerprint + +- `inner-fingerprint` = one of the following based on `type`: + + - if `type` = `"text:" ...`: + + the string written to the resulting store path. + + - if `type` = `"source:" ...`: + + the hash of the [Nix Archive (NAR)] serialization of the [file system object](@docroot@/store/file-system-object.md) of the store object. 
+ + - if `type` = `"output:" id`: + + - For input-addressed derivation outputs: + + the [ATerm](@docroot@/protocols/derivation-aterm.md) serialization of the derivation modulo fixed-output derivations. + + - For content-addressed store paths: + + ```ebnf + "fixed:out:" rec algo ":" hash ":" + ``` + + where + + - `rec` = one of: + + - ```ebnf + | "r:" + ``` + for hashes of the [Nix Archive (NAR)] (arbitrary file system object) serialization + + - ```ebnf + | "" + ``` + (empty string) for hashes of the flat (single file) serialization + + - ```ebnf + algo = "md5" | "sha1" | "sha256" + ``` + + - `hash` = base-16 representation of the path or flat hash of the contents of the path (or expected contents of the path for fixed-output derivations). + + Note that `id` = `"out"`, regardless of the name part of the store path. + Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case. + +[Nix Archive (NAR)]: @docroot@/glossary.md#gloss-NAR +[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256 + +### Historical Note + +The `type` = `"source:" ...` and `type` = `"output:out"` grammars technically overlap in purpose, +in that both can represent data hashed by its SHA-256 NAR serialization. + +The original reason for this way of computing names was to prevent name collisions (for security). +For instance, the thinking was that it shouldn't be feasible to come up with a derivation whose output path collides with the path for a copied source. +The former would have a `fingerprint` starting with `output:out:`, while the latter would have a `fingerprint` starting with `source:`. + +Since `64519cfd657d024ae6e2bb74cb21ad21b886fd2a` (2008), however, it was decided that separating derivation-produced vs manually-hashed content-addressed data like this was not useful. 
+Now, data that is content-addressed with SHA-256 + NAR-serialization always uses the `source:...` construction, regardless of how it was produced (manually or by derivation). +This allows freely switching between using [fixed-output derivations](@docroot@/glossary.md#gloss-fixed-output-derivation) for fetching, and fetching out-of-band and then manually adding. +It also removes the ambiguity from the grammar. diff --git a/src/libstore/store-api.cc b/src/libstore/store-api.cc index e3715343e..4238cbbf5 100644 --- a/src/libstore/store-api.cc +++ b/src/libstore/store-api.cc @@ -65,85 +65,13 @@ StorePath Store::followLinksToStorePath(std::string_view path) const } -/* Store paths have the following form: +/* +The exact specification of store paths is in `protocols/store-path.md` +in the Nix manual. These few functions implement that specification. - = /- - - where - - = the location of the Nix store, usually /nix/store - - = a human readable name for the path, typically obtained - from the name attribute of the derivation, or the name of the - source file from which the store path is created. For derivation - outputs other than the default "out" output, the string "-" - is suffixed to . - - = base-32 representation of the first 160 bits of a SHA-256 - hash of ; the hash part of the store name - - = the string ":sha256:

::"; - note that it includes the location of the store as well as the - name to make sure that changes to either of those are reflected - in the hash (e.g. you won't get /nix/store/-name1 and - /nix/store/-name2 with equal hash parts). - - = one of: - "text:::..." - for plain text files written to the store using - addTextToStore(); ... are the store paths referenced - by this path, in the form described by - "source:::...::self" - for paths copied to the store using addToStore() when recursive - = true and hashAlgo = "sha256". Just like in the text case, we - can have the store paths referenced by the path. - Additionally, we can have an optional :self label to denote self - reference. - "output:" - for either the outputs created by derivations, OR paths copied - to the store using addToStore() with recursive != true or - hashAlgo != "sha256" (in that case "source" is used; it's - silly, but it's done that way for compatibility). is the - name of the output (usually, "out"). - -

= base-16 representation of a SHA-256 hash of - - = - if = "text:...": - the string written to the resulting store path - if = "source:...": - the serialisation of the path from which this store path is - copied, as returned by hashPath() - if = "output:": - for non-fixed derivation outputs: - the derivation (see hashDerivationModulo() in - primops.cc) - for paths copied by addToStore() or produced by fixed-output - derivations: - the string "fixed:out:::", where - = "r:" for recursive (path) hashes, or "" for flat - (file) hashes - = "md5", "sha1" or "sha256" - = base-16 representation of the path or flat hash of - the contents of the path (or expected contents of the - path for fixed-output derivations) - - Note that since an output derivation has always type output, while - something added by addToStore can have type output or source depending - on the hash, this means that the same input can be hashed differently - if added to the store via addToStore or via a derivation, in the sha256 - recursive case. - - It would have been nicer to handle fixed-output derivations under - "source", e.g. have something like "source:", but we're - stuck with this for now... - - The main reason for this way of computing names is to prevent name - collisions (for security). For instance, it shouldn't be feasible - to come up with a derivation whose output path collides with the - path for a copied source. The former would have a starting with - "output:out:", while the latter would have a starting with - "source:". +If changes to these functions go beyond mere implementation changes i.e. +also update the user-visible behavior, please update the specification +to match. */
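
For readers of this change, the fingerprint construction described in the new `protocols/store-path.md` can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `store-api.cc`: the helper names (`make_type`, `fingerprint`, `fixed_output_inner_fingerprint`) and the default store directory are hypothetical, and the final step (compressing the SHA-256 of the fingerprint to 160 bits and base-32 encoding it into `digest`) is deliberately left out.

```python
import hashlib

# Assumption: the default store directory; any other store-dir works the same way.
STORE_DIR = "/nix/store"


def make_type(kind: str, references=()) -> str:
    """Build the `type` field, e.g. "text" followed by ":"-separated references."""
    return ":".join([kind, *references])


def fingerprint(type_: str, inner_fingerprint: bytes, name: str,
                store_dir: str = STORE_DIR) -> str:
    """Assemble type ":sha256:" inner-digest ":" store-dir ":" name."""
    # inner-digest is the base-16 SHA-256 hash of the inner fingerprint.
    inner_digest = hashlib.sha256(inner_fingerprint).hexdigest()
    return f"{type_}:sha256:{inner_digest}:{store_dir}:{name}"


def fixed_output_inner_fingerprint(algo: str, content_hash_hex: str,
                                   recursive: bool) -> bytes:
    """Build "fixed:out:" rec algo ":" hash ":" for content-addressed paths."""
    rec = "r:" if recursive else ""  # "r:" for NAR hashes, "" for flat hashes
    return f"fixed:out:{rec}{algo}:{content_hash_hex}:".encode()


if __name__ == "__main__":
    # A "text" object: the inner fingerprint is the string written to the store.
    print(fingerprint(make_type("text"), b"hello", "hello.txt"))

    # A flat SHA-1 fixed-output path: type is "output:out".
    inner = fixed_output_inner_fingerprint("sha1", "00" * 20, recursive=False)
    print(fingerprint("output:out", inner, "example.tar.gz"))
```

Both calls print only the `fingerprint` string; turning it into the final store path still requires the `digest` step from the specification.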