# Complete Store Path Calculation
This is the complete specification for how [store path]s are calculated.

The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), but it deviates in a few places, such as hash functions, which we treat as bidirectional for specification purposes.

Regular users do *not* need to know this information; store paths can be treated as black boxes computed from the properties of the store objects they refer to.
But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful.

[store path]: @docroot@/store/store-path.md
## Store path proper
```ebnf
store-path = store-dir "/" digest "-" name
```
where

- `name` = the name of the store object.
- `store-dir` = the [store directory](@docroot@/store/store-path.md#store-directory)
- `digest` = base-32 representation of the 160-bit compression of a [SHA-256] hash of `fingerprint` (bytes past the first 20 are XORed onto the first 20, rather than dropped).
  This is the hash part of the store path.
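The rule above can be sketched in Python. This is a minimal illustration, not the reference implementation, and it rests on two assumptions that this page does not spell out: that "base-32" means Nix's own alphabet and bit order (not RFC 4648), and that the SHA-256 hash is reduced to 160 bits by XOR-folding the trailing bytes onto the first 20. The helper names `compress_hash`, `nix32_encode`, and `store_path` are made up for this example.

```python
import hashlib

# Assumption: Nix's own base-32 alphabet (no 'e', 'o', 'u', 't'), not RFC 4648.
NIX32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """XOR-fold `digest` down to `size` bytes (assumed 160-bit reduction)."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


def nix32_encode(data: bytes) -> str:
    """Encode bytes in the assumed Nix base-32 scheme, 5 bits per character."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    # Characters are emitted from the most significant 5-bit group downwards.
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        value = data[i] >> j
        if i + 1 < len(data):
            value |= data[i + 1] << (8 - j)
        chars.append(NIX32_ALPHABET[value & 0x1F])
    return "".join(chars)


def store_path(store_dir: str, fingerprint: str, name: str) -> str:
    """Assemble store-dir "/" digest "-" name from a fingerprint string."""
    full_hash = hashlib.sha256(fingerprint.encode()).digest()
    digest = nix32_encode(compress_hash(full_hash))
    return f"{store_dir}/{digest}-{name}"
```

With a 20-byte input, `nix32_encode` always yields 32 characters, which matches the length of the hash part seen in store paths.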
## Fingerprint
- ```ebnf
  fingerprint = type ":" "sha256" ":" inner-digest ":" store-dir ":" name
  ```

  Note that it includes the location of the store as well as the name, so that changes to either of those are reflected in the hash
  (e.g. you won't get `/nix/store/<digest>-name1` and `/nix/store/<digest>-name2`, or `/gnu/store/<digest>-name1`, with equal hash parts).

- `type` = one of:

  - ```ebnf
    | "text" { ":" store-path }
    ```

    This is for the
    ["Text"](@docroot@/store/store-object/content-address.md#method-text)
    method of content addressing store objects.
    The optional trailing store paths are the references of the store object.

  - ```ebnf
    | "source" { ":" store-path } [ ":self" ]
    ```

    This is for the
    ["Nix Archive"](@docroot@/store/store-object/content-address.md#method-nix-archive)
    method of content addressing store objects,
    if the hash algorithm is [SHA-256].
    Just like in the "Text" case, the optional trailing store paths are the references of the store object.
    Additionally, an optional `:self` label denotes a self-reference.

  - ```ebnf
    | "output:" id
    ```

    This is used either for the outputs built from derivations,
    or for content-addressed store objects that do not fall under either of the two cases above.
    To be explicit about the latter, that is currently these methods:

    - ["Flat"](@docroot@/store/store-object/content-address.md#method-flat)
    - ["Git"](@docroot@/store/store-object/content-address.md#method-git)
    - ["Nix Archive"](@docroot@/store/store-object/content-address.md#method-nix-archive) if the hash algorithm is not [SHA-256].

    `id` is the name of the output (usually, "out").
    For content-addressed store objects, `id` is always "out".

- `inner-digest` = base-16 representation of a SHA-256 hash of `inner-fingerprint`
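The grammar in this section can be read as plain string concatenation. The sketch below shows one way to build the three `type` variants and the fingerprint in Python. The function names are hypothetical, the inner fingerprint is taken as already-computed bytes, and sorting the references is an assumption made here so that the result does not depend on argument order; this page does not state an ordering.

```python
import hashlib
from typing import Sequence


def text_type(references: Sequence[str] = ()) -> str:
    """"text" followed by the referenced store paths, each prefixed by ":"."""
    return ":".join(["text", *sorted(references)])  # sorting is an assumption


def source_type(references: Sequence[str] = (), self_reference: bool = False) -> str:
    """"source", the referenced store paths, then an optional ":self"."""
    parts = ["source", *sorted(references)]  # sorting is an assumption
    if self_reference:
        parts.append("self")
    return ":".join(parts)


def output_type(output_id: str = "out") -> str:
    """"output:" followed by the output name."""
    return f"output:{output_id}"


def fingerprint(type_: str, inner_fingerprint: bytes, store_dir: str, name: str) -> str:
    """Join type, the literal "sha256", inner-digest, store directory and name."""
    inner_digest = hashlib.sha256(inner_fingerprint).hexdigest()  # base-16
    return f"{type_}:sha256:{inner_digest}:{store_dir}:{name}"
```

For example, `fingerprint(text_type(), b"hello", "/nix/store", "greeting")` produces a string that begins with `text:sha256:` and ends with `:/nix/store:greeting`.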
## Inner fingerprint
- `inner-fingerprint` = one of the following based on `type`:

  - if `type` = `"text:" ...`:

    the string written to the resulting store path.

  - if `type` = `"source:" ...`:

    the [Nix Archive (NAR)] serialization of the [file system object](@docroot@/store/file-system-object.md) of the store object.

  - if `type` = `"output:" id`:

    - For input-addressed derivation outputs:

      the [ATerm](@docroot@/protocols/derivation-aterm.md) serialization of the derivation modulo fixed output derivations.

    - For content-addressed store paths:

      ```ebnf
      "fixed:out:" rec algo ":" hash ":"
      ```

      where

      - `rec` = one of:

        - ```ebnf
          | ""
          ```
          (empty string) for hashes of the flat (single file) serialization

        - ```ebnf
          | "r:"
          ```
          for hashes of the [Nix Archive (NAR)] (arbitrary file system object) serialization

        - ```ebnf
          | "git:"
          ```
          for hashes of the [Git blob/tree](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) format

      - ```ebnf
        algo = "md5" | "sha1" | "sha256"
        ```

      - `hash` = base-16 representation of the path or flat hash of the contents of the path (or expected contents of the path for fixed-output derivations).

      Note that `id` = `"out"`, regardless of the name part of the store path.
      Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case.

[Nix Archive (NAR)]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
[SHA-256]: https://en.m.wikipedia.org/wiki/SHA-256
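As an illustration of the content-addressed `"fixed:out:"` case, here is a small Python sketch. The method labels `"flat"`, `"nar"`, and `"git"` and both function names are hypothetical, chosen only for this example. The sketch rejects NAR + SHA-256, since that combination must use the `"source"` type instead.

```python
import hashlib

# Hypothetical method labels mapped to the `rec` prefixes of the grammar.
REC_PREFIX = {"flat": "", "nar": "r:", "git": "git:"}
ALGOS = {"md5", "sha1", "sha256"}


def fixed_output_inner_fingerprint(method: str, algo: str, hash_hex: str) -> str:
    """Build "fixed:out:" rec algo ":" hash ":" for a content-addressed object."""
    if method not in REC_PREFIX:
        raise ValueError(f"unknown method: {method}")
    if algo not in ALGOS:
        raise ValueError(f"unknown hash algorithm: {algo}")
    if method == "nar" and algo == "sha256":
        # This combination is covered by the "source" type, not by "output:out".
        raise ValueError('NAR + SHA-256 must use the "source" type')
    return f"fixed:out:{REC_PREFIX[method]}{algo}:{hash_hex}:"


def fixed_output_inner_digest(method: str, algo: str, hash_hex: str) -> str:
    """Base-16 SHA-256 of the inner fingerprint, i.e. the `inner-digest`."""
    inner = fixed_output_inner_fingerprint(method, algo, hash_hex)
    return hashlib.sha256(inner.encode()).hexdigest()
```

The resulting `inner-digest` is then combined with `type` = `"output:out"`, the store directory, and the name to form the fingerprint.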
### Historical Note
The `type` = `"source:" ...` and `type` = `"output:out"` grammars technically overlap in purpose,
in that both can represent data hashed by its SHA-256 NAR serialization.

The original reason for this way of computing names was to prevent name collisions (for security).
For instance, the thinking was that it shouldn't be feasible to come up with a derivation whose output path collides with the path for a copied source.
The former would have a `fingerprint` starting with `output:out:`, while the latter would have a `fingerprint` starting with `source:`.

In `64519cfd657d024ae6e2bb74cb21ad21b886fd2a` (2008), however, it was decided that separating derivation-produced vs manually-hashed content-addressed data like this was not useful.
Now, data that is content-addressed with SHA-256 + NAR-serialization always uses the `source:...` construction, regardless of how it was produced (manually or by derivation).
This allows freely switching between using [fixed-output derivations](@docroot@/glossary.md#gloss-fixed-output-derivation) for fetching, and fetching out-of-band and then manually adding.
It also removes the ambiguity from the grammar.