I wouldn't call it *good* yet, but this will do for now.
- `RetrieveRegularNARSink` renamed to `RegularFileSink` and moved
accordingly because it actually has nothing to do with NARs in
particular.
- its `fd` field is also marked private
- `copyRecursive` introduced to dump a `SourceAccessor` into a
`ParseSink`.
- `NullParseSink` made so `ParseSink` no longer has sketchy default
methods.
This was done while updating #8918 to work with the new
`SourceAccessor`.
This introduces some shared infrastructure for our notion of protocols.
We can then define multiple protocols in terms of that notion.
We an also express how particular protocols depend on each other.
For example, we can define a common protocol and a worker protocol,
where the second depends on the first in terms of the data types it can
read and write.
The "serve" protocol can just use the common one for now, but will
eventually need its own machinary just like the worker protocol for
version-aware serialisers
Pass this around instead of `Source &` and `Sink &` directly. This will
give us something to put the protocol version on once the time comes.
To do this ergonomically, we need to expose `RemoteStore::Connection`,
so do that too. Give it some more API docs while we are at it.
See API docs on that struct for why. The pasing as as template argument
doesn't yet happen in that commit, but will instead happen in later
commit.
Also make `WorkerOp` (now `Op`) and enum struct. This led us to catch
that two operations were not handled!
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
This is generally a fine practice: Putting implementations in headers
makes them harder to read and slows compilation. Unfortunately it is
necessary for templates, but we can ameliorate that by putting them in a
separate header. Only files which need to instantiate those templates
will need to include the header with the implementation; the rest can
just include the declaration.
This is now documenting in the contributing guide.
Also, it just happens that these polymorphic serializers are the
protocol agnostic ones. (Worker and serve protocol have the same logic
for these container types.) This means by doing this general template
cleanup, we are also getting a head start on better indicating which
code is protocol-specific and which code is shared between protocols.
This is the more typically way to do [Argument-dependent
lookup](https://en.cppreference.com/w/cpp/language/adl)-leveraging
generic serializers in C++. It makes the relationship between the `read`
and `write` methods more clear and rigorous, and also looks more
familiar to users coming from other languages that do not have C++'s
libertine ad-hoc overloading.
I am returning to this because during the review in
https://github.com/NixOS/nix/pull/6223, it came up as something that
would make the code easier to read --- easier today hopefully already,
but definitely easier if we were have multiple codified protocols with
code sharing between them as that PR seeks to accomplish.
If I recall correctly, the main criticism of this the first time around
(in 2020) was that having to specify the type when writing, e.g.
`WorkerProto<MyType>::write`, was too verbose and cumbersome. This is
now addressed with the `workerProtoWrite` wrapper function.
This method is also the way `nlohmann::json`, which we have used for a
number of years now, does its serializers, for what its worth.
This reverts commit 45a0ed82f0. That
commit in turn reverted 9ab07e99f5.
Instead, `Hash` uses `std::optional<HashType>`. In the future, we may
also make `Hash` itself require a known hash type, encoraging people to
use `std::optional<Hash>` instead.
The idea is it's always more flexible to consumer a `Source` than a
plain string, and it might even reduce memory consumption.
I also looked at `addToStoreFromDump` with its `// FIXME: remove?`, but
the worked needed for that is far more up for interpretation, so I
punted for now.
Most functions now take a StorePath argument rather than a Path (which
is just an alias for std::string). The StorePath constructor ensures
that the path is syntactically correct (i.e. it looks like
<store-dir>/<base32-hash>-<name>). Similarly, functions like
buildPaths() now take a StorePathWithOutputs, rather than abusing Path
by adding a '!<outputs>' suffix.
Note that the StorePath type is implemented in Rust. This involves
some hackery to allow Rust values to be used directly in C++, via a
helper type whose destructor calls the Rust type's drop()
function. The main issue is the dynamic nature of C++ move semantics:
after we have moved a Rust value, we should not call the drop function
on the original value. So when we move a value, we set the original
value to bitwise zero, and the destructor only calls drop() if the
value is not bitwise zero. This should be sufficient for most types.
Also lots of minor cleanups to the C++ API to make it more modern
(e.g. using std::optional and std::string_view in some places).
Functions like copyClosure() had 3 bool arguments, which creates a
severe risk of mixing up arguments.
Also, implement copyClosure() using copyPaths().
The store parameter "write-nar-listing=1" will cause BinaryCacheStore
to write a file ‘<store-hash>.ls.xz’ for each ‘<store-hash>.narinfo’
added to the binary cache. This file contains an XZ-compressed JSON
file describing the contents of the NAR, excluding the contents of
regular files.
E.g.
{
"version": 1,
"root": {
"type": "directory",
"entries": {
"lib": {
"type": "directory",
"entries": {
"Mcrt1.o": {
"type": "regular",
"size": 1288
},
"Scrt1.o": {
"type": "regular",
"size": 3920
},
}
}
}
...
}
}
(The actual file has no indentation.)
This is intended to speed up the NixOS channels programs index
generator [1], since fetching gazillions of large NARs from
cache.nixos.org is currently a bottleneck for updating the regular
(non-small) channel.
[1] https://github.com/NixOS/nixos-channel-scripts/blob/master/generate-programs-index.cc