after #6218 `Symbol` no longer confers a uniqueness invariant on the
string it wraps, it is now possible to create multiple symbols that
compare equal but whose string contents have different addresses. this
guarantee is now only provided by `SymbolIdx`, leaving `Symbol` only as
a string wrapper that knows about the intricacies of how symbols need to
be formatted for output.
this change renames `SymbolIdx` to `Symbol` to restore the previous
semantics of `Symbol` to that name. we also keep the wrapper type and
rename it to `SymbolStr` instead of returning plain strings from lookups
into the symbol table because symbols are formatted for output in many
places. theoretically we do not need `SymbolStr`, only a function that
formats a string for output as a symbol, but having to wrap every symbol
that appears in a message into eg `formatSymbol()` is error-prone and
inconvient.
this slightly increases the amount of memory used for any given symbol, but this
increase is more than made up for if the symbol is referenced more than once in
the EvalState that holds it. on average every symbol should be referenced at
least twice (once to introduce a binding, once to use it), so we expect no
increase in memory on average.
symbol tables are limited to 2³² entries like position tables, and similar
arguments apply to why overflow is not likely: 2³² symbols would require as many
string instances (at 24 bytes each) and map entries (at 24 bytes or more each,
assuming that the map holds on average at most one item per bucket as the docs
say). a full symbol table would require at least 192GB of memory just for
symbols, which is well out of reach. (an ofborg eval of nixpks today creates
less than a million symbols!)
PosTable deduplicates origin information, so using symbols for paths is no
longer necessary. moving away from path Symbols also reduces the usage of
symbols for things that are not keys in attribute sets, which will become
important in the future when we turn symbols into indices as well.
Pos objects are somewhat wasteful as they duplicate the origin file name and
input type for each object. on files that produce more than one Pos when parsed
this a sizeable waste of memory (one pointer per Pos). the same goes for
ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate
a position would need at least 8GB of input and 64GB of expression memory. it's
not likely that we'll hit that any time soon, so we can use a uint32_t index to
locate positions instead.
when we introduce position and symbol tables we'll need to do lookups to turn
indices into those tables into actual positions/symbols. having the error
functions as members of EvalState will avoid a lot of churn for adding lookups
into the tables for each caller.
we don't *need* symbols here. the only advantage they have over strings is
making call-counting slightly faster, but that's a diagnostic feature and thus
needn't be optimized.
this also fixes a move bug that previously didn't show up: PrimOp structs were
accessed after being moved from, which technically invalidates them. previously
the names remained valid because Symbol copies on move, but strings are
invalidated. we now copy the entire primop struct instead of moving since primop
registration happen once and are not performance-sensitive.
Impure derivations are derivations that can produce a different result
every time they're built. Example:
stdenv.mkDerivation {
name = "impure";
__impure = true; # marks this derivation as impure
outputHashAlgo = "sha256";
outputHashMode = "recursive";
buildCommand = "date > $out";
};
Some important characteristics:
* This requires the 'impure-derivations' experimental feature.
* Impure derivations are not "cached". Thus, running "nix-build" on
the example above multiple times will cause a rebuild every time.
* They are implemented similar to CA derivations, i.e. the output is
moved to a content-addressed path in the store. The difference is
that we don't register a realisation in the Nix database.
* Pure derivations are not allowed to depend on impure derivations. In
the future fixed-output derivations will be allowed to depend on
impure derivations, thus forming an "impurity barrier" in the
dependency graph.
* When sandboxing is enabled, impure derivations can access the
network in the same way as fixed-output derivations. In relaxed
sandboxing mode, they can access the local filesystem.
This is useful whenever we want to evaluate something to a store path
(e.g. in get-drvs.cc).
Extracted from the lazy-trees branch (where we can require that a
store path must come from a store source tree accessor).
This was introduced in #6174. However fetch{url,Tarball} are legacy
and we shouldn't have an undocumented attribute that does the same
thing as one that already exists ('sha256').
we'll retain the old coerceToString interface that returns a string, but callers
that don't need the returned value to outlive the Value it came from can save
copies by using the new interface instead. for values that weren't stringy we'll
pass a new buffer argument that'll be used for storage and shouldn't be
inspected.
keeping it as a simple data member means it won't be scanned by the GC, so
eventually the GC will collect a cache that is still referenced (resulting in
use-after-free of cache elements).
fixes#5962
- Make passing the position to `forceValue` mandatory,
this way we remember people that the position is
important for better error messages
- Add pos to all `forceValue` calls
gives about 1% improvement on system eval, a bit less on nix search.
# before
nix search --no-eval-cache --offline ../nixpkgs hello
Time (mean ± σ): 7.419 s ± 0.045 s [User: 6.362 s, System: 0.794 s]
Range (min … max): 7.335 s … 7.517 s 20 runs
nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 2.921 s ± 0.023 s [User: 2.626 s, System: 0.210 s]
Range (min … max): 2.883 s … 2.957 s 20 runs
# after
nix search --no-eval-cache --offline ../nixpkgs hello
Time (mean ± σ): 7.370 s ± 0.059 s [User: 6.333 s, System: 0.791 s]
Range (min … max): 7.286 s … 7.541 s 20 runs
nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 2.891 s ± 0.033 s [User: 2.606 s, System: 0.210 s]
Range (min … max): 2.823 s … 2.958 s 20 runs
when given a string yacc will copy the entire input to a newly allocated
location so that it can add a second terminating NUL byte. since the
parser is a very internal thing to EvalState we can ensure that having
two terminating NUL bytes is always possible without copying, and have
the parser itself merely check that the expected NULs are present.
# before
Benchmark 1: nix search --offline nixpkgs hello
Time (mean ± σ): 572.4 ms ± 2.3 ms [User: 563.4 ms, System: 8.6 ms]
Range (min … max): 566.9 ms … 579.1 ms 50 runs
Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
Time (mean ± σ): 381.7 ms ± 1.0 ms [User: 348.3 ms, System: 33.1 ms]
Range (min … max): 380.2 ms … 387.7 ms 50 runs
Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 2.936 s ± 0.005 s [User: 2.715 s, System: 0.221 s]
Range (min … max): 2.923 s … 2.946 s 50 runs
# after
Benchmark 1: nix search --offline nixpkgs hello
Time (mean ± σ): 571.7 ms ± 2.4 ms [User: 563.3 ms, System: 8.0 ms]
Range (min … max): 566.7 ms … 579.7 ms 50 runs
Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
Time (mean ± σ): 376.6 ms ± 1.0 ms [User: 345.8 ms, System: 30.5 ms]
Range (min … max): 374.5 ms … 379.1 ms 50 runs
Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 2.922 s ± 0.006 s [User: 2.707 s, System: 0.215 s]
Range (min … max): 2.906 s … 2.934 s 50 runs
there's a few symbols in primops we can create once and pick them out of
EvalState afterwards instead of creating them every time we need them. this
gives almost 1% speedup to an uncached nix search.
Previously you had to remember to call value->attrs->sort() after
populating value->attrs. Now there is a BindingsBuilder helper that
wraps Bindings and ensures that sort() is called before you can use
it.
calling GC_malloc for each value is significantly more expensive than
allocating a bunch of values at once with GC_malloc_many. "a bunch" here
is a GC block size, ie 16KiB or less.
this gives a 1.5% performance boost when evaluating our nixos system.
tested with
nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
# on master
Time (mean ± σ): 3.335 s ± 0.007 s [User: 2.774 s, System: 0.293 s]
Range (min … max): 3.315 s … 3.347 s 50 runs
# with this change
Time (mean ± σ): 3.288 s ± 0.006 s [User: 2.728 s, System: 0.292 s]
Range (min … max): 3.274 s … 3.307 s 50 runs
We now parse function applications as a vector of arguments rather
than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is
parsed as
ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] }
rather than
ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo">
This allows primops to be called immediately (if enough arguments are
supplied) without having to allocate intermediate tPrimOpApp values.
On
$ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux
this gives a substantial performance improvement:
user CPU time: median = 0.9209 mean = 0.9218 stddev = 0.0073 min = 0.9086 max = 0.9340 [rejected, p=0.00000, Δ=-0.21433±0.00677]
elapsed time: median = 1.0585 mean = 1.0584 stddev = 0.0024 min = 1.0523 max = 1.0623 [rejected, p=0.00000, Δ=-0.20594±0.00236]
because it reduces the number of tPrimOpApp allocations from 551990 to
42534 (i.e. only small minority of primop calls are partially
applied) which in turn reduces time spent in the garbage collector.
Rather than having them plain strings scattered through the whole
codebase, create an enum containing all the known experimental features.
This means that
- Nix can now `warn` when an unkwown experimental feature is passed
(making it much nicer to spot typos and spot deprecated features)
- It’s now easy to remove a feature altogether (once the feature isn’t
experimental anymore or is dropped) by just removing the field for the
enum and letting the compiler point us to all the now invalid usages
of it.
I found it somewhat confusing to have an error like
error: attribute 'getFlake' missing
if the required experimental-feature (`flakes`) is not enabled. Instead,
I'd expect Nix to throw an error just like it's the case when using e.g. `nix
flake` without `flakes` being enabled.
With this change, the error looks like this:
$ nix-instantiate -E 'builtins.getFlake "nixpkgs"'
error: Cannot call 'builtins.getFlake' because experimental Nix feature 'flakes' is disabled. You can enable it via '--extra-experimental-features flakes'.
at «string»:1:1:
1| builtins.getFlake "nixpkgs"
| ^
I didn't use `settings.requireExperimentalFeature` here on purpose
because this doesn't contain a position. Also, it doesn't seem as if we
need to catch the error and check for the missing feature here since
this already happens at evaluation time.
This fixes a use-after-free bug:
1. s = new EvalState();
2. callFlake()
3. static vCallFlake now references s
4. delete s;
5. s2 = new EvalState();
6. callFlake()
7. static vCallFlake still references s
8. crash
Nix 2.3 did not have a problem with recreating EvalState.
This fixes a class of crashes and introduces ptr<T> to make the
code robust against this failure mode going forward.
Thanks regnat for the idea of a ref<T> without overhead!
Closes#4895Closes#4893Closes#5127Closes#5113
Most functions now take a StorePath argument rather than a Path (which
is just an alias for std::string). The StorePath constructor ensures
that the path is syntactically correct (i.e. it looks like
<store-dir>/<base32-hash>-<name>). Similarly, functions like
buildPaths() now take a StorePathWithOutputs, rather than abusing Path
by adding a '!<outputs>' suffix.
Note that the StorePath type is implemented in Rust. This involves
some hackery to allow Rust values to be used directly in C++, via a
helper type whose destructor calls the Rust type's drop()
function. The main issue is the dynamic nature of C++ move semantics:
after we have moved a Rust value, we should not call the drop function
on the original value. So when we move a value, we set the original
value to bitwise zero, and the destructor only calls drop() if the
value is not bitwise zero. This should be sufficient for most types.
Also lots of minor cleanups to the C++ API to make it more modern
(e.g. using std::optional and std::string_view in some places).
With this patch, and this file I called `log.py`:
#!/usr/bin/env nix-shell
#!nix-shell -i python3 -p python3 --pure
import sys
from pprint import pprint
stack = []
timestack = []
for line in open(sys.argv[1]):
components = line.strip().split(" ", 2)
if components[0] != "function-trace":
continue
direction = components[1]
components = components[2].rsplit(" ", 2)
loc = components[0]
_at = components[1]
time = int(components[2])
if direction == "entered":
stack.append(loc)
timestack.append(time)
elif direction == "exited":
dur = time - timestack.pop()
vst = ";".join(stack)
print(f"{vst} {dur}")
stack.pop()
and:
nix-instantiate --trace-function-calls -vvvv ../nixpkgs/pkgs/top-level/release.nix -A unstable > log.matthewbauer 2>&1
./log.py ./log.matthewbauer > log.matthewbauer.folded
flamegraph.pl --title matthewbauer-post-pr log.matthewbauer.folded > log.matthewbauer.folded.svg
I can make flame graphs like: http://gsc.io/log.matthewbauer.folded.svg
---
Includes test cases around function call failures and tryEval. Uses
RAII so the finish is always called at the end of the function.
This allows using an arbitrary "provides" attribute from the specified
flake. For example:
nix build --flake nixpkgs packages.hello
(Maybe provides.packages should be used for consistency...)
We want to encourage a brave new world of hermetic evaluation for
source-level reproducibility, so flakes should not poke around in the
filesystem outside of their explicit dependencies.
Note that the default installation source remains impure in that it
can refer to mutable flakes, so "nix build nixpkgs.hello" still works
(and fetches the latest nixpkgs, unless it has been pinned by the
user).
A problem with pure evaluation is that builtins.currentSystem is
unavailable. For the moment, I've hard-coded "x86_64-linux" in the
nixpkgs flake. Eventually, "system" should be a flake function
argument.
If the Env denotes a 'with', then values[0] may be an Expr* cast to a
Value*. For code that generically traverses Values/Envs, it's useful
to know this.
In this mode, the following restrictions apply:
* The builtins currentTime, currentSystem and storePath throw an
error.
* $NIX_PATH and -I are ignored.
* fetchGit and fetchMercurial require a revision hash.
* fetchurl and fetchTarball require a sha256 attribute.
* No file system access is allowed outside of the paths returned by
fetch{Git,Mercurial,url,Tarball}. Thus 'nix build -f ./foo.nix' is
not allowed.
Thus, the evaluation result is completely reproducible from the
command line arguments. E.g.
nix build --pure-eval '(
let
nix = fetchGit { url = https://github.com/NixOS/nixpkgs.git; rev = "9c927de4b179a6dd210dd88d34bda8af4b575680"; };
nixpkgs = fetchGit { url = https://github.com/NixOS/nixpkgs.git; ref = "release-17.09"; rev = "66b4de79e3841530e6d9c6baf98702aa1f7124e4"; };
in (import (nix + "/release.nix") { inherit nix nixpkgs; }).build.x86_64-linux
)'
The goal is to enable completely reproducible and traceable
evaluation. For example, a NixOS configuration could be fully
described by a single Git commit hash. 'nixos-rebuild' would do
something like
nix build --pure-eval '(
(import (fetchGit { url = file:///my-nixos-config; rev = "..."; })).system
')
where the Git repository /my-nixos-config would use further fetchGit
calls or Git externals to fetch Nixpkgs and whatever other
dependencies it has. Either way, the commit hash would uniquely
identify the NixOS configuration and allow it to reproduced.
Functions like copyClosure() had 3 bool arguments, which creates a
severe risk of mixing up arguments.
Also, implement copyClosure() using copyPaths().
Previously, all derivation attributes had to be coerced into strings
so that they could be passed via the environment. This is lossy
(e.g. lists get flattened, necessitating configureFlags
vs. configureFlagsArray, of which the latter cannot be specified as an
attribute), doesn't support attribute sets at all, and has size
limitations (necessitating hacks like passAsFile).
This patch adds a new mode for passing attributes to builders, namely
encoded as a JSON file ".attrs.json" in the current directory of the
builder. This mode is activated via the special attribute
__structuredAttrs = true;
(The idea is that one day we can set this in stdenv.mkDerivation.)
For example,
stdenv.mkDerivation {
__structuredAttrs = true;
name = "foo";
buildInputs = [ pkgs.hello pkgs.cowsay ];
doCheck = true;
hardening.format = false;
}
results in a ".attrs.json" file containing (sans the indentation):
{
"buildInputs": [],
"builder": "/nix/store/ygl61ycpr2vjqrx775l1r2mw1g2rb754-bash-4.3-p48/bin/bash",
"configureFlags": [
"--with-foo",
"--with-bar=1 2"
],
"doCheck": true,
"hardening": {
"format": false
},
"name": "foo",
"nativeBuildInputs": [
"/nix/store/10h6li26i7g6z3mdpvra09yyf10mmzdr-hello-2.10",
"/nix/store/4jnvjin0r6wp6cv1hdm5jbkx3vinlcvk-cowsay-3.03"
],
"propagatedBuildInputs": [],
"propagatedNativeBuildInputs": [],
"stdenv": "/nix/store/f3hw3p8armnzy6xhd4h8s7anfjrs15n2-stdenv",
"system": "x86_64-linux"
}
"passAsFile" is ignored in this mode because it's not needed - large
strings are included directly in the JSON representation.
It is up to the builder to do something with the JSON
representation. For example, in bash-based builders, lists/attrsets of
string values could be mapped to bash (associative) arrays.
The implementation of "partition" in Nixpkgs is O(n^2) (because of the
use of ++), and for some reason was causing stack overflows in
multi-threaded evaluation (not sure why).
This reduces "nix-env -qa --drv-path" runtime by 0.197s and memory
usage by 298 MiB (in non-Boehm mode).
That is, unless --file is specified, the Nix search path is
synthesized into an attribute set. Thus you can say
$ nix build nixpkgs.hello
assuming $NIX_PATH contains an entry of the form "nixpkgs=...". This
is more verbose than
$ nix build hello
but is less ambiguous.
Thus, -I / $NIX_PATH entries are now downloaded only when they are
needed for evaluation. An error to download an entry is a non-fatal
warning (just like non-existant paths).
This does change the semantics of builtins.nixPath, which now returns
the original, rather than resulting path. E.g., before we had
[ { path = "/nix/store/hgm3yxf1lrrwa3z14zpqaj5p9vs0qklk-nixexprs.tar.xz"; prefix = "nixpkgs"; } ... ]
but now
[ { path = "https://nixos.org/channels/nixos-16.03/nixexprs.tar.xz"; prefix = "nixpkgs"; } ... ]
Fixes#792.
Also, move a few free-standing functions into StoreAPI and Derivation.
Also, introduce a non-nullable smart pointer, ref<T>, which is just a
wrapper around std::shared_ptr ensuring that the pointer is never
null. (For reference-counted values, this is better than passing a
"T&", because the latter doesn't maintain the refcount. Usually, the
caller will have a shared_ptr keeping the value alive, but that's not
always the case, e.g., when passing a reference to a std::thread via
std::bind.)
For example, "${{ foo = "bar"; __toString = x: x.foo; }}" evaluates
to "bar".
With this, we can delay calling functions like mkDerivation,
buildPythonPackage, etc. until we actually need a derivation, enabling
overrides and other modifications to happen by simple attribute set
update.
This modification moves Attr and Bindings structures into their own header
file which is dedicated to the attribute set representation. The goal of to
isolate pieces of code which are related to the attribute set
representation. Thus future modifications of the attribute set
representation will only have to modify these files, and not every other
file across the evaluator.
If ‘--option restrict-eval true’ is given, the evaluator will throw an
exception if an attempt is made to access any file outside of the Nix
search path. This is primarily intended for Hydra, where we don't want
people doing ‘builtins.readFile ~/.ssh/id_dsa’ or stuff like that.
With this, attribute sets with a `__functor` attribute can be applied
just like normal functions. This can be used to attach arbitrary
metadata to a function without callers needing to treat it specially.