every stringy token the lexer returns is turned into a Symbol and not
used further, so we don't have to strdup. using a string_view is
sufficient, but due to limitations of the current parser we have to use
a POD type that holds the same information.
gives ~2% on system build, 6% on search, 8% on parsing alone
# before
Benchmark 1: nix search --offline nixpkgs hello
Time (mean ± σ): 610.6 ms ± 2.4 ms [User: 602.5 ms, System: 7.8 ms]
Range (min … max): 606.6 ms … 617.3 ms 50 runs
Benchmark 2: nix eval -f hackage-packages.nix
Time (mean ± σ): 430.1 ms ± 1.4 ms [User: 393.1 ms, System: 36.7 ms]
Range (min … max): 428.2 ms … 434.2 ms 50 runs
Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 3.032 s ± 0.005 s [User: 2.808 s, System: 0.223 s]
Range (min … max): 3.023 s … 3.041 s 50 runs
# after
Benchmark 1: nix search --offline nixpkgs hello
Time (mean ± σ): 574.7 ms ± 2.8 ms [User: 566.3 ms, System: 8.0 ms]
Range (min … max): 569.2 ms … 580.7 ms 50 runs
Benchmark 2: nix eval -f hackage-packages.nix
Time (mean ± σ): 394.4 ms ± 0.8 ms [User: 361.8 ms, System: 32.3 ms]
Range (min … max): 392.7 ms … 395.7 ms 50 runs
Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 2.976 s ± 0.005 s [User: 2.757 s, System: 0.218 s]
Range (min … max): 2.966 s … 2.990 s 50 runs
We now parse function applications as a vector of arguments rather
than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is
parsed as
ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] }
rather than
ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo">
This allows primops to be called immediately (if enough arguments are
supplied) without having to allocate intermediate tPrimOpApp values.
On
$ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux
this gives a substantial performance improvement:
user CPU time: median = 0.9209 mean = 0.9218 stddev = 0.0073 min = 0.9086 max = 0.9340 [rejected, p=0.00000, Δ=-0.21433±0.00677]
elapsed time: median = 1.0585 mean = 1.0584 stddev = 0.0024 min = 1.0523 max = 1.0623 [rejected, p=0.00000, Δ=-0.20594±0.00236]
because it reduces the number of tPrimOpApp allocations from 551990 to
42534 (i.e. only small minority of primop calls are partially
applied) which in turn reduces time spent in the garbage collector.
Using a 64bit integer on 32bit systems will come with a bit of a
performance overhead, but given that Nix doesn't use a lot of integers
compared to other types, I think the overhead is negligible also
considering that 32bit systems are in decline.
The biggest advantage however is that when we use a consistent integer
size across all platforms it's less likely that we miss things that we
break due to that. One example would be:
https://github.com/NixOS/nixpkgs/pull/44233
On Hydra it will evaluate, because the evaluator runs on a 64bit
machine, but when evaluating the same on a 32bit machine it will fail,
so using 64bit integers should make that consistent.
While the change of the type in value.hh is rather easy to do, we have a
few more options available for doing the conversion in the lexer:
* Via an #ifdef on the architecture and using strtol() or strtoll()
accordingly depending on which architecture we are. For the #ifdef
we would need another AX_COMPILE_CHECK_SIZEOF in configure.ac.
* Using istringstream, which would involve copying the value.
* As we're already using boost, lexical_cast might be a good idea.
Spoiler: I went for the latter, first of all because lexical_cast does
have an overload for const char* and second of all, because it doesn't
involve copying around the input string. Also, because istringstream
seems to come with a bigger overhead than boost::lexical_cast:
https://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/performance.html
The first method (still using strtol/strtoll) also wasn't something I
pursued further, because it is also locale-aware which I doubt is what
we want, given that the regex for int is [0-9]+.
Signed-off-by: aszlig <aszlig@nix.build>
Fixes: #2339
Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:
"${0}\
"
where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:
<STRING>\$|\\|\$\\ {
/* This can only occur when we reach EOF, otherwise the above
(...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
This is technically invalid, but we leave the problem to the
parser who fails with exact location. */
return STR;
}
However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().
The fix does change the syntax of the language in some corner cases
but I think it's only turning previously invalid (or crashing) syntax
to valid syntax. E.g.
"a\
b"
and
''a''\
b''
were previously syntax errors but now both result in "a\nb".
Found by afl-fuzz.
With catch-all rules, we hide potential errors.
It turns out that a4744254 made one cath-all useless. Flex detected that
is was impossible to reach.
The other is more subtle, as it can only trigger on unfinished escapes
in unfinished strings, which only occurs at EOF.
Now, in addition to a."${b}".c, you can write a.${b}.c (applicable
wherever dynamic attributes are valid).
Signed-off-by: Shea Levy <shea@shealevy.com>
In Nixpkgs, the attribute in all-packages.nix corresponding to a
package is usually equal to the package name. However, this doesn't
work if the package contains a dash, which is fairly common. The
convention is to replace the dash with an underscore (e.g. "dbus-lib"
becomes "dbus_glib"), but that's annoying. So now dashes are valid in
variable / attribute names, allowing you to write:
dbus-glib = callPackage ../development/libraries/dbus-glib { };
and
buildInputs = [ dbus-glib ];
Since we don't have a negation or subtraction operation in Nix, this
is unambiguous.
brackets, e.g.
import <nixpkgs/pkgs/lib>
are resolved by looking them up relative to the elements listed in
the search path. This allows us to get rid of hacks like
import "${builtins.getEnv "NIXPKGS_ALL"}/pkgs/lib"
The search path can be specified through the ‘-I’ command-line flag
and through the colon-separated ‘NIX_PATH’ environment variable,
e.g.,
$ nix-build -I /etc/nixos ...
If a file is not found in the search path, an error message is
lazily thrown.
errors with position info.
* For all positions, use the position of the first character of the
first token, rather than the last character of the first token plus
one.