Commit graph

88 commits

Author SHA1 Message Date
Rebecca Turner
c0e7f50c1a
Rename hintfmt to HintFmt 2024-02-08 11:58:25 -08:00
Rebecca Turner
c6a89c1a16
libexpr: Support structured error classes
While preparing PRs like #9753, I've had to change error messages in
dozens of code paths. It would be nice if instead of

    EvalError("expected 'boolean' but found '%1%'", showType(v))

we could write

    TypeError(v, "boolean")

or similar. Then, changing the error message could be a mechanical
refactor with the compiler pointing out places the constructor needs to
be changed, rather than the error-prone process of grepping through the
codebase. Structured errors would also help prevent the "same" error
from having multiple slightly different messages, and could be a first
step towards error codes / an error index.

This PR reworks the exception infrastructure in `libexpr` to
support exception types with different constructor signatures than
`BaseError`. Actually refactoring the exceptions to use structured data
will come in a future PR (this one is big enough already, as it has to
touch every exception in `libexpr`).

The core design is in `eval-error.hh`. Generally, errors like this:

    state.error("'%s' is not a string", getAttrPathStr())
      .debugThrow<TypeError>()

are transformed like this:

    state.error<TypeError>("'%s' is not a string", getAttrPathStr())
      .debugThrow()

The type annotation has moved from `ErrorBuilder::debugThrow` to
`EvalState::error`.
2024-02-01 16:39:38 -08:00
pennae
b596cc9e79 decouple parser and EvalState
there's no reason the parser itself should be doing semantic analysis
like bindVars. split this bit apart (retaining the previous name in
EvalState) and have the parser really do *only* parsing, decoupled from
EvalState.
2024-01-15 16:52:18 +01:00
pennae
835a6c7bcf rename ParserState::{makeCurPos -> at}
most instances of this being used do not refer to the "current"
position, sometimes not even to one reasonably close by. it could also
be called `makePos` instead, but `at` seems clear in context.
2024-01-15 16:52:18 +01:00
pennae
0076056164 move ParseData to own header, rename to ParserState
ParserState better describes what this struct really is. the parser
really does modify its state (most notably position and symbol tables),
so calling it that rather than obliquely "data" (which implies being
input only) makes sense.
2024-01-15 16:52:18 +01:00
John Ericson
e739a5002d Avoid Windows macros in the parser and lexer
`FLOAT`, `INT`, and `IN` are identifers taken by macros.

The name `IN_KW` is chosen to match `OR_KW`, which is presumably named
that way for the same reason of dodging macros.
2024-01-12 19:51:36 -05:00
pennae
b78e77b34c use custom location type in the parser
~1% parser speedup from not using TLS indirections, less on system eval.
this could have also gone in flex yyextra data, but that's significantly
slower for some reason (albeit still faster than thread locals).

before:

  Time (mean ± σ):      4.231 s ±  0.004 s    [User: 3.725 s, System: 0.504 s]
  Range (min … max):    4.226 s …  4.240 s    10 runs

after:

  Time (mean ± σ):      4.224 s ±  0.005 s    [User: 3.711 s, System: 0.512 s]
  Range (min … max):    4.218 s …  4.234 s    10 runs
2023-12-19 19:32:16 +01:00
pennae
2e0321912a use aligned flex tables
~2% speedup on parsing without eval, less (but still significant) on
system eval. having flex generate faster parsers leads to very strange
misparses. maybe re2c is worth investigating.

before:

  Time (mean ± σ):      4.260 s ±  0.003 s    [User: 3.754 s, System: 0.505 s]
  Range (min … max):    4.257 s …  4.266 s    10 runs

after:

  Time (mean ± σ):      4.231 s ±  0.004 s    [User: 3.725 s, System: 0.504 s]
  Range (min … max):    4.226 s …  4.240 s    10 runs
2023-12-19 19:32:16 +01:00
Yingchi Long
3c90340fe6 libexpr: use thread_local to make the parser thread-safe
If we call `adjustLoc`, the global variable `prev_yylloc` is shared
between threads and racy.

Currently, nix itself does not concurrently parsing files, but this is
helpful for libexpr users. (The parser is thread-safe except this.)
2023-07-03 16:05:43 +08:00
Eelco Dolstra
27ebb97d0a
Handle EOFs in string literals correctly
We can't return a STR token without setting a valid StringToken,
otherwise the parser will crash.

Fixes #6562.
2022-05-25 17:58:13 +02:00
pennae
6526d1676b replace most Pos objects/ptrs with indexes into a position table
Pos objects are somewhat wasteful as they duplicate the origin file name and
input type for each object. on files that produce more than one Pos when parsed
this a sizeable waste of memory (one pointer per Pos). the same goes for
ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate
a position would need at least 8GB of input and 64GB of expression memory. it's
not likely that we'll hit that any time soon, so we can use a uint32_t index to
locate positions instead.
2022-04-21 21:46:06 +02:00
Sergei Trofimovich
9174d884d7 lexer: add error location to lexer errors
Before the change lexter errors did not report the location:

    $ nix build -f. mc
    error: path has a trailing slash
    (use '--show-trace' to show detailed location information)

Note that it's not clear what file generates the error.

After the change location is reported:

    $ src/nix/nix --extra-experimental-features nix-command build -f ~/nm mc
    error: path has a trailing slash

           at .../pkgs/development/libraries/glib/default.nix:54:18:

               53|   };
               54|   src = /tmp/foo/;
                 |                  ^
               55|
    (use '--show-trace' to show detailed location information)

Here we see both problematic file and the string itself.
2022-03-24 08:16:14 +00:00
pennae
0a7746603e remove ExprIndStr
it can be replaced with StringToken if we add another bit if information to
StringToken, namely whether this string should take part in indentation scanning
or not. since all escaping terminates indentation scanning we need to set this
bit only for the non-escaped IND_STRING rule.

this improves performance by about 1%.

 before

  nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      8.880 s ±  0.048 s    [User: 6.809 s, System: 1.643 s]
    Range (min … max):    8.781 s …  8.993 s    20 runs

  nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     375.0 ms ±   2.2 ms    [User: 339.8 ms, System: 35.2 ms]
    Range (min … max):   371.5 ms … 379.3 ms    20 runs

  nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.831 s ±  0.040 s    [User: 2.536 s, System: 0.225 s]
    Range (min … max):    2.769 s …  2.912 s    20 runs

 after

  nix search --no-eval-cache --offline ../nixpkgs hello
    Time (mean ± σ):      8.832 s ±  0.048 s    [User: 6.757 s, System: 1.657 s]
    Range (min … max):    8.743 s …  8.921 s    20 runs

  nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
    Time (mean ± σ):     367.4 ms ±   3.2 ms    [User: 332.7 ms, System: 34.7 ms]
    Range (min … max):   364.6 ms … 374.6 ms    20 runs

  nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
    Time (mean ± σ):      2.810 s ±  0.030 s    [User: 2.517 s, System: 0.225 s]
    Range (min … max):    2.742 s …  2.854 s    20 runs
2022-01-19 13:39:42 +01:00
pennae
72f42093e7 optimize unescapeStr
mainly to avoid an allocation and a copy of a string that can be
modified in place (ever since EvalState holds on to the buffer, not the
generated parser itself).

 # before

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     571.7 ms ±   2.4 ms    [User: 563.3 ms, System: 8.0 ms]
  Range (min … max):   566.7 ms … 579.7 ms    50 runs

Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  Time (mean ± σ):     376.6 ms ±   1.0 ms    [User: 345.8 ms, System: 30.5 ms]
  Range (min … max):   374.5 ms … 379.1 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.922 s ±  0.006 s    [User: 2.707 s, System: 0.215 s]
  Range (min … max):    2.906 s …  2.934 s    50 runs

 # after

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     570.4 ms ±   2.8 ms    [User: 561.3 ms, System: 8.6 ms]
  Range (min … max):   564.6 ms … 578.1 ms    50 runs

Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  Time (mean ± σ):     375.4 ms ±   1.3 ms    [User: 343.2 ms, System: 31.7 ms]
  Range (min … max):   373.4 ms … 378.2 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.925 s ±  0.006 s    [User: 2.704 s, System: 0.219 s]
  Range (min … max):    2.910 s …  2.942 s    50 runs
2022-01-13 18:06:15 +01:00
pennae
61a9d16d5c don't strdup tokens in the lexer
every stringy token the lexer returns is turned into a Symbol and not
used further, so we don't have to strdup. using a string_view is
sufficient, but due to limitations of the current parser we have to use
a POD type that holds the same information.

gives ~2% on system build, 6% on search, 8% on parsing alone

 # before

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     610.6 ms ±   2.4 ms    [User: 602.5 ms, System: 7.8 ms]
  Range (min … max):   606.6 ms … 617.3 ms    50 runs

Benchmark 2: nix eval -f hackage-packages.nix
  Time (mean ± σ):     430.1 ms ±   1.4 ms    [User: 393.1 ms, System: 36.7 ms]
  Range (min … max):   428.2 ms … 434.2 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      3.032 s ±  0.005 s    [User: 2.808 s, System: 0.223 s]
  Range (min … max):    3.023 s …  3.041 s    50 runs

 # after

Benchmark 1: nix search --offline nixpkgs hello
  Time (mean ± σ):     574.7 ms ±   2.8 ms    [User: 566.3 ms, System: 8.0 ms]
  Range (min … max):   569.2 ms … 580.7 ms    50 runs

Benchmark 2: nix eval -f hackage-packages.nix
  Time (mean ± σ):     394.4 ms ±   0.8 ms    [User: 361.8 ms, System: 32.3 ms]
  Range (min … max):   392.7 ms … 395.7 ms    50 runs

Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  Time (mean ± σ):      2.976 s ±  0.005 s    [User: 2.757 s, System: 0.218 s]
  Range (min … max):    2.966 s …  2.990 s    50 runs
2022-01-13 18:06:14 +01:00
Eelco Dolstra
81e7c40264 Optimize primop calls
We now parse function applications as a vector of arguments rather
than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is
parsed as

  ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] }

rather than

  ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo">

This allows primops to be called immediately (if enough arguments are
supplied) without having to allocate intermediate tPrimOpApp values.

On

  $ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux

this gives a substantial performance improvement:

  user CPU time:      median =      0.9209  mean =      0.9218  stddev =      0.0073  min =      0.9086  max =      0.9340  [rejected, p=0.00000, Δ=-0.21433±0.00677]
  elapsed time:       median =      1.0585  mean =      1.0584  stddev =      0.0024  min =      1.0523  max =      1.0623  [rejected, p=0.00000, Δ=-0.20594±0.00236]

because it reduces the number of tPrimOpApp allocations from 551990 to
42534 (i.e. only small minority of primop calls are partially
applied) which in turn reduces time spent in the garbage collector.
2021-11-04 15:03:40 +01:00
Taeer Bar-Yam
f14660d5e2 reset yylloc when yyless(0) is called 2021-09-29 19:47:01 -04:00
Taeer Bar-Yam
8f9429dcab add antiquotations to paths 2021-08-06 06:46:05 -04:00
Pamplemousse
99f8fc995b libexpr: Fix read out-of-bound on the heap
Signed-off-by: Pamplemousse <xav.maso@gmail.com>
2021-07-14 09:09:42 -07:00
regnat
0d9e1af695 Remove an unknown pragma gcc warning 2020-12-02 14:33:20 +01:00
regnat
438977731c shut up clang warnings
- Fix some class/struct discrepancies
- Explicit the overloading of `run` in the `Cmd*` classes
- Ignore a warning in the generated lexer
2020-12-01 15:04:03 +01:00
Eelco Dolstra
e14e62fddd Remove trailing whitespace 2020-06-15 14:12:39 +02:00
Ben Burdette
3bc9155dfc a few more 'format's rremoved 2020-04-22 15:00:11 -06:00
Guillaume Maudoux
6a5bf9b143 simplify handling of extra '}' 2018-10-27 00:14:51 +02:00
aszlig
0ad643ed5c
libexpr: Use int64_t for NixInt
Using a 64bit integer on 32bit systems will come with a bit of a
performance overhead, but given that Nix doesn't use a lot of integers
compared to other types, I think the overhead is negligible also
considering that 32bit systems are in decline.

The biggest advantage however is that when we use a consistent integer
size across all platforms it's less likely that we miss things that we
break due to that. One example would be:

https://github.com/NixOS/nixpkgs/pull/44233

On Hydra it will evaluate, because the evaluator runs on a 64bit
machine, but when evaluating the same on a 32bit machine it will fail,
so using 64bit integers should make that consistent.

While the change of the type in value.hh is rather easy to do, we have a
few more options available for doing the conversion in the lexer:

  * Via an #ifdef on the architecture and using strtol() or strtoll()
    accordingly depending on which architecture we are. For the #ifdef
    we would need another AX_COMPILE_CHECK_SIZEOF in configure.ac.
  * Using istringstream, which would involve copying the value.
  * As we're already using boost, lexical_cast might be a good idea.

Spoiler: I went for the latter, first of all because lexical_cast does
have an overload for const char* and second of all, because it doesn't
involve copying around the input string. Also, because istringstream
seems to come with a bigger overhead than boost::lexical_cast:

https://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/performance.html

The first method (still using strtol/strtoll) also wasn't something I
pursued further, because it is also locale-aware which I doubt is what
we want, given that the regex for int is [0-9]+.

Signed-off-by: aszlig <aszlig@nix.build>
Fixes: #2339
2018-08-29 01:05:52 +02:00
Eelco Dolstra
1ad19232c4
Don't return negative numbers from the flex tokenizer
Fixes #1374.
Closes #2129.
2018-05-11 12:05:12 +02:00
Eelco Dolstra
f3c85f9eb3
Revert "Throw a specific error for incomplete parse errors."
This reverts commit 6498adb002. We don't
actually use IncompleteParseError in 'nix repl'.
2018-05-11 11:40:50 +02:00
Tuomas Tynkkynen
a0e38c16bc libexpr: Recognize newline in more places in lexer
Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:

"${0}\
"

where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
                    /* This can only occur when we reach EOF, otherwise the above
                    (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
                    This is technically invalid, but we leave the problem to the
                    parser who fails with exact location. */
                    return STR;
                }
However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().

The fix does change the syntax of the language in some corner cases
but I think it's only turning previously invalid (or crashing) syntax
to valid syntax. E.g.

"a\
b"

and

''a''\
b''

were previously syntax errors but now both result in "a\nb".

Found by afl-fuzz.
2018-03-02 17:30:48 +02:00
Tuomas Tynkkynen
f67a7007a2 libexpr: Pre-reserve space in string in unescapeStr()
Avoids some malloc() traffic.
2018-02-16 04:39:43 +02:00
Eelco Dolstra
2c39e4eca0
Revert "Don't parse "x:x" as a URI"
This reverts commit f90f660b24.

This broke Hydra's release.nix, which contained

  preCheck = ''export LOGNAME=${LOGNAME:-foo}'';
2017-11-14 15:10:52 +01:00
Eelco Dolstra
f90f660b24
Don't parse "x:x" as a URI
URIs now have to contain "://" or start with "channel:".
2017-10-30 17:58:01 +01:00
Jörg Thalheim
2fd8f8bb99 Replace Unicode quotes in user-facing strings by ASCII
Relevant RFC: NixOS/rfcs#4

$ ag -l | xargs sed -i -e "/\"/s/’/'/g;/\"/s/‘/'/g"
2017-07-30 12:32:45 +01:00
Guillaume Maudoux
a143014d73 lexer: remove catch-all rules hiding real errors
With catch-all rules, we hide potential errors.
It turns out that a4744254 made one cath-all useless. Flex detected that
is was impossible to reach.
The other is more subtle, as it can only trigger on unfinished escapes
in unfinished strings, which only occurs at EOF.
2017-05-01 01:18:06 +02:00
Guillaume Maudoux
a474425425 Fix lexer to support $' in multiline strings. 2017-05-01 01:15:40 +02:00
Eelco Dolstra
603f08506e
Tweak error message 2016-12-06 17:18:40 +01:00
Guillaume Maudoux
e4b82af387 Improve error message on trailing path slashes 2016-11-27 17:48:46 +01:00
Guillaume Maudoux
a5e761dddb Fix comments parsing
Fixed the parsing of multiline strings ending with an even number of
stars, like /** this **/.
Added test cases for comments.
2016-11-13 17:20:34 +01:00
Scott Olson
6498adb002 Throw a specific error for incomplete parse errors.
`nix-repl` will use this for deciding whether to keep waiting for input or
error out right away.
2016-02-24 04:32:21 -06:00
Eelco Dolstra
b3e8d72770 Merge pull request #762 from ctheune/ctheune-floats
Implement floats
2016-02-12 12:49:59 +01:00
Eelco Dolstra
5d8b7eb3e1 Revert "Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"""
This reverts commit b669d3d2e8.
2016-01-20 16:34:42 +01:00
Eelco Dolstra
b669d3d2e8 Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751""
This reverts commit ed23c8568e. Let's
merge this *after* the 1.11.1 release.
2016-01-20 00:05:28 +01:00
Fabian Schmitthenner
ed23c8568e next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"
This reverts commit 8120b6fb8a and fixes the regression introduced in
8d22b26448.
2016-01-19 20:35:35 +00:00
Eelco Dolstra
8120b6fb8a Revert "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"
This reverts commit 8d22b26448. It
breaks Nixpkgs:

$ nix-env -qa
error: syntax error, unexpected IND_STR, expecting '}', at /home/eelco/Dev/nixpkgs-stable/pkgs/top-level/python-packages.nix:7605:8
2016-01-19 20:33:32 +01:00
Fabian Schmitthenner
8d22b26448 don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 2016-01-12 20:40:41 +00:00
Christian Theune
a12a43046b Edge condition: parser did not pick up floats starting exactly with 0. 2016-01-05 09:54:49 +01:00
Christian Theune
f872262e08 Fix up float parsing. 2016-01-05 09:46:37 +01:00
Christian Theune
494fc5acbb Try a simplified version of float lexing that didn't work.
The last one I tried was botchered anyway ...
2016-01-05 00:53:22 +01:00
Christian Theune
14ebde5289 First hit at providing support for floats in the language. 2016-01-05 00:40:40 +01:00
Guillaume Maudoux
467977f203 Fix the parsing of "$"'s in strings. 2015-07-03 14:09:58 +02:00
Guillaume Maudoux
65e4dcd69b Fix the hack that resets the scanner state. 2015-07-03 13:53:36 +02:00