after #6218 `Symbol` no longer confers a uniqueness invariant on the
string it wraps, it is now possible to create multiple symbols that
compare equal but whose string contents have different addresses. this
guarantee is now only provided by `SymbolIdx`, leaving `Symbol` only as
a string wrapper that knows about the intricacies of how symbols need to
be formatted for output.
this change renames `SymbolIdx` to `Symbol` to restore the previous
semantics of `Symbol` to that name. we also keep the wrapper type and
rename it to `SymbolStr` instead of returning plain strings from lookups
into the symbol table because symbols are formatted for output in many
places. theoretically we do not need `SymbolStr`, only a function that
formats a string for output as a symbol, but having to wrap every symbol
that appears in a message into eg `formatSymbol()` is error-prone and
inconvient.
this slightly increases the amount of memory used for any given symbol, but this
increase is more than made up for if the symbol is referenced more than once in
the EvalState that holds it. on average every symbol should be referenced at
least twice (once to introduce a binding, once to use it), so we expect no
increase in memory on average.
symbol tables are limited to 2³² entries like position tables, and similar
arguments apply to why overflow is not likely: 2³² symbols would require as many
string instances (at 24 bytes each) and map entries (at 24 bytes or more each,
assuming that the map holds on average at most one item per bucket as the docs
say). a full symbol table would require at least 192GB of memory just for
symbols, which is well out of reach. (an ofborg eval of nixpks today creates
less than a million symbols!)
Pos objects are somewhat wasteful as they duplicate the origin file name and
input type for each object. on files that produce more than one Pos when parsed
this a sizeable waste of memory (one pointer per Pos). the same goes for
ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate
a position would need at least 8GB of input and 64GB of expression memory. it's
not likely that we'll hit that any time soon, so we can use a uint32_t index to
locate positions instead.
only file and line of the returned position were ever used, it wasn't actually
used a position. as such we may as well use a path+int pair for only those two
values and remove a use of Pos that would not work well with a position table.
- Make passing the position to `forceValue` mandatory,
this way we remember people that the position is
important for better error messages
- Add pos to all `forceValue` calls
It does not operate on a derivation and does not return a
derivation path. Instead it works at the language level,
where a distinct term "package" is more appropriate to
distinguish the parent object of `meta.position`; an
attribute which doesn't even make it into the derivation.
The value pointers of lists with 1 or 2 elements are now stored in the
list value itself. In particular, this makes the "concatMap (x: if
cond then [(f x)] else [])" idiom cheaper.
This is requires if you have attribute names with dots in them. So
you can now say:
$ nix-instantiate '<nixos>' -A 'config.systemd.units."postgresql.service".text' --eval-only
Fixes#151.
This prevents some duplicate evaluation in nix-env and
nix-instantiate.
Also, when traversing ~/.nix-defexpr, only read regular files with the
extension .nix. Previously it was reading files like
.../channels/binary-caches/<name>. The only reason this didn't cause
problems is pure luck (namely, <name> shadows an actual Nix
expression, the binary-caches files happen to be syntactically valid
Nix expressions, and we iterate over the directory contents in just
the right order).
tree). This saves a lot of memory. The vector should be sorted so
that names can be looked up using binary search, but this is not the
case yet. (Surprisingly, looking up attributes using linear search
doesn't have a big impact on performance.)
Memory consumption for
$ nix-instantiate /etc/nixos/nixos/tests -A bittorrent.test --readonly-mode
on x86_64-linux with GC enabled is now 185 MiB (compared to 946
MiB on the trunk).
a pointer to a Value, rather than the Value directly. This improves
the effectiveness of garbage collection a lot: if the Value is
stored inside the set directly, then any live pointer to the Value
causes all other attributes in the set to be live as well.
values. This improves sharing and gives another speed up.
Evaluation of the NixOS system attribute is now almost 7 times
faster than the old evaluator.
efficiently. The symbol table ensures that there is only one copy
of each symbol, thus allowing symbols to be compared efficiently
using a pointer equality test.