If a root is a regular file, then its name must denote a store
path. For instance, the existence of the file
/nix/var/nix/gcroots/per-user/eelco/hydra-roots/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
would cause
/nix/store/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
to be a root.
This is useful because it involves less I/O (no need for a readlink()
call) and takes up less disk space (the symlink target typically takes
up a full disk block, while directory entries are packed more
efficiently). This is particularly important for hydra.nixos.org,
which has hundreds of thousands of roots, and where reading the roots
can take 25 minutes.
‘trusted-users’ is a list of users and groups that have elevated
rights, such as the ability to specify binary caches. It defaults to
‘root’. A typical value would be ‘@wheel’ to specify all users in the
wheel group.
‘allowed-users’ is a list of users and groups that are allowed to
connect to the daemon. It defaults to ‘*’. A typical value would be
‘@users’ to specify the ‘users’ group.
When running NixOps under Mac OS X, we need to be able to import store
paths built on Linux into the local Nix store. However, HFS+ is
usually case-insensitive, so if there are directories with file names
that differ only in case, then importing will fail.
The solution is to add a suffix ("~nix~case~hack~<integer>") to
colliding files. For instance, if we have a directory containing
xt_CONNMARK.h and xt_connmark.h, then the latter will be renamed to
"xt_connmark.h~nix~case~hack~1". If a store path is dumped as a NAR,
the suffixes are removed. Thus, importing and exporting via a
case-insensitive Nix store is round-tripping. So when NixOps calls
nix-copy-closure to copy the path to a Linux machine, you get the
original file names back.
Closes#119.
This makes things more efficient (we don't need to use an SSH master
connection, and we only start a single remote process) and gets rid of
locking issues (the remote nix-store process will keep inputs and
outputs locked as long as they're needed).
It also makes it more or less secure to connect directly to the root
account on the build machine, using a forced command
(e.g. ‘command="nix-store --serve --write"’). This bypasses the Nix
daemon and is therefore more efficient.
Also, don't call nix-store to import the output paths.
When copying a large path causes the daemon to run out of memory, you
now get:
error: Nix daemon out of memory
instead of:
error: writing to file: Broken pipe
If a build log is not available locally, then ‘nix-store -l’ will now
try to download it from the servers listed in the ‘log-servers’ option
in nix.conf. For instance, if you have:
log-servers = http://hydra.nixos.org/log
then it will try to get logs from http://hydra.nixos.org/log/<base
name of the store path>. So you can do things like:
$ nix-store -l $(which xterm)
and get a log even if xterm wasn't built locally.
By preloading all inodes in the /nix/store/.links directory, we can
quickly determine of a hardlinked file was already linked to the hashed
links.
This is tolerant of removing the .links directory, it will simply
recalculate all hashes in the store.
If an inode in the Nix store has more than 1 link, it probably means that it was linked into .links/ by us. If so, skip.
There's a possibility that something else hardlinked the file, so it would be nice to be able to override this.
Also, by looking at the number of hardlinks for each of the files in .links/, you can get deduplication numbers and space savings.
While running Python 3’s test suite, we noticed that on some systems
/dev/pts/ptmx is created with permissions 0 (that’s the case with my
Nixpkgs-originating 3.0.43 kernel, but someone with a Debian-originating
3.10-3 reported not having this problem.)
There’s still the problem that people without
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y are screwed (as noted in build.cc),
but I don’t see how we could work around it.
Since the addition of build-max-log-size, a call to
handleChildOutput() can result in cancellation of a goal. This
invalidated the "j" iterator in the waitForInput() loop, even though
it was still used afterwards. Likewise for the maxSilentTime
handling.
Probably fixes#231. At least it gets rid of the valgrind warnings.
The daemon now creates /dev deterministically (thanks!). However, it
expects /dev/kvm to be present.
The patch below restricts that requirement (1) to Linux-based systems,
and (2) to systems where /dev/kvm already exists.
I’m not sure about the way to handle (2). We could special-case
/dev/kvm and create it (instead of bind-mounting it) in the chroot, so
it’s always available; however, it wouldn’t help much since most likely,
if /dev/kvm missing, then KVM support is missing.
We were relying on SubstitutionGoal's destructor releasing the lock,
but if a goal is a top-level goal, the destructor won't run in a
timely manner since its reference count won't drop to zero. So
release it explicitly.
Fixes#178.
The flag ‘--check’ to ‘nix-store -r’ or ‘nix-build’ will cause Nix to
redo the build of a derivation whose output paths are already valid.
If the new output differs from the original output, an error is
printed. This makes it easier to test if a build is deterministic.
(Obviously this cannot catch all sources of non-determinism, but it
catches the most common one, namely the current time.)
For example:
$ nix-build '<nixpkgs>' -A patchelf
...
$ nix-build '<nixpkgs>' -A patchelf --check
error: derivation `/nix/store/1ipvxsdnbhl1rw6siz6x92s7sc8nwkkb-patchelf-0.6' may not be deterministic: hash mismatch in output `/nix/store/4pc1dmw5xkwmc6q3gdc9i5nbjl4dkjpp-patchelf-0.6.drv'
The --check build fails if not all outputs are valid. Thus the first
call to nix-build is necessary to ensure that all outputs are valid.
The current outputs are left untouched: the new outputs are either put
in a chroot or diverted to a different location in the store using
hash rewriting.
This substituter connects to a remote host, runs nix-store --serve
there, and then forwards substituter commands on to the remote host and
sends their results to the calling program. The ssh-substituter-hosts
option can be specified as a list of hosts to try.
This is an initial implementation and, while it works, it has some
limitations:
* Only the first host is used
* There is no caching of query results (all queries are sent to the
remote machine)
* There is no informative output (such as progress bars)
* Some failure modes may cause unhelpful error messages
* There is no concept of trusted-ssh-substituter-hosts
Signed-off-by: Shea Levy <shea@shealevy.com>
nix-store --export takes a tmproot, which can only release by exiting.
Substituters don't currently work in a way that could take advantage of
the looping, anyway.
Signed-off-by: Shea Levy <shea@shealevy.com>
This is essentially the substituter API operating on the local store,
which will be used by the ssh substituter. It runs in a loop rather than
just taking one command so that in the future nix will be able to keep
one connection open for multiple instances of the substituter.
Signed-off-by: Shea Levy <shea@shealevy.com>
Namely:
nix-store: derivations.cc:242: nix::Hash nix::hashDerivationModulo(nix::StoreAPI&, nix::Derivation): Assertion `store.isValidPath(i->first)' failed.
This happened because of the derivation output correctness check being
applied before the references of a derivation are valid.
*headdesk*
*headdesk*
*headdesk*
So since commit 22144afa8d, Nix hasn't
actually checked whether the content of a downloaded NAR matches the
hash specified in the manifest / NAR info file. Urghhh...
In particular "libutil" was always a problem because it collides with
Glibc's libutil. Even if we install into $(libdir)/nix, the linker
sometimes got confused (e.g. if a program links against libstore but
not libutil, then ld would report undefined symbols in libstore
because it was looking at Glibc's libutil).
Note that adding --show-trace prevents functions calls from being
tail-recursive, so an expression that evaluates without --show-trace
may fail with a stack overflow if --show-trace is given.
I.e. "nix-store -q --roots" will now show (for example)
/home/eelco/Dev/nixpkgs/result
rather than
/nix/var/nix/gcroots/auto/53222qsppi12s2hkap8dm2lg8xhhyk6v
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol"). The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files." However, the SQLite
source code reveals that this error can also occur under high load:
if( cnt>5 ){
int nDelay = 1; /* Pause time in microseconds */
if( cnt>100 ){
VVA_ONLY( pWal->lockError = 1; )
return SQLITE_PROTOCOL;
}
if( cnt>=10 ) nDelay = (cnt-9)*238; /* Max delay 21ms. Total delay 996ms */
sqlite3OsSleep(pWal->pVfs, nDelay);
}
i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL. The
comments say:
Circumstances that cause a RETRY should only last for the briefest
instances of time. No I/O or other system calls are done while the
locks are held, so the locks should not be held for very long. But
if we are unlucky, another process that is holding a lock might get
paged out or take a page-fault that is time-consuming to resolve,
during the few nanoseconds that it is holding the lock. In that case,
it might take longer than normal for the lock to free.
...
The total delay time before giving up is less than 1 second.
On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.
Issue NixOS/hydra#14.