This replaces the O(n) search complexity in our insert code with a
lookup of O(log n). It also makes removing waitees easier as we can use
the extract method provided by the set class.
If there were many top-level goals (which are not destroyed until the
very end), commands like
$ nix copy --to 'ssh://localhost?remote-store=/tmp/nix' \
/run/current-system --no-check-sigs --substitute-on-destination
could fail with "Too many open files". So now we do some explicit
cleanup from amDone(). It would be cleaner to separate goals from
their temporary internal state, but that would be a bigger refactor.