More efficient Simulacrum structure #115

Open
opened 2024-08-17 02:48:31 +03:00 by max · 0 comments
Owner

The Simulacrum is currently set up to test one service per NixOS test, and each test always includes all machines (5 hours + Nowhere, so 6 machines in total). With the current 7 tests, that's 7 × 6 = 42 NixOS evaluations. When evaluating the repo in HCI, hercules-ci-agent-worker takes about 20 GB of RAM.

Ideas for better memory efficiency:

  • Test multiple services at once by grouping services (e.g. basic network, all storage services, everything else)
    • Ultimate form: test everything in one massive NixOS test
    • Maybe only use the grouped tests in the flake checks, but still allow running the tests for individual services via Void CLI
  • Only deploy required nodes, and dynamically scale down nodes for services that would be fine with fewer nodes
    • e.g. Consul runs on 5 nodes, but could theoretically be reduced to 3, so if the "service under test" requires 2 nodes, the test should contain those 2 nodes running the service and Consul, and one extra node just running Consul
    • could be done by defining a minimumReplicas or minimumNodeCount option for node groups
    • come up with a "binpacking" function that picks the smallest possible set of nodes given all the required services, the nodes they can run on, and the minimum number of instances
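
To make the "binpacking" idea concrete, here is a rough sketch in Python rather than Nix, purely for illustration. It assumes each required service is described by the set of nodes it can run on plus a minimum instance count; the function name `pick_nodes`, the node names and the data layout are all hypothetical and not taken from the repo. With only 6 machines, brute force over node subsets in order of increasing size is cheap enough, so no clever heuristic is attempted.

```python
from itertools import combinations


def pick_nodes(services, nodes):
    """Return the smallest tuple of nodes that satisfies every service.

    services: dict mapping a service name to (eligible_nodes, minimum),
              where eligible_nodes is a set of node names and minimum is
              the lowest acceptable number of instances (hypothetical layout).
    nodes:    iterable of all available node names.
    Returns None if no subset of nodes satisfies all services.
    """
    ordered = sorted(nodes)  # sorted for deterministic results
    # Try subsets from smallest to largest, so the first hit is minimal.
    for size in range(len(ordered) + 1):
        for candidate in combinations(ordered, size):
            chosen = set(candidate)
            if all(
                len(chosen & eligible) >= minimum
                for eligible, minimum in services.values()
            ):
                return candidate
    return None


# Hypothetical example mirroring the Consul case above:
# Consul can run on all 5 nodes but needs only 3,
# the service under test needs 2 specific nodes.
all_nodes = ["hour1", "hour2", "hour3", "hour4", "hour5"]
required = {
    "consul": (set(all_nodes), 3),
    "service-under-test": ({"hour4", "hour5"}, 2),
}
result = pick_nodes(required, all_nodes)
print(result)
```

In this example the result contains the two nodes running the service under test plus one extra node that only runs Consul, so 3 nodes instead of 5. A real implementation would presumably live in Nix and read the minimum from something like the `minimumReplicas` option suggested above.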
max self-assigned this 2024-11-16 23:53:13 +02:00
Reference: privatevoid.net/depot#115