WIP: PostgreSQL Cluster Auto-Upgrade #92

Draft
max wants to merge 5 commits from pg-auto-migrations into master
max commented 2023-08-26 20:38:38 +03:00 (Migrated from git.privatevoid.net)

Adds the ability to automatically upgrade PostgreSQL nodes in a Patroni cluster across major versions with (ideally) minimal downtime.

Upgrade step-by-step:

  • be stopped
  • on startup, check highest pg version in cluster
    • TODO: fix setting CONSUL_HTTP_ADDR
  • if version matches our target, wait until the leader is that version
    • requires us to run a switchover/failover the first time around
  • if version is above target, crash
  • if version is below target, perform self-upgrade:
    • nuke data dir, initdb with target version
    • find the leader
    • TODO: find the leader's port number (patroni config?)
    • pause auto failover
    • prevent leader from stopping itself
    • for all databases on leader, dump schema and load into ours
    • temporarily switch cluster wal_level to logical
    • for all databases on leader, create replication slot for all tables, subscribe to each replication slot on ours
    • write down the current wal lsn on the leader
    • wait until we reach that lsn
    • switch the leader into read-only mode
    • sync the remaining data over, i.e. wait for replication
      • TODO: what if a ddl change happened between schema dump and read-only?
    • drop the publications and subscriptions
    • reset cluster's wal_level to previous value
    • set the cluster's sysid to our new sysid
      • TODO: is this desirable? it kills all other nodes, removing the leader
    • perform a switchover to us
      • TODO: should we do this ourselves? (currently, the second node to upgrade forces a failover on us)
    • we are now the leader
    • wait for the rest of the replicas to do the same procedure
    • cluster upgraded with minimal downtime!
Adds the ability to automatically upgrade PostgreSQL nodes in a Patroni cluster across major versions with (ideally) minimal downtime. Upgrade step-by-step: - [x] be stopped - [x] on startup, check highest pg version in cluster - [ ] TODO: fix setting CONSUL_HTTP_ADDR - [x] if version matches our target, wait until the leader is that version - [x] requires us to run a switchover/failover the first time around - [ ] if version is above target, crash - [ ] if version is below target, perform self-upgrade: - [x] nuke data dir, initdb with target version - [x] find the leader - [ ] TODO: find the leader's port number (patroni config?) - [ ] pause auto failover - [ ] prevent leader from stopping itself - [x] for all databases on leader, dump schema and load into ours - [ ] temporarily switch cluster wal_level to logical - [x] for all databases on leader, create replication slot for all tables, subscribe to each replication slot on ours - [ ] write down the current wal lsn on the leader - [ ] wait until we reach that lsn - [ ] switch the leader into read-only mode - [ ] sync the remaining data over, i.e. wait for replication - [ ] TODO: what if a ddl change happened between schema dump and read-only? - [ ] drop the publications and subscriptions - [ ] reset cluster's wal_level to previous value - [x] set the cluster's sysid to our new sysid - [ ] TODO: is this desirable? it kills all other nodes, removing the leader - [x] perform a switchover to us - [ ] TODO: should we do this ourselves? (currently, the second node to upgrade forces a failover on us) - [x] we are now the leader - [x] wait for the rest of the replicas to do the same procedure - [x] cluster upgraded with minimal downtime!
max commented 2023-08-26 20:38:39 +03:00 (Migrated from git.privatevoid.net)

assigned to @max

assigned to @max
max commented 2023-08-26 21:52:57 +03:00 (Migrated from git.privatevoid.net)

changed the description

changed the description
max commented 2023-08-26 23:27:41 +03:00 (Migrated from git.privatevoid.net)

added 2 commits

  • 9498e87e - modules/patroni: pause patroni during upgrade, disable python stdout buffering for migration script
  • a12c249e - checks/patroni-migration: load a big example database

Compare with previous version

added 2 commits <ul><li>9498e87e - modules/patroni: pause patroni during upgrade, disable python stdout buffering for migration script</li><li>a12c249e - checks/patroni-migration: load a big example database</li></ul> [Compare with previous version](/private-void/depot/-/merge_requests/47/diffs?diff_id=992&start_sha=eedae3f8a488be85c5861b29dfd7eaf00b8f10da)
max commented 2023-08-26 23:28:56 +03:00 (Migrated from git.privatevoid.net)
Maybe use Spilo's upgrade script? https://github.com/zalando/spilo/blob/master/postgres-appliance/major_upgrade/inplace_upgrade.py
max changed title from Draft: PostgreSQL Cluster Auto-Upgrade to WIP: Draft: PostgreSQL Cluster Auto-Upgrade 2024-07-10 02:55:51 +03:00
max changed title from WIP: Draft: PostgreSQL Cluster Auto-Upgrade to WIP: PostgreSQL Cluster Auto-Upgrade 2024-07-10 02:55:56 +03:00
This pull request has changes conflicting with the target branch.
  • packages/checks/default.nix
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin pg-auto-migrations:pg-auto-migrations
git checkout pg-auto-migrations

Merge

Merge the changes and update on Forgejo.
git checkout master
git merge --no-ff pg-auto-migrations
git checkout master
git merge --ff-only pg-auto-migrations
git checkout pg-auto-migrations
git rebase master
git checkout master
git merge --no-ff pg-auto-migrations
git checkout master
git merge --squash pg-auto-migrations
git checkout master
git merge --ff-only pg-auto-migrations
git checkout master
git merge pg-auto-migrations
git push origin master
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: privatevoid.net/depot#92
No description provided.