Migration plan — move Postgres off the USB drive onto internal SSD
Goal: get TerraPulse's Postgres data off /mnt/ursa (the consumer WD easystore
USB drive that dropped and corrupted on 2026-05-30, see incident postmortem)
and onto internal NVMe, eliminating the USB-transport drop risk for the
primary database.
Decision (Mike, 2026-06-01): reclaim /mnt/whoru for the landing zone, archive
its /home first. Migrate PG16 (TerraPulse, 153 GB) first.
Situation / constraints (verified 2026-06-01)
- Postgres data: PG16 (5433, TerraPulse) = 153 GB; PG15 (5432, secondary / nominate.ai) = 268 GB. Total 421 GB.
- Internal storage = 2 NVMe only, both ~full:
  - Samsung 980 1 TB: / (19 GB free) + /mnt/whoru (456 GB, 100% full).
  - WD SN520 512 GB: /mnt/storage (23 GB free).
- Every other disk on the box (17 of them) is USB.
- Landing zone = /mnt/whoru (nvme0n1p2, UUID 381fb52a-f773-4dda-b774-1647189209d5). It is a dormant OS clone of host "akira" (its own root tree + fstab; /home for bisenbek + ubuntu; last modified 2024-12; nothing written in 2025–2026). It is in the current fstab as nofail.
- Capacity math: reclaimed 456 GB fits PG16 (153 GB) with ~300 GB headroom. It does NOT safely fit PG16+PG15 (421/456 leaves no room for WAL/temp/vacuum/growth). So PG16-only on internal; PG15 is a separate later decision (Phase 4).
- Backups: a valid 6 GB dump from 2026-05-31 is on /mnt/backup-ursa (separate disk), verified restorable. This is the safety net if anything goes wrong.
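The capacity math above can be written as a small check. This is only a sketch: the 456, 153 and 421 GB figures come from the plan, while the function name and the 100 GB minimum headroom are illustrative assumptions, not values anyone has agreed on.

```shell
#!/bin/sh
# Headroom check for the landing zone, all figures in GB.
# The minimum-headroom threshold (third argument) is a hypothetical value
# chosen for illustration; the plan itself does not define one.

fits_with_headroom() {
    capacity_gb=$1
    data_gb=$2
    min_headroom_gb=$3
    headroom=$((capacity_gb - data_gb))
    echo "$headroom"
    [ "$headroom" -ge "$min_headroom_gb" ]
}

# PG16 alone on the reclaimed partition.
fits_with_headroom 456 153 100 && echo "PG16 alone: fits"
# PG16 + PG15 together.
fits_with_headroom 456 421 100 || echo "PG16+PG15: too tight"
```

With these inputs the headroom comes out at 303 GB for PG16 alone and 35 GB for both clusters, which is the gap the plan describes. Usable space on a freshly formatted ext4 partition will be somewhat lower than the raw partition size.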
Archive target for whoru's /home (380 GB)
Needs a non-critical USB disk with ≥380 GB free. Candidates (free space):
/mnt/marzano (1.6 TB, 8% used) ← proposed, /mnt/nom01 (2.9 TB),
/mnt/nom02 (2.8 TB), /mnt/tm (1.4 TB). NOT /mnt/backup-ursa (reserved for PG
backups). Mike to confirm/redirect.
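Before committing to a candidate, it may be worth re-checking its free space at the time of the copy. A minimal sketch, assuming POSIX `df -Pk` output (available space in 1 KiB blocks in column 4); the helper names `free_gb` and `has_room` are made up for this example.

```shell
#!/bin/sh
# Check that a mount has at least N GB available before archiving onto it.

# Print available space on the filesystem holding $1, in whole GB.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed only if $1 has at least $2 GB available.
has_room() {
    avail=$(free_gb "$1") || return 1
    [ "$avail" -ge "$2" ]
}

# Usage (path and size taken from the plan):
#   has_room /mnt/marzano 380 && echo "ok to archive here"
```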
Phase 0 — Pre-flight (no downtime, no risk)
- Confirm fresh valid backup exists on /mnt/backup-ursa (already true; re-run terrapulse-backup.service if stale).
- Confirm no symlinks/tablespaces point outside the PG16 data dir:
    sudo -u postgres psql -p 5433 -Atc "select spcname, pg_tablespace_location(oid) from pg_tablespace;"
  (Expect only pg_default/pg_global = inside data dir. If external tablespaces exist, add them to the rsync set.)
- Record current data_directory + row-count baseline for post-migration validation:
    sudo -u postgres psql -p 5433 -d terrapulse -Atc "select count(*) from observations;" # expect ~264.7M
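The tablespace check above can be made mechanical by filtering the query output. The sketch below assumes the `spcname|location` row format that `psql -At` prints, and that the built-in tablespaces report an empty location; the function name and the sample `archive_ts` row are invented for illustration.

```shell
#!/bin/sh
# Filter the "spcname|location" rows printed by the tablespace query.
# Any row with a non-empty second field is a tablespace stored outside the
# data directory and would need to be added to the rsync set.

external_tablespaces() {
    awk -F'|' '$2 != "" { print $1 " -> " $2 }'
}

# Sample input standing in for real psql output (archive_ts is made up):
printf 'pg_default|\npg_global|\narchive_ts|/mnt/other/ts\n' | external_tablespaces
```

In real use the query output would be piped straight in, for example `sudo -u postgres psql -p 5433 -Atc "select spcname, pg_tablespace_location(oid) from pg_tablespace;" | external_tablespaces`. Empty output means nothing lives outside the data dir.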
Phase 1 — Archive & reclaim /mnt/whoru (no PG downtime)
- Archive akira's /home to the chosen USB disk (verify-on-copy):
    sudo mkdir -p /mnt/marzano/akira-archive-2026-06
    sudo rsync -aHAX --info=progress2 /mnt/whoru/home/ /mnt/marzano/akira-archive-2026-06/home/
    # plus a small reference tar of system config
    sudo tar -czf /mnt/marzano/akira-archive-2026-06/etc-var-log.tar.gz -C /mnt/whoru etc var/log 2>/dev/null || true
- Verify the archive (counts + a checksum dry-run must report no differences):
    # -i itemizes differences; without it a dry run prints nothing even when files differ
    sudo rsync -aHAXni --checksum /mnt/whoru/home/ /mnt/marzano/akira-archive-2026-06/home/ | head
    diff <(sudo find /mnt/whoru/home -type f | wc -l) <(sudo find /mnt/marzano/akira-archive-2026-06/home -type f | wc -l)
  DO NOT proceed until the archive verifies clean.
- Unmount + remove from fstab (it's nofail, safe):
    sudo umount /mnt/whoru
    sudo sed -i '/381fb52a-f773-4dda-b774-1647189209d5/d' /etc/fstab
- Reformat the partition fresh and mount as the PG data home:
    sudo mkfs.ext4 -L pgdata /dev/nvme0n1p2
    sudo mkdir -p /var/lib/pgdata
    echo 'LABEL=pgdata /var/lib/pgdata ext4 defaults 0 2' | sudo tee -a /etc/fstab
    sudo systemctl daemon-reload && sudo mount /var/lib/pgdata
    sudo chown postgres:postgres /var/lib/pgdata && sudo chmod 700 /var/lib/pgdata
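Because the reformat is irreversible, the archive verification is worth expressing as a hard gate. The sketch below is a stand-in technique, not the rsync check from the plan: it compares sorted sha256 manifests of the two trees using `find` and `sha256sum`. It only covers regular-file contents and paths (no ownership, ACLs or xattrs), and the function names are hypothetical.

```shell
#!/bin/sh
# Compare two directory trees by content hash.

# Print a sorted "hash  ./relative/path" manifest for every regular file.
manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Succeed only if both trees contain the same files with the same contents.
trees_match() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    if manifest "$1" > "$a" && manifest "$2" > "$b" && cmp -s "$a" "$b"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$a" "$b"
    return "$rc"
}

# Usage (paths from the plan; run as root so every file is readable):
#   trees_match /mnt/whoru/home /mnt/marzano/akira-archive-2026-06/home \
#       || { echo "archive mismatch, do not reformat" >&2; exit 1; }
```

Hashing 380 GB over USB is slow, so this would be run once, after the copy finishes, rather than repeatedly.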
Phase 2 — Migrate PG16 (minimal downtime)
- Pre-sync LIVE (PG still serving; copies the bulk of 153 GB now):
    sudo rsync -aHAX --delete --info=progress2 \
      /mnt/ursa/data/terrapulse/postgres/16/main/ /var/lib/pgdata/16-main/
- Maintenance window (downtime starts; target a few minutes):
    # quiesce writers + API, then stop the cluster
    sudo systemctl stop terrapulse terrapulse-blitzortung terrapulse-glm terrapulse-pulse
    sudo systemctl stop postgresql@16-main
    # final delta sync (fast: only what changed since pre-sync)
    sudo rsync -aHAX --delete --info=progress2 \
      /mnt/ursa/data/terrapulse/postgres/16/main/ /var/lib/pgdata/16-main/
    sudo chown -R postgres:postgres /var/lib/pgdata/16-main && sudo chmod 700 /var/lib/pgdata/16-main
    # repoint the cluster
    sudo sed -i "s#^data_directory = .*#data_directory = '/var/lib/pgdata/16-main'#" \
      /etc/postgresql/16/main/postgresql.conf
    sudo systemctl start postgresql@16-main
- Validate before declaring success:
  - log shows "database system is ready to accept connections" and a completed checkpoint (not the EUCLEAN PANIC from the incident).
  - select count(*) from observations; matches the Phase-0 baseline (~264.7M).
  - show data_directory; reports /var/lib/pgdata/16-main, and select pg_relation_filepath('observations'); names a file that exists under it (the function returns a path relative to the data directory, so on its own it cannot show which disk is in use).
- Bring services back + verify: start terrapulse units; API health {"status":"ok","db":true}; site 200; fresh observation timestamp advancing.
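The validation checks above can be sketched as two small predicates. Assumptions to note: the function names are invented; the row-count test treats observations as append-only between the Phase 0 baseline and cutover, so it accepts any count at or above the baseline; and the log check should be fed only lines written since the restart, since older lines may still contain the incident's PANIC.

```shell
#!/bin/sh
# Post-cutover checks as predicates that return success or failure.

# Succeeds if the log text on stdin shows a clean start and no PANIC.
log_looks_clean() {
    log=$(cat)
    printf '%s\n' "$log" \
        | grep -q 'database system is ready to accept connections' || return 1
    if printf '%s\n' "$log" | grep -q 'PANIC'; then
        return 1
    fi
    return 0
}

# Succeeds if the current row count ($2) is at least the baseline ($1).
# Assumes the table only grows between baseline and cutover.
count_ok() {
    [ "$2" -ge "$1" ]
}

# Usage (the log path is the usual Debian/Ubuntu default, an assumption here):
#   tail -n 50 /var/log/postgresql/postgresql-16-main.log | log_looks_clean
#   count_ok "$baseline" "$(sudo -u postgres psql -p 5433 -d terrapulse \
#       -Atc 'select count(*) from observations;')"
```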
Phase 3 — Soak & reclaim (days later)
- Leave the old USB data dir intact, renamed, for a soak period:
    sudo mv /mnt/ursa/data/terrapulse/postgres/16/main /mnt/ursa/data/terrapulse/postgres/16/main.PRE-MIGRATION
  (Do this only after Phase 2 validates; the rename makes accidental use obvious.)
- After ~3–7 days of healthy operation on internal NVMe, delete the old dir to reclaim USB space.
Phase 4 — PG15 (separate decision, later)
268 GB won't fit on the reclaimed SSD alongside PG16 with safe headroom. Options: (a) slim PG15 then co-locate; (b) dedicated hardware; (c) leave on a hardened USB setup (usb-storage not uas, power-saving already off). This is the nominate.ai DB; lower priority for TerraPulse's risk. Decide after PG16 is settled.
Rollback
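At every point before Phase 3's delete, the original USB data dir still holds its pre-migration contents. (After the Phase 3 rename it lives at main.PRE-MIGRATION; rename it back before running the commands below.) If PG16 won't start on internal: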
sudo systemctl stop postgresql@16-main
sudo sed -i "s#^data_directory = .*#data_directory = '/mnt/ursa/data/terrapulse/postgres/16/main'#" \
/etc/postgresql/16/main/postgresql.conf
sudo systemctl start postgresql@16-main # back on USB, zero data loss
Last-resort: restore from the verified dump on /mnt/backup-ursa.
Risk notes
- rsync of a live PG data dir (Phase 2 pre-sync) is safe ONLY because a final delta sync runs after the cluster is stopped; the stopped-state copy is the authoritative one.
- The internal Samsung 980 is fast NVMe; after a good pre-sync, the final delta sync of the 153 GB cluster should take seconds to minutes, since only files changed since the pre-sync are copied. Pre-sync read is gated by USB read speed.
- Keep the box's power-saving disabled (done 2026-05-31) so nothing suspends mid-copy.
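To get a feel for how long the USB-gated pre-sync might run, a rough estimate can be computed from the data size and a sustained read rate. The 153 GB figure is from the plan; the throughput values below are purely hypothetical and should be replaced with a measured number for /mnt/ursa.

```shell
#!/bin/sh
# Rough pre-sync duration in whole minutes (rounded up).
#   $1 = data size in GB, $2 = sustained read rate in MB/s (assumed, not measured)

eta_minutes() {
    size_gb=$1
    rate_mb_s=$2
    secs=$(( (size_gb * 1024 + rate_mb_s - 1) / rate_mb_s ))
    echo $(( (secs + 59) / 60 ))
}

eta_minutes 153 100   # hypothetical 100 MB/s: prints 27
eta_minutes 153 40    # hypothetical 40 MB/s: prints 66
```

This ignores per-file overhead and any throttling on the USB link, so it is a lower bound rather than a prediction.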
FLAG FOR BRAD — PG15 (nominatim) is still on the USB drive (2026-06-09)
This part is not TerraPulse's to migrate — it's yours. Raising it because it's a real, unresolved risk on shared hardware.
What: PG15 on port 5432 (data dir /mnt/ursa/data/postgresql, the nominatim
database backing nominate.ai / geo.campaignbrain.dev) is still living on
/mnt/ursa — the consumer WD easystore USB drive whose transport drop corrupted
the ext4 filesystem and took both businesses down on 2026-05-30.
Why it matters: TerraPulse already moved its own DB (PG16) off that drive onto
internal NVMe on 2026-06-01, which is the only reason TerraPulse survived the
2026-06-01 hard-freeze with zero data loss. PG15/nominatim did not get that
protection — it's still exposed to the same USB-transport-drop failure mode. If the
drive drops again, nominate.ai's geocoding DB is the thing that corrupts.
Why TerraPulse isn't doing it: that cluster is yours (Campaign Brain /
nominate.ai). TerraPulse only consumes it over HTTP at geo.campaignbrain.dev; we
never touch the cluster directly, and per the box's tenancy boundary we don't move
other tenants' data. Hence: flag, not action.
Constraints worth knowing (from the PG16 migration above):
- PG15 nominatim is ~268 GB. The internal NVMe landing zone used for PG16 does not have room for both (421 GB of PG vs 456 GB reclaimed). PG15 needs its own internal target, or a cleanup, before it can move.
- /mnt/ursa is a SHARED mount; dragonfli.service propagates it into the host namespace (mount --make-private before any umount). Quiesce both businesses' services before touching the mount.
- A verified-restorable backup pattern and the full PG16 playbook are above; the same Phase 1–3 approach applies to PG15 if/when you decide to move it.
Ask: decide whether to migrate PG15/nominatim off the USB drive (recommended,
same rationale that saved PG16), and if so, where its ~268 GB lands.