The Event-Spine Framework — Generalizing the Storm Storehouse
Status: foundational design doc · Opened: 2026-06-10 · Owner: Mike + Claude (engine room)
The tropical-cyclone monitor (issue #227) turned out to be the first instance of a general pattern, not a one-off. This doc names the pattern, states what any new event type needs to qualify, and inventories the candidates — ranked by how well they fit and verified against what is actually live in our scheduler as of 2026-06-10.
This is a measured-reality framework. The bright-line test from
docs/hurricane-arc-data-sources.md applies to every spine and every sensor here: is this a
measurement of something that physically happened, or a computer's estimate of what will, might,
or "would have" happened? The first is in; the second is out. No forecasts, no reanalysis, no
model grids — not as spines, not as sweep layers.
The pattern: spine + edge + sweep = slot
The TC monitor decomposes into four separable parts. Only the first two are phenomenon-specific; the last two are generic machinery we already built.
- Spine — an authoritative catalog of discrete events, each with a stable ID, positions, times, and (ideally) a lifecycle. For tropical cyclones this is IBTrACS: 13,543 storms, 1842 → present, one `sid` per storm.
- Live edge — a feed that extends the spine to right now, in the same normalized shape. For TCs this is the NHC observed-fix feed (15-min poll, forecast products dropped).
- Sweep — for each event fix, query our own ingested sensors within a distance-and-time window (bbox prefilter + `ST_DistanceSphere`, profile-sensor collapse, per-fix caps). The engine in `src/terrapulse/monitor/storm_sweep.py` is already event-agnostic in everything but its name and its hardcoded sensor list / radius / window.
- Slot — one addressable dossier per event (`<id>.json` + `<id>_hits.parquet` + a disk-rebuildable index). The slot is the unit of work: fill strategically, per event or per class of events, never as a blanket obligation.
The product is the storehouse, not any single report. Reports are queries we run against slots later. This was the TC arc's reframe and it carries over unchanged.
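The sweep step in the pattern above (bbox prefilter, then a spherical-distance check inside a time window) can be sketched in plain Python. This is a minimal illustration, not the code in `storm_sweep.py`: it assumes a haversine great-circle distance as a stand-in for PostGIS `ST_DistanceSphere`, and the names `Fix`, `Observation`, `haversine_km`, and `sweep_fix` are hypothetical. Profile-sensor collapse and per-fix caps are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, degrees, pi, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class Fix:
    """One spine position: where and when the event was measured."""
    lat: float
    lon: float
    time: datetime


@dataclass(frozen=True)
class Observation:
    """One ingested sensor reading (hypothetical shape)."""
    sensor: str
    lat: float
    lon: float
    time: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlam = radians(lon2 - lon1)
    a = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def sweep_fix(fix, observations, radius_km, window):
    """Return observations within radius_km and +/- window of one fix."""
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = degrees(delta)
    # Exact longitude half-width of a spherical cap; whole globe if it covers a pole.
    ratio = sin(delta) / max(cos(radians(fix.lat)), 1e-12)
    dlon = 180.0 if (delta >= pi / 2 or ratio >= 1) else degrees(asin(ratio))
    hits = []
    for obs in observations:
        if abs(obs.time - fix.time) > window:
            continue  # outside the time window
        if abs(obs.lat - fix.lat) > dlat:
            continue  # outside the bbox in latitude
        # Wrap the longitude difference into [-180, 180) so the antimeridian works.
        if abs((obs.lon - fix.lon + 180) % 360 - 180) > dlon:
            continue  # outside the bbox in longitude
        if haversine_km(fix.lat, fix.lon, obs.lat, obs.lon) <= radius_km:
            hits.append(obs)
    return hits
```

The bbox test only discards candidates cheaply; the distance check decides membership, so the prefilter is sized never to reject a point that lies inside the radius.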
What qualifies as a spine
A candidate event type is workable when its catalog has:
- Stable IDs — an event can be addressed, revisited, and enriched over time. No ID, no slot.
- Measured-reality provenance — the catalog records what physically happened (best-track fixes, hypocenters, surveyed tornado paths), not what a model said would happen. Human-issued warnings are forecast products and fail this test as spines, even when the underlying agency is one we trust for measurements.
- Positions and times — enough to drive the sweep window. A point event (earthquake) is fine; a track (storm) is richer; a purely temporal event (geomagnetic storm) skips the spatial query entirely.
- Depth — a historical archive worth backfilling (the 1842 equivalent), so the storehouse has mass on day one.
- A live edge — ideally, a same-shape feed for events happening now. Optional but valuable; it is what makes a monitor rather than an archive.
The sweep parameters (radius, time window, sensor list) are per-kind settings, not framework changes. TCs use 500 km / ±6 h; a geomagnetic storm uses no radius at all and a multi-day window; an earthquake might use 1000 km / ±1 h, held tight around the origin time.
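As a rough sketch of what "per-kind settings" could look like, the table below encodes the three examples from the paragraph above. The names `SweepParams`, `SWEEP_PARAMS`, and `window_for` are hypothetical, and the 3-day geomagnetic window is a placeholder assumption: the text only says "multi-day".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SweepParams:
    """Sweep geometry for one event kind (hypothetical shape)."""
    radius_km: Optional[float]  # None means a planetary event: no spatial query
    half_window: timedelta      # sweep covers fix time +/- this span

    @property
    def is_spatial(self) -> bool:
        return self.radius_km is not None


SWEEP_PARAMS = {
    "tc": SweepParams(radius_km=500.0, half_window=timedelta(hours=6)),
    "earthquake": SweepParams(radius_km=1000.0, half_window=timedelta(hours=1)),
    # Assumption: 3 days stands in for the unspecified "multi-day" window.
    "geomagnetic_storm": SweepParams(radius_km=None, half_window=timedelta(days=3)),
}


def window_for(kind: str, fix_time: datetime):
    """Return the (start, end) sweep window for one fix of the given kind."""
    params = SWEEP_PARAMS[kind]
    return fix_time - params.half_window, fix_time + params.half_window
```

Adding a kind is then one more dictionary entry; nothing in the sweep engine itself has to change.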
Architecture change (small, deliberate)
The slot key generalizes from sid to (kind, event_id):
- One storehouse, dossiers tagged by `kind` (`tc`, `geomagnetic_storm`, `earthquake`, …). Subdirectory per kind under `data/storm_storehouse/` (or rename the root to `data/event_storehouse/` when the second kind lands — cheap, since the storehouse is gitignored and rebuildable).
- Each kind registers: spine source slug(s), live-edge slug (optional), sweep radius, sweep window, sensor slugs. The existing `SENSOR_SLUGS`/radius/window constants in `storm_sweep.py` become per-kind config.
- `rebuild_index_from_disk()` and the per-event crash-safe writes carry over as-is. The index gains a `kind` field.
- Backfill stays one-shot and idempotent per kind, same as `scripts/backfill_storms.py`.
Nothing here requires touching the TC monitor's behavior; the first new kind is added beside it, and the rename/refactor happens at that moment, not speculatively.
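One way the `(kind, event_id)` slot key and the per-kind subdirectory layout could fit together is sketched below. `SlotKey` and `rebuild_index_sketch` are hypothetical names for illustration; the latter is not the existing `rebuild_index_from_disk()`, only a guess at how a disk walk might pick up the new `kind` field.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SlotKey:
    """Address of one dossier: the event kind plus the spine's own ID."""
    kind: str
    event_id: str

    def dossier_path(self, root: Path) -> Path:
        return root / self.kind / f"{self.event_id}.json"

    def hits_path(self, root: Path) -> Path:
        return root / self.kind / f"{self.event_id}_hits.parquet"


def rebuild_index_sketch(root: Path) -> list[dict]:
    """Rebuild index entries from disk alone: one entry per <kind>/<id>.json."""
    entries = []
    for dossier in sorted(root.glob("*/*.json")):
        # The kind is recovered from the directory name, the ID from the file name.
        key = SlotKey(kind=dossier.parent.name, event_id=dossier.stem)
        entries.append({
            "kind": key.kind,
            "event_id": key.event_id,
            "has_hits": key.hits_path(root).exists(),
        })
    return entries
```

Because the kind lives in the path, the index stays fully rebuildable from the directory tree, which is the property the current storehouse already relies on.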
Candidate inventory (ranked by fit)
Every "live now" claim below was verified against the scheduler JOBS list on 2026-06-10.
1. Geomagnetic storms — best fit, zero new fetchers
| Part | What | Status |
|---|---|---|
| Spine | Our own dst_index series: a storm = Dst crossing a threshold (e.g. −50 nT), lifecycle = onset → main phase → recovery. Kyoto's Dst record runs from 1957. | live |
| Live edge | Same feed — the spine and the edge are one source. noaa_space_weather (Kp) corroborates. | live |
| Sweep layers | intermagnet magnetometers, nmdb_cosmic_rays (Forbush decreases — detector already running), goes_xray, dscovr_solar_wind, hamqsl_propagation, celestrak (Starlink drag), silso_sunspots for context. | all live |
| Sweep geometry | Planetary events — time-window only, no spatial query. The cheapest sweep possible. | — |
| History depth | 1957 → present (Dst); sensor depth varies by layer. | — |
Notes: the Starlink thermosphere pilot was, in hindsight, one hand-built dossier of this kind.
A geomagnetic-storm storehouse gives that work a permanent home and makes every future storm a
slot that fills itself. Event IDs would be ours to mint (e.g. gst-YYYYMMDD); that is fine, provided
the threshold rule is written down once, before backfill, and never tuned afterward.
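To make "written down once" concrete, here is a minimal sketch of one possible threshold rule: a storm is a maximal run of consecutive samples at or below the threshold, and the ID is minted from the onset date. The function name `detect_storms` and the run-based definition are assumptions for illustration; the actual rule is still to be decided. Note that this naive version would mint the same ID for two runs that begin on the same UTC date, so a frozen rule would also need a merge gap or a suffix scheme.

```python
from datetime import datetime

DST_THRESHOLD_NT = -50.0  # example value from the text; freeze before backfill


def detect_storms(samples, threshold=DST_THRESHOLD_NT):
    """Find storms in time-sorted (datetime, dst_nT) samples.

    A storm is a maximal run of consecutive samples with Dst <= threshold.
    Returns one dict per storm with onset, end, minimum Dst, and a minted ID.
    """
    storms = []
    current = None
    for t, dst in samples:
        if dst <= threshold:
            if current is None:
                # First sample at or below the threshold marks the onset.
                current = {"onset": t, "end": t, "min_dst": dst, "min_time": t}
            else:
                current["end"] = t
                if dst < current["min_dst"]:
                    current["min_dst"], current["min_time"] = dst, t
        elif current is not None:
            # Dst rose back above the threshold: close the run.
            storms.append(current)
            current = None
    if current is not None:
        storms.append(current)  # series ended while a storm was still in progress
    for storm in storms:
        storm["event_id"] = f"gst-{storm['onset']:%Y%m%d}"
    return storms
```

The same function can serve backfill and the live edge, since the spine and the edge are one source.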
2. Earthquakes — strong fit, catalogs already ingested
| Part | What | Status |
|---|---|---|
| Spine | usgs_earthquake (ComCat IDs), corroborated by emsc, isc, gfz_geofon. Decades deep, already deduplicated by our canonical query. | live |
| Live edge | Same feeds — minutes-fresh. | live |
| Sweep layers | noaa_tides (tsunami / seiche signatures), usgs_water (gauges and wells genuinely respond to passing seismic waves), intermagnet. | all live |
| Sweep geometry | Point event: tight time window (±1 h), radius scaled to magnitude. | — |
| History depth | ComCat is effectively bottomless for M4.5+. | — |
Notes: volume is the design decision. Tens of thousands of quakes a year means the slot floor (magnitude ≥ 5.5? ≥ 6?) decides whether the storehouse has 500 meaningful slots or 500,000 empty ones. Strategic-fill principle applies: big/coastal/famous quakes get rich fills.
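The floor decision can be explored before committing to it by counting how many slots each candidate floor would create. The sketch below does that and also shows one way a magnitude-scaled radius might look. Both `slots_per_floor` and `radius_km_for` are hypothetical, and the scaling constants (100 km at M5, doubling per magnitude unit, capped at 1000 km) are placeholders, not a proposed rule.

```python
def slots_per_floor(magnitudes, floors):
    """Count how many catalog events would get a slot under each magnitude floor."""
    return {floor: sum(1 for m in magnitudes if m >= floor) for floor in floors}


def radius_km_for(magnitude, base_km=100.0, base_mag=5.0, cap_km=1000.0):
    """Placeholder scaling: radius doubles per magnitude unit, up to a cap."""
    return min(cap_km, base_km * 2 ** (magnitude - base_mag))


# Usage: compare candidate floors against a (tiny, made-up) list of magnitudes.
counts = slots_per_floor([4.6, 5.0, 5.5, 6.1, 7.2], floors=[5.5, 6.0])
```

Running the count against the real catalog for each candidate floor turns the "500 slots or 500,000" question into a number before any backfill starts.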
3. Tornadoes / severe convective — good fit, lightning is the headline pairing
| Part | What | Status |
|---|---|---|
| Spine | NCEI Storm Events database: tornado segments with IDs, surveyed start/end points (real tracks), back to 1950; hail and wind reports too. Publishes ~75 days behind. | needs fetcher |
| Live edge | spc_reports — measured storm reports (tornado/hail/wind), already live. | live |
| Sweep layers | blitzortung_lightning (the killer pairing — strike rates around a surveyed tornado track), igra_soundings, noaa_tides/usgs_water where coastal/flooding. | live |
| Sweep geometry | Track event like a TC, but hours not days: small radius (~100 km), tight window (±1 h per segment). | — |
| History depth | 1950 → present for tornado tracks. Lightning history starts 2026-04 → pre-2026 slots are track-only by construction, exactly like the TC backfill. | — |
Notes: one new fetcher (NCEI Storm Events bulk CSVs — measured reports, comfortably inside the
bright line). SPC outlooks (spc_outlook) are forecasts and must never be a spine or sweep
layer; SPC reports are measurements and already are one.
4. Volcanic eruptions — workable, slow-burn
| Part | What | Status |
|---|---|---|
| Spine | Smithsonian GVP eruption catalog: volcano numbers + eruption IDs, centuries deep, real start/end lifecycles. | live (smithsonian_gvp) |
| Live edge | usgs_volcanoes status feed + GVP weekly. | live |
| Sweep layers | Earthquake catalogs near the edifice (the classic precursor measurement); igra_soundings for plumes near launch sites. Waveform seismic is the known IRIS gap (hurricane-arc T1) — when that fetcher lands, both this kind and TC microseisms benefit. | live / gap |
| Sweep geometry | Fixed location, long lifecycle (weeks–years): small radius, very wide window. | — |
| History depth | Centuries (GVP). | — |
5. Winter storms — the asked-about case; honest answer: weakest spine, most construction
There is no IBTrACS for winter storms. No agency assigns official IDs and tracks. The pieces:
- NCEI Storm Events "episodes" — IDs exist, but footprints are county zones (not tracks) and publication lags ~75 days. Usable as a coarse historical spine.
- NESIS (Northeast Snowfall Impact Scale) — a real ranked catalog of major Northeast snowstorms, but only 30-odd events and regional.
- NWS winter warnings (`nws_alerts`) — human-issued forecast products. Fails the bright line as a spine. (They stay fine as platform data; they just can't define our events.)
- The measured-reality path: GHCN-Daily station snowfall/precip observations — real gauges read by real observers, century-deep, free, and we do not currently ingest it. With GHCN-D in, we could construct events ourselves (e.g. contiguous days where ≥N stations in a region report ≥X cm), with the rule written down before any backfill.
Verdict: doable, and the gap is fillable with a genuine measurement source (GHCN-Daily is a worthy fetcher regardless — it upgrades several other kinds' sweeps too). But it is the most work for the fuzziest spine: we would be minting both the catalog and the IDs. Park it behind the kinds that come with authoritative spines; revisit once GHCN-D is ingested.
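The example rule mentioned above (contiguous days where at least N stations in a region report at least X cm) could be prototyped along these lines once GHCN-D is ingested. This is a sketch under stated assumptions: the input is taken to be already filtered to one region and reduced to `(day, station_id, snowfall_cm)` tuples, and `construct_events` is a hypothetical name. N and X are left as parameters because the text does not fix them.

```python
from collections import defaultdict
from datetime import date, timedelta


def construct_events(reports, min_stations, min_snow_cm):
    """Group qualifying days into events for one region.

    reports: iterable of (day, station_id, snowfall_cm) tuples.
    A day qualifies when at least min_stations distinct stations report
    at least min_snow_cm. Returns (start_day, end_day) per run of
    consecutive qualifying days.
    """
    stations_by_day = defaultdict(set)
    for day, station_id, snowfall_cm in reports:
        if snowfall_cm >= min_snow_cm:
            stations_by_day[day].add(station_id)
    qualifying = sorted(
        day for day, stations in stations_by_day.items()
        if len(stations) >= min_stations
    )
    events = []
    for day in qualifying:
        if events and day - events[-1][1] == timedelta(days=1):
            events[-1] = (events[-1][0], day)  # extend the current run
        else:
            events.append((day, day))  # start a new event
    return events
```

Whatever thresholds are chosen, the point from the verdict holds: the rule and its parameters get frozen before the first backfill, since they define both the catalog and the IDs.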
Exotic spines (real, cheap, already live — for later)
- Gravitational-wave events — `gwosc_events` is live; discrete cataloged events with IDs. Sweep layers are thin (what on Earth co-measures a GW?) but the slots cost nothing.
- Fireballs / bolides — CNEOS fireball catalog + `gmn_meteors` + `nasa_neo`; the fireball paper (#221) already walked this ground. Point events, global, well-ID'd.
- Solar flares / CMEs — `goes_xray` flare events and coronagraph-observed CMEs are measurements; most flares fold naturally into geomagnetic-storm dossiers as sweep layers rather than needing their own kind. `nasa_donki` requires care: its observation records are in, its model runs (WSA-ENLIL propagation) are out.
Sequencing recommendation
- Geomagnetic storms first. Zero new fetchers, spine derived from data we already own, time-only sweep (cheapest possible), ~70 years of backfill depth, and it retroactively gives the Starlink pilot a permanent home. This is also the moment the generic refactor happens: per-kind config, `(kind, event_id)` slot keys, storehouse rename.
- Earthquakes second. Pure config once the refactor exists; the only real decision is the magnitude floor.
- Tornadoes when the NCEI Storm Events fetcher lands. One bulk-CSV fetcher buys a 75-year track spine.
- Winter storms after GHCN-Daily. The fetcher is the prerequisite; the event-construction rule gets written and frozen before any backfill.
- Volcanoes / exotics opportunistically. Slots are cheap; add kinds when a sweep layer makes them interesting.
Same operating mode as the TC arc: live-forward first per kind, history backfill second,
strategic per-slot fills always. Multi-hour backfills run under nohup (background jobs die on
session exit — learned 2026-06-10).