The Event-Spine Framework — Generalizing the Storm Storehouse

Status: foundational design doc · Opened: 2026-06-10 · Realized: 2026-06-14 (six kinds live) · Owner: Mike + Claude (engine room)

House name: the Eventdex treatment (Mike, 2026-06-11; renamed from "Pokedex" for professionalism, no accent mark). "Give X the Eventdex treatment" = build it a spine, a live edge, a sweep, and one dossier slot per event.

The tropical-cyclone monitor (issue #227) turned out to be the first instance of a general pattern, not a one-off. This doc names the pattern, states what any new event type needs to qualify, and inventories the candidates — ranked by how well they fit and verified against what is actually live in our scheduler.

This is a measured-reality framework. The bright-line test from docs/hurricane-arc-data-sources.md applies to every spine and every sensor here: is this a measurement of something that physically happened, or a computer's estimate of what will, might, or "would have" happened? The first is in; the second is out. No forecasts, no reanalysis, no model grids — not as spines, not as sweep layers.

First-class extensions (2026-06-16): two additions live in a companion doc, docs/yeardex-framework.md, and are part of this framework, not a side branch: Yeardex (the slot may be a calendar year instead of a discrete event, giving annual statistical series and inventories a home) and the multi-source cited-slot rule (any slot may carry data from several sources, each datum cited; this generalizes the entry kind's multi-source slot space, described below, to every kind). Both inherit the measured-reality bright line unchanged.

Status: six kinds live (2026-06-14)

The framework below was designed 2026-06-10 and fully realized over the four days that followed. All six kinds are complete (spine + live edge + sweep + backfilled dossiers), all sweeps run on the shared 30-minute tick, and the storehouse holds 138,755 dossiers.

Kind	`kind`	Spine	Slots	Built
Tropical cyclones	`tc`	IBTrACS, 1842→	13,544	2026-06-10 (prototype)
Geomagnetic storms	`gst`	our `dst_index`, Dst threshold, 1957→	1,147	2026-06-11 (#228)
Earthquakes	`eq`	`usgs_earthquake` ComCat	13,839	2026-06-11
Tornadoes	`tor`	SPC tornado history, 1950→	73,628	2026-06-11
Volcanic eruptions	`vol`	Smithsonian GVP, 11k eruptions BCE→	11,089	2026-06-12
Atmospheric entry	`entry`	4 spines (GMN + SATCAT + CNEOS + meteorites)	25,508	2026-06-14

Two generalizations emerged during the build that the original design did not anticipate:

Multi-source kinds. The entry kind (anything entering the atmosphere and burning up, natural or artificial) is the first kind fed by more than one spine catalog: GMN fireballs, CelesTrak SATCAT reentries, CNEOS bolides, and meteorite falls all land in one source-prefixed slot space (gmn:/sat:/cneos:/met:). One kind, four catalogs, globally unique slot IDs. See the architecture note below.
Per-slot sweep eligibility. A slot need not be swept to be a full slot. In the entry kind, only located + sub-hour-precise events are swept; date-only reentries and year-only meteorite falls are full dossiers with an empty sensor section (index-only) — never lesser slots. This is the catalog-first, sweep-light posture: the spine is the product; sensor hits are a bonus.
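The eligibility rule is small enough to state as a predicate. A minimal sketch, assuming each entry record carries an optional location and an explicit timing uncertainty; the `EntryEvent` fields and the `is_sweep_eligible` name are hypothetical, not the entry_sweep.py API:

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class EntryEvent:
    slot_id: str                # source-prefixed, e.g. "gmn:..." or "met:..."
    lat: Optional[float]        # None when the catalog gives no location
    lon: Optional[float]
    time_uncertainty: timedelta  # e.g. seconds for GMN, 1 day for a date-only record


def is_sweep_eligible(ev: EntryEvent,
                      max_uncertainty: timedelta = timedelta(hours=1)) -> bool:
    """True for located events timed to better than an hour; all others are index-only."""
    located = ev.lat is not None and ev.lon is not None
    precise = ev.time_uncertainty < max_uncertainty
    return located and precise
```

An ineligible event still gets its full dossier; the predicate only decides whether the sensor section is filled.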

Each kind froze its scope in a docs/scope-*-eventdex.md doc before backfill (floor, sweep geometry, ID rule, provisional model), and that freeze was never tuned afterward — the discipline the original doc demanded for the geomagnetic threshold, applied to every kind.

The pattern: spine + edge + sweep = slot

The TC monitor decomposes into four separable parts. Only the first two are phenomenon-specific; the last two are generic machinery we already built.

Spine — an authoritative catalog of discrete events, each with a stable ID, positions, times, and (ideally) a lifecycle. For tropical cyclones this is IBTrACS: 13,543 storms, 1842 → present, one sid per storm.
Live edge — a feed that extends the spine to right now, in the same normalized shape. For TCs this is the NHC observed-fix feed (15-min poll, forecast products dropped).
Sweep — for each event fix, query our own ingested sensors within a distance-and-time window (bbox prefilter + ST_DistanceSphere, profile-sensor collapse, per-fix caps). The engine in src/terrapulse/monitor/storm_sweep.py was already event-agnostic in everything but its name and its hardcoded sensor list, radius, and window; those became per-kind settings (see Architecture below).
Slot — one addressable dossier per event (<id>.json + <id>_hits.parquet + a disk-rebuildable index). The slot is the unit of work: fill strategically, per event or per class of events, never as a blanket obligation.
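The sweep step can be sketched in pure Python, assuming in-memory lists rather than the PostGIS tables storm_sweep.py actually queries. It shows a time-window filter, a bounding-box prefilter, an exact haversine check standing in for ST_DistanceSphere, and a per-fix cap that keeps the nearest hits. All class and function names are hypothetical, and profile-sensor collapse is not shown:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical-Earth approximation


@dataclass(frozen=True)
class Fix:
    time: datetime
    lat: Optional[float] = None  # None for purely temporal (planetary) events
    lon: Optional[float] = None


@dataclass(frozen=True)
class Obs:
    sensor: str
    time: datetime
    lat: Optional[float] = None
    lon: Optional[float] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere (stand-in for ST_DistanceSphere)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def in_bbox(fix: Fix, obs: Obs, radius_km: float) -> bool:
    """Cheap prefilter: the lat/lon box that bounds a spherical cap around the fix."""
    r = radius_km / EARTH_RADIUS_KM              # angular radius in radians
    lat = math.radians(fix.lat)
    if abs(math.radians(obs.lat) - lat) > r:
        return False
    if abs(lat) + r >= math.pi / 2:              # cap reaches a pole: any longitude
        return True
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(lat)))
    wrapped = abs((obs.lon - fix.lon + 180.0) % 360.0 - 180.0)  # antimeridian-safe
    return wrapped <= dlon


def sweep(fixes: Sequence[Fix], observations: Sequence[Obs], radius_km: Optional[float],
          before: timedelta, after: timedelta,
          per_fix_cap: int) -> List[Tuple[Fix, float, Obs]]:
    """Return (fix, distance_km, obs) hits; radius_km=None means time window only."""
    hits: List[Tuple[Fix, float, Obs]] = []
    for fix in fixes:
        lo, hi = fix.time - before, fix.time + after
        candidates: List[Tuple[float, Obs]] = []
        for obs in observations:
            if not lo <= obs.time <= hi:
                continue
            if radius_km is None:
                candidates.append((0.0, obs))
            elif obs.lat is not None and obs.lon is not None and in_bbox(fix, obs, radius_km):
                d = haversine_km(fix.lat, fix.lon, obs.lat, obs.lon)
                if d <= radius_km:
                    candidates.append((d, obs))
        candidates.sort(key=lambda c: c[0])      # nearest first; stable for ties
        hits.extend((fix, d, obs) for d, obs in candidates[:per_fix_cap])
    return hits
```

The box uses the standard spherical-cap bound, so it never drops a point the exact distance test would accept; it only saves the trigonometry on obvious misses.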

The product is the storehouse, not any single report. Reports are queries we run against slots later. This was the TC arc's reframe and it carries over unchanged.

What qualifies as a spine

A candidate event type is workable when its catalog has:

Stable IDs — an event can be addressed, revisited, and enriched over time. No ID, no slot.
Measured-reality provenance — the catalog records what physically happened (best-track fixes, hypocenters, surveyed tornado paths), not what a model said would happen. Human-issued warnings are forecast products and fail this test as spines, even when the underlying agency is one we trust for measurements.
Positions and times — enough to drive the sweep window. A point event (earthquake) is fine; a track (storm) is richer; a purely temporal event (geomagnetic storm) skips the spatial query entirely.
Depth — a historical archive worth backfilling (the 1842 equivalent), so the storehouse has mass on day one.
A live edge — ideally, a same-shape feed for events happening now. Optional but valuable; it is what makes a monitor rather than an archive.
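Taken together, these requirements imply a minimal normalized shape for anything a spine or live edge emits. A sketch under assumptions (class and field names are hypothetical, not the storehouse's actual schema) showing how point, track, and purely temporal events fit one record, and how the (kind, event_id) slot key falls out of it:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SpineFix:
    time: datetime
    lat: Optional[float] = None   # omitted for purely temporal events
    lon: Optional[float] = None


@dataclass
class SpineEvent:
    kind: str                     # e.g. "tc", "gst", "eq"
    event_id: str                 # stable catalog ID: no ID, no slot
    fixes: List[SpineFix] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.event_id:
            raise ValueError("a spine event needs a stable ID")
        if not self.fixes:
            raise ValueError("a spine event needs at least one timed fix")

    @property
    def slot_key(self) -> Tuple[str, str]:
        return (self.kind, self.event_id)

    @property
    def geometry(self) -> str:
        """'temporal' (no positions), 'point' (one position) or 'track' (several)."""
        located = [f for f in self.fixes if f.lat is not None and f.lon is not None]
        if not located:
            return "temporal"
        return "point" if len(located) == 1 else "track"
```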

The sweep parameters (radius, time window, sensor list) are per-kind settings, not framework changes. TCs use 500 km / ±6 h; a geomagnetic storm uses no radius at all and a multi-day window; an earthquake might use 1000 km with a tight ±1 h window around the origin time.
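Per-kind settings like these are what the Architecture section below calls KindConfig. This sketch uses the field names listed there; the helper methods, the tuple types, the sensor lists, and the 72 h gst window are illustrative assumptions, not the real configuration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class KindConfig:
    kind: str
    sensor_slugs: Tuple[str, ...]
    profile_sensors: Tuple[str, ...] = ()
    radius_km: Optional[float] = None      # None = planetary / time-only sweep
    window_hours_before: float = 0.0
    window_hours_after: float = 0.0

    @property
    def is_planetary(self) -> bool:
        return self.radius_km is None

    def window(self, t: datetime) -> Tuple[datetime, datetime]:
        """Time window around one fix."""
        return (t - timedelta(hours=self.window_hours_before),
                t + timedelta(hours=self.window_hours_after))


# 500 km / ±6 h is from the text; the sensor tuple is a hypothetical subset.
TC = KindConfig("tc", sensor_slugs=("noaa_tides", "usgs_water"),
                radius_km=500.0, window_hours_before=6, window_hours_after=6)
# No radius; 72 h each side is a hypothetical stand-in for "multi-day".
GST = KindConfig("gst", sensor_slugs=("intermagnet", "nmdb_cosmic_rays", "goes_xray"),
                 window_hours_before=72, window_hours_after=72)
```

Because the sweep reads only these fields, adding a kind means adding a config and a spine reader, not touching the engine.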

Architecture (as built)

The slot key generalized from sid to (kind, event_id), exactly as planned:

One storehouse, dossiers tagged by kind. The root was renamed data/storm_storehouse/ → data/event_storehouse/ when the second kind landed (cheap — the storehouse is gitignored and rebuildable). One subdirectory per kind; <event_id>.json + <event_id>_hits.parquet per slot.
Per-kind config is a dataclass. event_storehouse.KindConfig carries kind, sensor_slugs, profile_sensors, radius_km (None = planetary / time-only), window_hours_before, window_hours_after. The old SENSOR_SLUGS/radius/window constants in storm_sweep.py became these per-kind settings; storm_sweep's _bbox / _POINT_QUERY / _PROFILE_QUERY / per-fix-cap machinery is reused verbatim by every kind's sweep module.
rebuild_index_from_disk() and the per-event crash-safe writes carried over as-is. The index gained a kind field and aggregates counts per kind.
Backfill stays one-shot and idempotent per kind (scripts/backfill_*_dossiers.py), keyed on the stable event_id so re-runs rewrite in place rather than duplicating.
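The crash-safe write and the disk rebuild amount to two small routines. A minimal sketch assuming JSON dossiers only (the `_hits.parquet` companion is omitted because it needs a Parquet library); `write_dossier_atomic` and `rebuild_index` are hypothetical names, not the signature of the real `rebuild_index_from_disk()`:

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path


def write_dossier_atomic(root: Path, kind: str, event_id: str, dossier: dict) -> Path:
    """Write <root>/<kind>/<event_id>.json so a crash never leaves a half-written file."""
    kind_dir = root / kind
    kind_dir.mkdir(parents=True, exist_ok=True)
    target = kind_dir / f"{event_id}.json"
    fd, tmp = tempfile.mkstemp(dir=kind_dir, suffix=".tmp")  # same filesystem as target
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(dossier, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, target)  # atomic swap; a rerun overwrites in place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target


def rebuild_index(root: Path) -> dict:
    """Rebuild the index purely from the dossier files on disk."""
    entries = [{"kind": p.parent.name, "event_id": p.stem}
               for p in sorted(root.glob("*/*.json"))]
    return {"entries": entries,
            "counts_by_kind": dict(Counter(e["kind"] for e in entries))}
```

Because the file name is derived from the stable event_id, a backfill rerun replaces each dossier instead of duplicating it, and the index can always be regenerated from what is on disk.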

Multi-source slot spaces (the `entry` generalization, 2026-06-14)

A kind may be fed by several spine catalogs at once. The entry sweep module (src/terrapulse/monitor/entry_sweep.py) registers a SOURCES table mapping each source prefix to its (slug, metric, id_field, subtype), and builds every slot id as f"{prefix}:{native_id}". A single generic spine query (common columns + the source-native extra_json) windows all sources by timestamp, so live-edge and backfill share one code path across four catalogs. The prefix keeps the slot space globally unique even when two catalogs reuse a numeric id. This added no change to the storehouse or index layer — it lives entirely in the kind's sweep module, which is where phenomenon-specific logic belongs.
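The prefix scheme can be sketched in a few lines. The tuple shape (slug, metric, id_field, subtype) and the prefixes follow the description above; the gmn_meteors and cneos_bolide slugs appear elsewhere in this doc, while every other table value and both helper names are placeholders:

```python
from typing import NamedTuple, Tuple


class SourceSpec(NamedTuple):
    slug: str       # ingested source the spine rows come from
    metric: str
    id_field: str   # where the catalog's native ID lives
    subtype: str


# Placeholder table: only the prefixes and the gmn_meteors / cneos_bolide slugs
# come from this doc; every other value is hypothetical.
SOURCES = {
    "gmn": SourceSpec("gmn_meteors", "placeholder_metric", "placeholder_id", "fireball"),
    "sat": SourceSpec("placeholder_satcat", "placeholder_metric", "placeholder_id", "reentry"),
    "cneos": SourceSpec("cneos_bolide", "placeholder_metric", "placeholder_id", "bolide"),
    "met": SourceSpec("placeholder_meteorites", "placeholder_metric", "placeholder_id",
                      "meteorite_fall"),
}


def make_slot_id(prefix: str, native_id: object) -> str:
    """Build a globally unique slot id from a registered prefix and a native id."""
    if prefix not in SOURCES:
        raise KeyError(f"unknown source prefix: {prefix!r}")
    return f"{prefix}:{native_id}"


def split_slot_id(slot_id: str) -> Tuple[str, str]:
    """Split on the first colon only, so native IDs may themselves contain colons."""
    prefix, native_id = slot_id.split(":", 1)
    return prefix, native_id
```

To the storehouse, sat:12345 and met:12345 are simply two distinct event_ids; only the sweep module knows they came from different catalogs.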

A live edge can also be repaved without touching the spine: when GMN's Datasette query API began hanging on its 3.1M-row table, the gmn_meteors live fetcher was repointed at GMN's static daily bulk exports (same schema as the spine reload) — a fetcher-only change, the slot contract unchanged.

Candidate inventory (ranked by fit)

Every "live now" claim below was verified against the scheduler JOBS list on 2026-06-10. All five ranked candidates were subsequently built (see the status table at the top); the original per-candidate analysis is preserved here as the record of why each qualified, annotated with how it actually shipped.

1. Geomagnetic storms — best fit, zero new fetchers

Part	What	Status
Spine	Our own `dst_index` series: a storm = Dst crossing a threshold (e.g. −50 nT), lifecycle = onset → main phase → recovery. Kyoto's Dst record runs from 1957.	live
Live edge	Same feed — the spine and the edge are one source. `noaa_space_weather` (Kp) corroborates.	live
Sweep layers	`intermagnet` magnetometers, `nmdb_cosmic_rays` (Forbush decreases — detector already running), `goes_xray`, `dscovr_solar_wind`, `hamqsl_propagation`, `celestrak` (Starlink drag), `silso_sunspots` for context.	all live
Sweep geometry	Planetary events — time-window only, no spatial query. The cheapest sweep possible.	—
History depth	1957 → present (Dst); sensor depth varies by layer.	—

Notes: the Starlink thermosphere pilot was, in hindsight, one hand-built dossier of this kind. A geomagnetic-storm storehouse gives that work a permanent home and makes every future storm a slot that fills itself. Event IDs would be ours to mint (e.g. gst-YYYYMMDD); that is fine — the threshold rule must be written down once, before backfill, and never tuned afterward.

Built 2026-06-11 (#228): kind = gst, 1,147 storms, threshold rule FROZEN (7e76b1b, do not tune). The time-only planetary sweep is, as predicted, the cheapest of all.
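A threshold rule of this kind fits in a few lines. This is a sketch assuming an hourly, gap-free Dst series and the −50 nT example threshold; it is not the frozen rule in 7e76b1b. The function and field names, and the `-2` suffix for a second storm starting on the same day, are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DstStorm:
    event_id: str
    onset: datetime       # first hour at or below the threshold
    peak: datetime        # hour of minimum Dst (end of the main phase)
    min_dst_nt: float
    last_below: datetime  # last hour at or below the threshold; recovery continues after


def detect_storms(series: Sequence[Tuple[datetime, float]],
                  threshold_nt: float = -50.0) -> List[DstStorm]:
    """Each contiguous run of hourly Dst <= threshold becomes one storm."""
    storms: List[DstStorm] = []
    seen: Dict[str, int] = {}
    run: List[Tuple[datetime, float]] = []

    def close_run() -> None:
        peak_t, peak_v = min(run, key=lambda p: p[1])
        base = f"gst-{run[0][0]:%Y%m%d}"
        seen[base] = seen.get(base, 0) + 1
        event_id = base if seen[base] == 1 else f"{base}-{seen[base]}"
        storms.append(DstStorm(event_id, run[0][0], peak_t, peak_v, run[-1][0]))
        run.clear()

    for t, dst in series:
        if dst <= threshold_nt:
            run.append((t, dst))
        elif run:
            close_run()
    if run:
        close_run()
    return storms
```

As written, a brief recovery above the threshold splits one disturbed period into two storms on the same day. Whether to merge such runs, and how to treat data gaps, are exactly the choices that must be frozen before backfill.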

2. Earthquakes — strong fit, catalogs already ingested

Part	What	Status
Spine	`usgs_earthquake` (ComCat IDs), corroborated by `emsc`, `isc`, `gfz_geofon`. Decades deep, already deduplicated by our canonical query.	live
Live edge	Same feeds — minutes-fresh.	live
Sweep layers	`noaa_tides` (tsunami / seiche signatures), `usgs_water` (gauges and wells genuinely respond to passing seismic waves), `intermagnet`.	all live
Sweep geometry	Point event: tight time window (±1 h), radius scaled to magnitude.	—
History depth	ComCat is effectively bottomless for M4.5+.	—

Notes: volume is the design decision. Tens of thousands of quakes a year means the slot floor (magnitude ≥ 5.5? ≥ 6?) decides whether the storehouse has 500 meaningful slots or 500,000 empty ones. Strategic-fill principle applies: big/coastal/famous quakes get rich fills.

Built 2026-06-11: kind = eq, 13,839 slots, magnitude floor FROZEN (a49a515). ComCat reconciliation against the local catalog remains an open follow-up.

3. Tornadoes / severe convective — good fit, lightning is the headline pairing

Part	What	Status
Spine	NCEI Storm Events database: tornado segments with IDs, surveyed start/end points (real tracks), back to 1950; hail and wind reports too. Publishes ~75 days behind.	needs fetcher
Live edge	`spc_reports` — measured storm reports (tornado/hail/wind), already live.	live
Sweep layers	`blitzortung_lightning` (the killer pairing — strike rates around a surveyed tornado track), `igra_soundings`, `noaa_tides`/`usgs_water` where coastal/flooding.	live
Sweep geometry	Track event like a TC, but hours not days: small radius (~100 km), tight window (±1 h per segment).	—
History depth	1950 → present for tornado tracks. Lightning history starts 2026-04, so slots before then are track-only by construction, exactly like the TC backfill.	—

Notes: one new fetcher (NCEI Storm Events bulk CSVs — measured reports, comfortably inside the bright line). SPC outlooks (spc_outlook) are forecasts and must never be a spine or sweep layer; SPC reports are measurements and already are one.

Built 2026-06-11: kind = tor, 73,628 slots — and the spine turned out to be already ingested (spc_tornado_history, 1950→present, reloaded wholesale), so no new fetcher was needed after all. Scope FROZEN (8c263de): 150 km / ±6 h on both track endpoints, two-tier provisional model. Gotcha banked: SPC renumbers tornado IDs between annual releases, so the spine is re-pulled wholesale each spring rather than appended.

4. Volcanic eruptions — workable, slow-burn

Part	What	Status
Spine	Smithsonian GVP eruption catalog: volcano numbers + eruption IDs, centuries deep, real start/end lifecycles.	live (`smithsonian_gvp`)
Live edge	`usgs_volcanoes` status feed + GVP weekly.	live
Sweep layers	Earthquake catalogs near the edifice (the classic precursor measurement); `igra_soundings` for plumes near launch sites. Waveform seismic is the known IRIS gap (hurricane-arc T1) — when that fetcher lands, both this kind and TC microseisms benefit.	live / gap
Sweep geometry	Fixed location, long lifecycle (weeks–years): small radius, very wide window.	—
History depth	Centuries (GVP).	—

Built 2026-06-12: kind = vol, 11,089 eruptions (+ 1,215 volcanoes), no floor (full GVP catalog), scope FROZEN (0a88917): 100 km / [−14 d, +7 d], slot id = eruption_number. Gotchas banked: the Smithsonian WAF returns 403 for startIndex ≥ 4000 and for Python clients (use curl + keyset pagination); the BCE/year-0 trap means vol PG sessions must run SET TIME ZONE 'UTC'.
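The [−14 d, +7 d] window is asymmetric, and the BCE trap shows up immediately in Python too: `datetime.MINYEAR` is 1, so no `datetime` exists for BCE or year-0 eruptions. A sketch assuming the window is anchored on the eruption start date and that a missing month or day defaults to the first (both assumptions; `eruption_window` is a hypothetical helper):

```python
from datetime import MINYEAR, datetime, timedelta, timezone
from typing import Optional, Tuple

BEFORE = timedelta(days=14)
AFTER = timedelta(days=7)


def eruption_window(year: int, month: int = 1, day: int = 1
                    ) -> Optional[Tuple[datetime, datetime]]:
    """UTC sweep window around an eruption start, or None if datetime cannot represent it."""
    if year < MINYEAR:                  # BCE and year-0 dates: no datetime exists
        return None
    start = datetime(year, month, day, tzinfo=timezone.utc)
    try:
        return start - BEFORE, start + AFTER
    except OverflowError:               # window spills below year 1 (or past 9999)
        return None
```

Returning None rather than clamping keeps the sketch honest about dates it cannot represent.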

5. Winter storms — the asked-about case; honest answer: weakest spine, most construction

There is no IBTrACS for winter storms. No agency assigns official IDs and tracks. The pieces:

NCEI Storm Events "episodes" — IDs exist, but footprints are county zones (not tracks) and publication lags ~75 days. Usable as a coarse historical spine.
NESIS (Northeast Snowfall Impact Scale) — a real ranked catalog of major Northeast snowstorms, but only ~30-odd events and regional.
NWS winter warnings (nws_alerts) — human-issued forecast products. Fails the bright line as a spine. (They stay fine as platform data; they just can't define our events.)
The measured-reality path: GHCN-Daily station snowfall/precip observations — real gauges read by real observers, century-deep, free, and we do not currently ingest it. With GHCN-D in, we could construct events ourselves (e.g. contiguous days where ≥N stations in a region report ≥X cm), with the rule written down before any backfill.
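The construction rule is simple enough to sketch, which is also why it must be frozen before backfill. This minimal sketch assumes the region filter has already been applied and that snowfall is in centimetres (GHCN-Daily reports SNOW in millimetres, so a loader would divide by 10). N, X, and the function names are hypothetical:

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def qualifying_days(obs: Iterable[Tuple[date, str, float]],
                    min_snow_cm: float, min_stations: int) -> List[date]:
    """Days on which at least min_stations distinct stations report >= min_snow_cm."""
    stations: Dict[date, Set[str]] = defaultdict(set)
    for day, station_id, snow_cm in obs:
        if snow_cm >= min_snow_cm:
            stations[day].add(station_id)
    return sorted(d for d, s in stations.items() if len(s) >= min_stations)


def group_runs(days: List[date]) -> List[Tuple[date, date]]:
    """Collapse sorted qualifying days into (first_day, last_day) events."""
    runs: List[Tuple[date, date]] = []
    for d in days:
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1] = (runs[-1][0], d)
        else:
            runs.append((d, d))
    return runs
```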

Verdict: doable, and the gap is fillable with a genuine measurement source (GHCN-Daily is a worthy fetcher regardless — it upgrades several other kinds' sweeps too). But it is the most work for the fuzziest spine: we would be minting both the catalog and the IDs. Park it behind the kinds that come with authoritative spines; revisit once GHCN-D is ingested.

Not built (still parked, 2026-06-14): the only ranked candidate without a kind yet. Blocked on the GHCN-Daily fetcher, exactly as the verdict anticipated. The five authoritative-spine kinds came first; this remains the construct-your-own-catalog case.

Exotic spines (real, cheap, already live — for later)

Gravitational-wave events — gwosc_events is live; discrete cataloged events with IDs. Sweep layers are thin (what on Earth co-measures a GW?) but the slots cost nothing.
Fireballs / bolides — CNEOS fireball catalog + gmn_meteors + nasa_neo; the fireball paper (#221) already walked this ground. Point events, global, well-ID'd. → Built 2026-06-14 as the entry kind, and it grew well past "exotic": four spines (GMN fireballs AbsMag ≤ −4, CelesTrak SATCAT reentries, CNEOS bolides, meteorite falls 860 AD→), 25,508 slots, the first multi-source kind. Scope FROZEN (6e2f29f): 100 km / ±2 h, catalog-first sweep-light, measured-reality reentry rule (observed decay epoch IN, TIP-predicted impact point OUT). The redundant nasa_neo fireball emission was retired in favor of the canonical cneos_bolide spine.
Solar flares / CMEs — goes_xray flare events and coronagraph-observed CMEs are measurements; most flares fold naturally into geomagnetic-storm dossiers as sweep layers rather than needing their own kind. nasa_donki requires care: its observation records are in, its model runs (WSA-ENLIL propagation) are out.

Sequencing — recommended vs realized

The plan held almost exactly, with two happy surprises: the tornado spine was already ingested (no new fetcher needed), and the "exotic" fireball idea grew into the full multi-source entry kind.

Geomagnetic storms first. ✅ 2026-06-11 (#228). Zero new fetchers, time-only sweep, and it gave the Starlink pilot a permanent home. This was also where the generic refactor landed: per-kind KindConfig, (kind, event_id) slot keys, storehouse rename.
Earthquakes second. ✅ 2026-06-11. Pure config once the refactor existed; the only real decision was the magnitude floor.
Tornadoes. ✅ 2026-06-11 — and no NCEI fetcher was needed; spc_tornado_history was already ingested and just needed a wholesale reload.
Volcanoes. ✅ 2026-06-12. Full GVP catalog, no floor.
Atmospheric entry. ✅ 2026-06-14. The fireball exotic, generalized to the first multi-source kind (four spines).
Winter storms. ⏸ still parked behind the GHCN-Daily fetcher (the construct-your-own-catalog case). Gravitational-wave and standalone solar-flare/CME kinds remain cheap-but-unbuilt exotics.

Same operating mode throughout: live-forward first per kind, history backfill second, strategic per-slot fills always; scope frozen in a docs/scope-*-eventdex.md doc before backfill and never tuned. Multi-hour backfills run under nohup (background jobs die on session exit — learned 2026-06-10).