Severe-Weather Split — FROZEN scope (Brick A, 2026-06-17)
Mike's ruling (2026-06-17): the non-tornado NCEI Storm Events data is not one phenomenon —
split it into separate phenomenon kinds, the way every other Eventdex is one phenomenon.
Guardrail: do not touch existing Eventdexes/Yeardexes. Overlapping types route to their
existing homes; they are not re-dexed here. All four open judgment calls resolved in favor of the
recommended leans (one wind kind; own lightning_report kind; dust/avalanche own kinds;
tsunami inside coastal). FROZEN — 15 new kinds (the 14 below + lightning_report).
Slot granularity (engine-room call, locked)
One slot = one NCEI EVENT_ID (one hazard occurrence in one county/forecast zone). NCEI splits a
multi-county storm into one EVENT_ID per county under a shared EPISODE_ID; EVENT_ID is the
record that carries its own magnitude, location, timing, and narrative, so it is the "one event" unit
per docs/dex-data-model.md. EPISODE_ID is preserved as a slot field for storm-system grouping.
This matches how the tor enrichment treated NCEI (per-segment).
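A minimal sketch of that slot rule, assuming rows are already parsed into dictionaries. EVENT_ID, EPISODE_ID, EVENT_TYPE and CZ_NAME are column names from the NCEI details files; the sample values and the slot field names (slot_id, episode_id, zone) are invented for illustration and are not taken from docs/dex-data-model.md.

```python
from collections import defaultdict

# Hypothetical sample rows: two counties hit by one storm system (episode 50)
# and an unrelated flood (episode 51).
rows = [
    {"EVENT_ID": "1001", "EPISODE_ID": "50", "CZ_NAME": "ADAMS", "EVENT_TYPE": "Hail"},
    {"EVENT_ID": "1002", "EPISODE_ID": "50", "CZ_NAME": "BROWN", "EVENT_TYPE": "Hail"},
    {"EVENT_ID": "1003", "EPISODE_ID": "51", "CZ_NAME": "CLARK", "EVENT_TYPE": "Flood"},
]


def build_slots(rows):
    """Return one slot per EVENT_ID, keeping EPISODE_ID as a slot field."""
    slots = {}
    for row in rows:
        event_id = row["EVENT_ID"]
        if event_id in slots:
            # One slot per EVENT_ID: a repeat means the input is not clean.
            raise ValueError(f"duplicate EVENT_ID {event_id}")
        slots[event_id] = {
            "slot_id": event_id,
            "episode_id": row["EPISODE_ID"],
            "event_type": row["EVENT_TYPE"],
            "zone": row["CZ_NAME"],
        }
    return slots


def group_by_episode(slots):
    """Recover storm-system grouping from the preserved EPISODE_ID field."""
    groups = defaultdict(list)
    for slot in slots.values():
        groups[slot["episode_id"]].append(slot["slot_id"])
    return dict(groups)


slots = build_slots(rows)
episodes = group_by_episode(slots)
print(len(slots), episodes)
```

The storm-system view stays available without changing the slot unit: it is a group-by over a field, not a second kind of slot.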
Storage (locked) — spine parquet first, dossiers staged
The shared storehouse_index.json is already ~97 MB at 278k slots; file-per-slot for all ~1.94M
severe-wx events would push it past ~770 MB and make every kind's rebuild_index_from_disk() rescan
~2.2M files. So:
- Brick B (spine): one per-kind spine parquet in the storehouse (<kind>/<kind>_spine.parquet), one row per slot. Fast, compact, the authoritative slot list papers draw from. No per-slot files, no shared-index bloat.
- Brick D (dossiers + tests): per-slot JSON dossiers generated only after the index is made scalable (per-kind manifest, not one mega-index). Settle the index fix before generating ~1.94M files. Flagged, not silently deferred.
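The size projection above can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch that assumes the shared index grows linearly with slot count and that 1 MB means 10^6 bytes; the three input figures are the approximate ones quoted in this section.

```python
# Approximate figures quoted in this section.
INDEX_MB_NOW = 97        # current storehouse_index.json size
SLOTS_NOW = 278_000      # slots currently indexed
NEW_SLOTS = 1_940_000    # severe-wx events, if stored file-per-slot

# Assumption: index size scales linearly with the number of slots.
mb_per_slot = INDEX_MB_NOW / SLOTS_NOW
projected_mb = INDEX_MB_NOW + NEW_SLOTS * mb_per_slot
total_files = SLOTS_NOW + NEW_SLOTS

print(f"{mb_per_slot * 1e6:.0f} bytes of index per slot")
print(f"projected index size: {projected_mb:.0f} MB")
print(f"files rescanned per rebuild: {total_files:,}")
```

Under those assumptions the index lands a little above 770 MB and a rebuild walks roughly 2.2M files, which is the case for keeping Brick B to one parquet per kind.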
Source: NCEI Storm Events detail files (stormevents-csvfiles → the StormEvents_details*.csv.gz
archive, 1950–2026, already cached at data/stormevents_cache/). 2,023,627 total rows, 56 event
types. Same files + parser as the tornado-enrich build (scripts/match_stormevents_tornadoes.py),
filter inverted.
Excluded — route to existing kinds, leave them alone (per Mike's guardrail)
- Tornado (80,402) → existing tor kind.
- Tropical Storm (7,146), Hurricane (Typhoon) (2,149), Tropical Depression (552), Marine Tropical Storm (603), Marine Hurricane/Typhoon (109), Marine Tropical Depression (31) → existing tc kind.
- Volcanic Ashfall (78), Volcanic Ash (70) → existing vol kind.
- Northern Lights (8) → aurora / gst-adjacent, negligible; exclude.
(Lightning impact records are NOT excluded — they form their own lightning_report kind, resolution B.
The lightning detection Eventdex stays untouched.)
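The inverted filter can be sketched as follows. This is an illustration only, not the code in scripts/match_stormevents_tornadoes.py: the function name and the ROUTED_ELSEWHERE table are hypothetical, and the sample CSV is a made-up three-row stand-in for a StormEvents_details file.

```python
import csv
import gzip
import io

# EVENT_TYPE values owned by existing kinds, per the exclusion list above.
# None marks a type that is excluded outright with no destination kind.
ROUTED_ELSEWHERE = {
    "Tornado": "tor",
    "Tropical Storm": "tc",
    "Hurricane (Typhoon)": "tc",
    "Tropical Depression": "tc",
    "Marine Tropical Storm": "tc",
    "Marine Hurricane/Typhoon": "tc",
    "Marine Tropical Depression": "tc",
    "Volcanic Ashfall": "vol",
    "Volcanic Ash": "vol",
    "Northern Lights": None,
}


def iter_new_kind_rows(gz_bytes):
    """Yield detail rows whose EVENT_TYPE is not owned by an existing kind."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if row["EVENT_TYPE"] not in ROUTED_ELSEWHERE:
                yield row


# Made-up sample standing in for one gzipped details file.
sample_csv = "EVENT_ID,EPISODE_ID,EVENT_TYPE\n1,10,Tornado\n2,10,Hail\n3,11,Lightning\n"
sample_gz = gzip.compress(sample_csv.encode("utf-8"))

kept = [row["EVENT_TYPE"] for row in iter_new_kind_rows(sample_gz)]
print(kept)  # ['Hail', 'Lightning']
```

Lightning passes the filter because impact records are not excluded; only the types already homed in tor, tc and vol, plus Northern Lights, are dropped.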
Proposed new phenomenon kinds
Ordered by volume. Each is its own Eventdex spine; the StormEvents EVENT_TYPE is preserved as a
slot sub-field so nothing is flattened away.
- wind (~734k) — Thunderstorm Wind (564,601), High Wind (96,811), Marine Thunderstorm Wind (43,140), Strong Wind (28,243), Marine High Wind (997), Marine Strong Wind (166).
- hail (~420k) — Hail (419,169), Marine Hail (856).
- winter (~286k) — Winter Storm (91,955), Winter Weather (84,694), Heavy Snow (75,954), Blizzard (17,153), Ice Storm (12,632), Lake-Effect Snow (2,880), Sleet (859).
- flood (~185k) — Flash Flood (110,337), Flood (70,189), Coastal Flood (4,647), Lakeshore Flood (359).
- drought (82,518) — Drought. (Open-window/slow; NCEI records bounded episodes — keep as a kind, note the velocity in scope.)
- heat (~56k) — Heat (34,898), Excessive Heat (21,179).
- cold (~54k) — Extreme Cold/Wind Chill (19,289), Cold/Wind Chill (19,130), Frost/Freeze (15,244).
- heavy_rain (~34k) — Heavy Rain (31,516), Debris Flow (2,522).
- fog (~18k) — Dense Fog (17,776), Freezing Fog (502), Marine Dense Fog (22).
- funnel (~16k) — Funnel Cloud (9,941), Waterspout (6,238). Non-touchdown / over-water rotation (distinct from tor, which requires ground contact).
- coastal (~15k) — High Surf (10,640), Storm Surge/Tide (1,656), Rip Current (1,891), Astronomical Low Tide (784), Sneakerwave (68), Seiche (76), Tsunami (52).
- wildfire (~9.5k) — Wildfire (9,385), Dense Smoke (147).
- dust (~2.3k) — Dust Storm (2,066), Dust Devil (255).
- avalanche (869) — Avalanche.
- lightning_report (~18k) — Lightning (18,171), Marine Lightning (2). Storm-Data impact (casualty/damage) records; distinct kind from the detection-based lightning dex (resolution B).
That is 15 new kinds, ~1.93M slots (the per-type counts above sum to 1,932,479), on top of the existing 13.
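The roster can be written down as a routing table and cross-checked against the source totals. The sketch below copies the per-type counts from the list above; the names KIND_COUNTS and TYPE_TO_KIND are hypothetical, and the structure is one possible encoding rather than the engine's actual one.

```python
# Per-kind EVENT_TYPE counts, copied from the roster above.
KIND_COUNTS = {
    "wind": {"Thunderstorm Wind": 564_601, "High Wind": 96_811,
             "Marine Thunderstorm Wind": 43_140, "Strong Wind": 28_243,
             "Marine High Wind": 997, "Marine Strong Wind": 166},
    "hail": {"Hail": 419_169, "Marine Hail": 856},
    "winter": {"Winter Storm": 91_955, "Winter Weather": 84_694,
               "Heavy Snow": 75_954, "Blizzard": 17_153, "Ice Storm": 12_632,
               "Lake-Effect Snow": 2_880, "Sleet": 859},
    "flood": {"Flash Flood": 110_337, "Flood": 70_189,
              "Coastal Flood": 4_647, "Lakeshore Flood": 359},
    "drought": {"Drought": 82_518},
    "heat": {"Heat": 34_898, "Excessive Heat": 21_179},
    "cold": {"Extreme Cold/Wind Chill": 19_289, "Cold/Wind Chill": 19_130,
             "Frost/Freeze": 15_244},
    "heavy_rain": {"Heavy Rain": 31_516, "Debris Flow": 2_522},
    "fog": {"Dense Fog": 17_776, "Freezing Fog": 502, "Marine Dense Fog": 22},
    "funnel": {"Funnel Cloud": 9_941, "Waterspout": 6_238},
    "coastal": {"High Surf": 10_640, "Storm Surge/Tide": 1_656,
                "Rip Current": 1_891, "Astronomical Low Tide": 784,
                "Sneakerwave": 68, "Seiche": 76, "Tsunami": 52},
    "wildfire": {"Wildfire": 9_385, "Dense Smoke": 147},
    "dust": {"Dust Storm": 2_066, "Dust Devil": 255},
    "avalanche": {"Avalanche": 869},
    "lightning_report": {"Lightning": 18_171, "Marine Lightning": 2},
}

# Counts of the types routed to existing kinds or excluded outright.
EXCLUDED_COUNTS = [80_402, 7_146, 2_149, 552, 603, 109, 31, 78, 70, 8]
TOTAL_ROWS = 2_023_627

# Reverse lookup used for routing; EVENT_TYPE is kept on the slot as well.
TYPE_TO_KIND = {
    event_type: kind
    for kind, types in KIND_COUNTS.items()
    for event_type in types
}

kind_totals = {kind: sum(types.values()) for kind, types in KIND_COUNTS.items()}
grand_total = sum(kind_totals.values())

# Every source row is either in a new kind or in the exclusion list.
assert grand_total == TOTAL_ROWS - sum(EXCLUDED_COUNTS)
print(len(KIND_COUNTS), len(TYPE_TO_KIND), grand_total)  # 15 46 1932479
```

The 46 routed types plus the 10 excluded ones account for all 56 event types, and the row counts reconcile exactly with the 2,023,627 source rows.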
Judgment calls — RESOLVED (Mike took all four leans, 2026-06-17)
- A. wind — RESOLVED: one wind kind. Thunderstorm/Marine-Thunderstorm wind (convective gust) and High/Strong wind (synoptic gradient) share one kind; the EVENT_TYPE distinguishes driver as a slot field.
- B. Lightning reports (18,171) — RESOLVED: own lightning_report kind. Storm-Data impact (casualty/damage) records, distinct from the detection-based lightning dex (untouched).
- C. Tail granularity — RESOLVED: own kinds. dust (2.3k) and avalanche (869) each get their own kind; the principle is split.
- D. Tsunami (52) — RESOLVED: inside coastal. Promote to its own kind later if it grows.
Final frozen kind roster — 15 new kinds
The 14 non-lightning kinds proposed above plus lightning_report (resolution B), since the lightning impact records
form their own kind rather than folding into the existing detection dex. Total 15 new kinds,
~1.93M slots.
Build plan (once the map is frozen)
One shared engine, fanned across kinds: pull the cached NCEI details once, key each slot by EVENT_ID
(slot granularity is locked above: one EVENT_ID per slot, with EPISODE_ID kept as a field for storm-system grouping),
route each row to its kind by EVENT_TYPE, write per-kind spines into event_storehouse. Same
CST→UTC fix and parser as the tornado build. Bricks A (freeze) → B (spine, all kinds) → D (dossiers + tests, after the index fix).
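A rough sketch of the fan-out step, under stated assumptions: rows arrive as dictionaries, the routing table is a small hypothetical excerpt, and the function names are invented. The parquet write itself and the CST→UTC fix are deliberately left out, since neither is specified in this scope.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical excerpt of the EVENT_TYPE -> kind routing table.
TYPE_TO_KIND = {
    "Hail": "hail",
    "Marine Hail": "hail",
    "Flash Flood": "flood",
    "Thunderstorm Wind": "wind",
}


def fan_out(rows, type_to_kind):
    """Group detail rows into per-kind spines, one entry per EVENT_ID."""
    spines = defaultdict(dict)
    for row in rows:
        kind = type_to_kind.get(row["EVENT_TYPE"])
        if kind is None:
            continue  # type is homed in an existing kind, or excluded
        spines[kind][row["EVENT_ID"]] = {
            "episode_id": row["EPISODE_ID"],
            "event_type": row["EVENT_TYPE"],  # preserved as a slot sub-field
        }
    return dict(spines)


def spine_path(storehouse, kind):
    """Spine location, following the <kind>/<kind>_spine.parquet layout."""
    return PurePosixPath(storehouse) / kind / f"{kind}_spine.parquet"


# Invented sample rows for illustration.
rows = [
    {"EVENT_ID": "1", "EPISODE_ID": "10", "EVENT_TYPE": "Hail"},
    {"EVENT_ID": "2", "EPISODE_ID": "10", "EVENT_TYPE": "Thunderstorm Wind"},
    {"EVENT_ID": "3", "EPISODE_ID": "11", "EVENT_TYPE": "Tornado"},
    {"EVENT_ID": "4", "EPISODE_ID": "12", "EVENT_TYPE": "Marine Hail"},
]

spines = fan_out(rows, TYPE_TO_KIND)
for kind in sorted(spines):
    print(spine_path("event_storehouse", kind), len(spines[kind]))
```

Reading the details once and routing in a single pass keeps the engine shared; only the output path and row set differ per kind.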