Severe-Weather Split — FROZEN scope (Brick A, 2026-06-17)

Mike's ruling (2026-06-17): the non-tornado NCEI Storm Events data is not one phenomenon — split it into separate phenomenon kinds, the way every other Eventdex is one phenomenon. Guardrail: do not touch existing Eventdexes/Yeardexes. Overlapping types route to their existing homes; they are not re-dexed here. All four open judgment calls resolved in favor of the recommended leans (one wind kind; own lightning_report kind; dust/avalanche own kinds; tsunami inside coastal). FROZEN — 15 new kinds (the 14 below + lightning_report).

Slot granularity (engine-room call, locked)

One slot = one NCEI EVENT_ID (one hazard occurrence in one county/forecast zone). NCEI splits a multi-county storm into one EVENT_ID per county under a shared EPISODE_ID; EVENT_ID is the record that carries its own magnitude, location, timing, and narrative, so it is the "one event" unit per docs/dex-data-model.md. EPISODE_ID is preserved as a slot field for storm-system grouping. This matches how the tor enrichment treated NCEI (per-segment).
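The slot rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows and the `county` field are hypothetical, and only EVENT_ID, EPISODE_ID, and EVENT_TYPE are names taken from this document.

```python
from collections import defaultdict

# Hypothetical sample rows; real rows come from the NCEI details files.
rows = [
    {"EVENT_ID": 1001, "EPISODE_ID": 50, "EVENT_TYPE": "Hail", "county": "A"},
    {"EVENT_ID": 1002, "EPISODE_ID": 50, "EVENT_TYPE": "Hail", "county": "B"},
    {"EVENT_ID": 1003, "EPISODE_ID": 51, "EVENT_TYPE": "Flash Flood", "county": "A"},
]


def slots_by_event(rows):
    """One slot per EVENT_ID; a repeated EVENT_ID is treated as an error."""
    slots = {}
    for row in rows:
        event_id = row["EVENT_ID"]
        if event_id in slots:
            raise ValueError(f"duplicate EVENT_ID {event_id}")
        slots[event_id] = dict(row)  # EPISODE_ID rides along as a slot field
    return slots


def events_by_episode(slots):
    """Storm-system grouping: EPISODE_ID -> list of EVENT_IDs."""
    groups = defaultdict(list)
    for event_id, slot in slots.items():
        groups[slot["EPISODE_ID"]].append(event_id)
    return dict(groups)


slots = slots_by_event(rows)
print(len(slots))                # 3 slots, one per EVENT_ID
print(events_by_episode(slots))  # {50: [1001, 1002], 51: [1003]}
```

The two-county hail storm stays two slots, and the shared EPISODE_ID is what lets a reader reassemble it as one storm system.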

Storage (locked) — spine parquet first, dossiers staged

The shared storehouse_index.json is already ~97 MB at 278k slots; file-per-slot for all ~1.93M severe-wx events would push it past ~770 MB and make every kind's rebuild_index_from_disk() rescan ~2.2M files. So:

  • Brick B (spine): one per-kind spine parquet in the storehouse (<kind>/<kind>_spine.parquet), one row per slot. Fast, compact, the authoritative slot list papers draw from. No per-slot files, no shared-index bloat.
  • Brick D (dossiers + tests): per-slot JSON dossiers generated only after the index is made scalable (per-kind manifest, not one mega-index). Settle the index fix before generating ~1.93M files. Flagged, not silently deferred.
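The storage reasoning above is a back-of-envelope projection, and it can be reproduced as follows. This sketch assumes the shared index grows linearly with slot count, which is an assumption rather than a measured property. The new-slot figure is the sum of the per-type counts listed in this document, and `spine_path` is a hypothetical helper that only mirrors the `<kind>/<kind>_spine.parquet` layout.

```python
INDEX_MB = 97              # current size of storehouse_index.json
EXISTING_SLOTS = 278_000   # slots already in the shared index
NEW_SLOTS = 1_932_479      # sum of the per-type counts for the new kinds

# Assumption: index size scales linearly with the number of slots.
mb_per_slot = INDEX_MB / EXISTING_SLOTS
projected_index_mb = INDEX_MB + mb_per_slot * NEW_SLOTS
files_to_rescan = EXISTING_SLOTS + NEW_SLOTS


def spine_path(kind: str) -> str:
    """Hypothetical helper: relative path of a kind's spine parquet."""
    return f"{kind}/{kind}_spine.parquet"


print(f"projected index: ~{projected_index_mb:.0f} MB")  # ~771 MB
print(f"files per rescan: ~{files_to_rescan / 1e6:.1f}M")  # ~2.2M
print(spine_path("hail"))  # hail/hail_spine.parquet
```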

Source: NCEI Storm Events detail files (stormevents-csvfiles → the StormEvents_details*.csv.gz archive, 1950–2026, already cached at data/stormevents_cache/). 2,023,627 total rows, 56 event types. Same files + parser as the tornado-enrich build (scripts/match_stormevents_tornadoes.py), filter inverted.

Excluded — route to existing kinds, leave them alone (per Mike's guardrail)

  • Tornado (80,402) → existing tor kind.
  • Tropical Storm (7,146), Hurricane (Typhoon) (2,149), Tropical Depression (552), Marine Tropical Storm (603), Marine Hurricane/Typhoon (109), Marine Tropical Depression (31) → existing tc kind.
  • Volcanic Ashfall (78), Volcanic Ash (70) → existing vol kind.
  • Northern Lights (8) → aurora / gst-adjacent, negligible; exclude.

(Lightning impact records are NOT excluded — they form their own lightning_report kind, resolution B. The lightning detection Eventdex stays untouched.)

Proposed new phenomenon kinds

Ordered by volume. Each is its own Eventdex spine; the StormEvents EVENT_TYPE is preserved as a slot sub-field so nothing is flattened away.

  1. wind (~734k) — Thunderstorm Wind (564,601), High Wind (96,811), Marine Thunderstorm Wind (43,140), Strong Wind (28,243), Marine High Wind (997), Marine Strong Wind (166).
  2. hail (~420k) — Hail (419,169), Marine Hail (856).
  3. winter (~286k) — Winter Storm (91,955), Winter Weather (84,694), Heavy Snow (75,954), Blizzard (17,153), Ice Storm (12,632), Lake-Effect Snow (2,880), Sleet (859).
  4. flood (~185k) — Flash Flood (110,337), Flood (70,189), Coastal Flood (4,647), Lakeshore Flood (359).
  5. drought (82,518) — Drought. (Open-window/slow; NCEI records bounded episodes — keep as a kind, note the velocity in scope.)
  6. heat (~56k) — Heat (34,898), Excessive Heat (21,179).
  7. cold (~54k) — Extreme Cold/Wind Chill (19,289), Cold/Wind Chill (19,130), Frost/Freeze (15,244).
  8. heavy_rain (~34k) — Heavy Rain (31,516), Debris Flow (2,522).
  9. fog (~18k) — Dense Fog (17,776), Freezing Fog (502), Marine Dense Fog (22).
  10. funnel (~16k) — Funnel Cloud (9,941), Waterspout (6,238). Non-touchdown / over-water rotation (distinct from tor, which requires ground contact).
  11. coastal (~15k) — High Surf (10,640), Storm Surge/Tide (1,656), Rip Current (1,891), Astronomical Low Tide (784), Sneakerwave (68), Seiche (76), Tsunami (52).
  12. wildfire (~9.5k) — Wildfire (9,385), Dense Smoke (147).
  13. dust (~2.3k) — Dust Storm (2,066), Dust Devil (255).
  14. avalanche (869) — Avalanche.
  15. lightning_report (~18k) — Lightning (18,171), Marine Lightning (2). Storm-Data impact (casualty/damage) records; distinct kind from the detection-based lightning dex (resolution B).

That is 15 new kinds, ~1.93M slots (1,932,479 summed from the per-type counts above), on top of the existing 13.
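The roster can be written down as a routing table and checked against the source totals. The sketch below copies the per-type counts from this document. The dictionary names and the `excluded` key for Northern Lights are illustrative choices, not names from the codebase.

```python
# EVENT_TYPE -> row count, grouped by new kind (counts copied from this document).
NEW_KINDS = {
    "wind": {"Thunderstorm Wind": 564_601, "High Wind": 96_811,
             "Marine Thunderstorm Wind": 43_140, "Strong Wind": 28_243,
             "Marine High Wind": 997, "Marine Strong Wind": 166},
    "hail": {"Hail": 419_169, "Marine Hail": 856},
    "winter": {"Winter Storm": 91_955, "Winter Weather": 84_694,
               "Heavy Snow": 75_954, "Blizzard": 17_153, "Ice Storm": 12_632,
               "Lake-Effect Snow": 2_880, "Sleet": 859},
    "flood": {"Flash Flood": 110_337, "Flood": 70_189,
              "Coastal Flood": 4_647, "Lakeshore Flood": 359},
    "drought": {"Drought": 82_518},
    "heat": {"Heat": 34_898, "Excessive Heat": 21_179},
    "cold": {"Extreme Cold/Wind Chill": 19_289, "Cold/Wind Chill": 19_130,
             "Frost/Freeze": 15_244},
    "heavy_rain": {"Heavy Rain": 31_516, "Debris Flow": 2_522},
    "fog": {"Dense Fog": 17_776, "Freezing Fog": 502, "Marine Dense Fog": 22},
    "funnel": {"Funnel Cloud": 9_941, "Waterspout": 6_238},
    "coastal": {"High Surf": 10_640, "Storm Surge/Tide": 1_656,
                "Rip Current": 1_891, "Astronomical Low Tide": 784,
                "Sneakerwave": 68, "Seiche": 76, "Tsunami": 52},
    "wildfire": {"Wildfire": 9_385, "Dense Smoke": 147},
    "dust": {"Dust Storm": 2_066, "Dust Devil": 255},
    "avalanche": {"Avalanche": 869},
    "lightning_report": {"Lightning": 18_171, "Marine Lightning": 2},
}

# Types routed to existing kinds or dropped; never re-dexed here.
EXCLUDED = {
    "tor": {"Tornado": 80_402},
    "tc": {"Tropical Storm": 7_146, "Hurricane (Typhoon)": 2_149,
           "Tropical Depression": 552, "Marine Tropical Storm": 603,
           "Marine Hurricane/Typhoon": 109, "Marine Tropical Depression": 31},
    "vol": {"Volcanic Ashfall": 78, "Volcanic Ash": 70},
    "excluded": {"Northern Lights": 8},
}


def total(groups):
    return sum(sum(types.values()) for types in groups.values())


# Flat lookup used for routing a row by its EVENT_TYPE.
KIND_BY_EVENT_TYPE = {
    event_type: kind
    for groups in (NEW_KINDS, EXCLUDED)
    for kind, types in groups.items()
    for event_type in types
}

print(len(NEW_KINDS), total(NEW_KINDS))       # 15 1932479
print(total(NEW_KINDS) + total(EXCLUDED))     # 2023627
print(len(KIND_BY_EVENT_TYPE))                # 56
```

The new-kind and excluded counts together reproduce the 2,023,627 source rows and all 56 event types, so no type is left unrouted.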

Judgment calls — RESOLVED (Mike took all four leans, 2026-06-17)

  • A. wind — RESOLVED: one wind kind. Thunderstorm/Marine-Thunderstorm wind (convective gust) and High/Strong wind (synoptic gradient) share one kind; the EVENT_TYPE distinguishes driver as a slot field.
  • B. Lightning reports (18,171) — RESOLVED: own lightning_report kind. Storm-Data impact (casualty/damage) records, distinct from the detection-based lightning dex (untouched).
  • C. Tail granularity — RESOLVED: own kinds. dust (2.3k) and avalanche (869) each get their own kind; the principle is split.
  • D. Tsunami (52) — RESOLVED: inside coastal. Promote to its own kind later if it grows.

Final frozen kind roster — 15 new kinds

The 14 proposed above plus lightning_report (resolution B), since the lightning impact records form their own kind rather than folding into the existing detection dex. Total 15 new kinds, ~1.93M slots.

Build plan (once the map is frozen)

One shared engine, fanned across kinds: pull the cached NCEI details once, key each row by EVENT_ID (slot granularity is locked at one EVENT_ID per slot, with EPISODE_ID kept as a grouping field), route each row to its kind by EVENT_TYPE, and write per-kind spines into event_storehouse. Same CST→UTC fix and parser as the tornado build. Bricks A (freeze) → B (spine, all kinds) → D (dossiers + tests).
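A minimal sketch of the fan-out step is given below. It is not the tornado-build parser. The routing table is a small subset for illustration, the `begin_local_iso` field and the function names are hypothetical, and the CST→UTC step assumes a fixed UTC-6 offset with no daylight saving, which may differ from the actual fix.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Partial routing table for illustration; the full roster is the frozen list.
KIND_BY_EVENT_TYPE = {
    "Thunderstorm Wind": "wind",
    "High Wind": "wind",
    "Hail": "hail",
    "Flash Flood": "flood",
    "Lightning": "lightning_report",
}
# Types that belong to existing kinds (or are dropped) and are skipped here.
EXCLUDED_EVENT_TYPES = {"Tornado", "Tropical Storm", "Volcanic Ash", "Northern Lights"}

# Assumption: source timestamps are fixed-offset CST (UTC-6), no DST.
CST = timezone(timedelta(hours=-6))


def to_utc(local_iso: str) -> str:
    """Attach the assumed CST offset to a naive timestamp and convert to UTC."""
    local = datetime.fromisoformat(local_iso).replace(tzinfo=CST)
    return local.astimezone(timezone.utc).isoformat()


def build_spines(rows):
    """Route each row to its kind; one spine row per EVENT_ID."""
    spines = defaultdict(dict)
    for row in rows:
        event_type = row["EVENT_TYPE"]
        if event_type in EXCLUDED_EVENT_TYPES:
            continue
        kind = KIND_BY_EVENT_TYPE[event_type]  # KeyError flags an unmapped type
        spines[kind][row["EVENT_ID"]] = {
            "episode_id": row["EPISODE_ID"],
            "event_type": event_type,  # preserved as a slot sub-field
            "begin_utc": to_utc(row["begin_local_iso"]),
        }
    return dict(spines)


# Hypothetical sample rows standing in for the cached details files.
sample = [
    {"EVENT_ID": 1, "EPISODE_ID": 9, "EVENT_TYPE": "Hail",
     "begin_local_iso": "2011-04-27T18:00:00"},
    {"EVENT_ID": 2, "EPISODE_ID": 9, "EVENT_TYPE": "Thunderstorm Wind",
     "begin_local_iso": "2011-04-27T18:05:00"},
    {"EVENT_ID": 3, "EPISODE_ID": 9, "EVENT_TYPE": "Tornado",
     "begin_local_iso": "2011-04-27T18:10:00"},
]
spines = build_spines(sample)
print(sorted(spines))                  # ['hail', 'wind']
print(spines["hail"][1]["begin_utc"])  # 2011-04-28T00:00:00+00:00
```

Raising on an unmapped EVENT_TYPE, rather than silently dropping the row, keeps the frozen roster honest if a later archive year introduces a new type.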
