Dex Coverage Map

The finish line for "dex everything we already hold." Mike's rule (2026-06-16): organize every phenomenon already ingested into the platform into a clean Eventdex/Yeardex kind before adding new data sources. New sources are easy to add and pile up forever; the value is getting the data we already hold into per-phenomenon slots papers can draw from.

Built to the canonical model: docs/dex-data-model.md (one slot = one event or one year; per-phenomenon parallel lists; measured reality only). See also docs/event-spine-framework.md, docs/yeardex-framework.md, and memory feedback_organize_not_process, feedback_measured_reality_only.

This is a burn-down checklist, not a vibe. Update it as kinds land.

Snapshot (2026-06-16)

  • 10 Eventdex kinds dexed: entry 25,536 · eq 13,842 · fema 5,191 · gst 1,147 · gw 431 · neutrino 348 · tc 13,545 · tor 73,634 · uhecr 109 · vol 11,089. (Plus the cosmic_coincidences cross-match read-across, which is downstream, not a kind.)
  • 2 Yeardex kinds dexed: landuse 16 year-slots · dairy 28 year-slots.
  • Platform holds 367 active datasources with observations. Most are the AutoSense research tail (dataset-*, 5xxx-*, hourly-precip-*, datasets-*-cdls), which is largely statistical: year-shaped at best, and much of it not discrete events. The genuine remaining dex work is bounded: ~8 event kinds + a Yeardex batch, below.
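The snapshot figures can be re-tallied mechanically whenever a kind lands. The sketch below is only a bookkeeping aid: the dictionary names and the summarize helper are hypothetical, not part of the platform, and the counts are copied by hand from the bullets above.

```python
# Per-kind slot counts, copied by hand from the snapshot bullets above.
# The names EVENTDEX_SLOTS / YEARDEX_SLOTS / summarize are illustrative only.
EVENTDEX_SLOTS = {
    "entry": 25_536, "eq": 13_842, "fema": 5_191, "gst": 1_147, "gw": 431,
    "neutrino": 348, "tc": 13_545, "tor": 73_634, "uhecr": 109, "vol": 11_089,
}
YEARDEX_SLOTS = {"landuse": 16, "dairy": 28}


def summarize(slots):
    """Return (number of kinds, total slots, name of the largest kind)."""
    largest = max(slots, key=slots.get)
    return len(slots), sum(slots.values()), largest


if __name__ == "__main__":
    for name, slots in (("Eventdex", EVENTDEX_SLOTS), ("Yeardex", YEARDEX_SLOTS)):
        kinds, total, largest = summarize(slots)
        print(f"{name}: {kinds} kinds, {total:,} slots, largest = {largest}")
```

Counting from the listed kinds rather than typing the headline number keeps the snapshot line and the Done list from drifting apart.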

Done

  • eq — earthquakes (usgs_earthquake; emsc/isc/gfz_geofon are additional catalogs that could later merge into the same slots, cited).
  • tor — tornadoes (spc_tornado_history, enriched with stormevents-csvfiles NCEI narratives).
  • vol — volcanic eruptions (smithsonian_gvp; usgs_volcanoes monitoring adjacent).
  • entry — atmospheric entries / fireballs (gmn_meteors, cneos_fireballs, meteorite_falls, sat:).
  • gw — gravitational waves (gwosc_events).
  • neutrino — high-energy neutrinos (icecube_neutrinos, IceCat-1).
  • uhecr — ultra-high-energy cosmic rays (auger_uhecr).
  • gst — geomagnetic storms (Dst-derived).
  • tc — tropical cyclones (ibtracs_hurricanes, nhc_storms).
  • fema — US disaster declarations (fema_disasters).
  • landuse (Yeardex) — US major land uses (usda_landuse, 1945–2017).
  • dairy (Yeardex) — US dairy supply (usda_dairy, 1998–2025).

Remaining — event-shaped, measured, ingested, NOT yet dexed

Ordered roughly largest/most-distinct phenomenon first. Each gets its own scope freeze (Brick A) before building; Mike rules the load-bearing categorization calls.

  • lightning (NEXT; first pick, Mike 2026-06-16) — blitzortung_lightning (24M) + goes19_glm_flashes (17M) + goes18_glm_flashes (10M). The single largest undexed phenomenon; one strike/flash = one event. Heaviest build (~51M raw across three sensors). Brick A opens on the slot-granularity ruling: per-strike vs clustered flash-cell/storm-cell. Per-strike at 51M is a different storage profile than any existing kind (largest is tor 73k); clustering to cells needs a space+time window rule. Mike's call; freeze before building.
  • severe_wx (non-tornado) — stormevents-csvfiles / spc_reports: hail / wind / flash flood / lightning report / winter. The deferred half of the tornado-enrich build; tornadoes already carved off into tor.
  • flare — solar flares + CMEs (nasa_donki, goes_xray). Discrete space-weather events; distinct from the gst storm kind (driver vs effect).
  • neo — near-Earth-object close approaches (nasa_neo 280k, jpl_sbdb).
  • transient — astronomical transients / alerts (fink_transients).
  • launch — orbital rocket launches (launch_library).
  • nuforc — UFO sighting reports (nuforc 80k). Already used in the earthquake-lights paper; no kind of its own yet.
  • alert — NWS alerts (nws_alerts 819k). Its own kind (currently only borrowed to enrich tor).
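For the lightning granularity ruling, it may help to see what a space+time window rule could look like in practice. The sketch below is one possible rule, not a decision: the Strike and Cell types, the cluster function, and the 10 km / 15 min thresholds are all hypothetical placeholders. It walks strikes in time order and attaches each one to the first cell whose most recent strike is close enough in both distance and time, and opens a new cell otherwise.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Strike:
    t: float    # seconds since some epoch
    lat: float  # degrees
    lon: float  # degrees


@dataclass
class Cell:
    strikes: list = field(default_factory=list)

    @property
    def last(self):
        return self.strikes[-1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster(strikes, max_km=10.0, max_gap_s=900.0):
    """Greedy space+time clustering. Thresholds are placeholders, not a ruling."""
    cells = []
    for s in sorted(strikes, key=lambda s: s.t):
        for c in cells:
            near_in_time = s.t - c.last.t <= max_gap_s
            near_in_space = haversine_km(s.lat, s.lon, c.last.lat, c.last.lon) <= max_km
            if near_in_time and near_in_space:
                c.strikes.append(s)
                break
        else:
            # No existing cell is close enough: this strike opens a new cell.
            cells.append(Cell([s]))
    return cells
```

This greedy pass compares every strike against every cell, so it only illustrates the rule; at ~51M raw records a real build would need a spatial index and a way to retire stale cells. Whatever the thresholds, the per-strike vs per-cell choice changes the slot count by orders of magnitude, which is why the ruling comes before the build.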

Remaining — year-shaped, NOT yet dexed (Yeardex batch)

  • ghg_inventory — US GHG inventory (ghg-inv, EPA GHG dataset-* family, 1990–latest).
  • power_emissions — electric-power emissions by state (egrid-*, dataset-annual-u-s-electric-power-industry-...).
  • edgar / carbon — global emissions grids (edgar, carbon-emissions) — verify measured-vs-modeled before dexing.
  • energy_stats — SEDS / RECS energy consumption (dataset-state-energy-data-system-seds, dataset-residential-energy-consumption-survey-recs-*).
  • usda_ag — USDA agricultural 5xxx tables beyond dairy/landuse (subject-scoped per Mike; several are already year-shaped statistical series).

Held OUT — measured-reality bright line (NOT dexed, stay live as platform sources)

Per feedback_measured_reality_only: model/forecast output is OUT of the dex even when the source is live. Do not dex:

  • cems-glofas, cems-glofas-ra-streamflow-analysis, copernicus EFAS (flood forecast).
  • open_meteo, open_meteo_marine, open_meteo_aqi, open_meteo_flood (weather/flood forecast).
  • era5_cloud and any reanalysis/objective-analysis grid.

These remain ingested and queryable as platform sources; they are simply not eligible for an Eventdex/Yeardex kind.
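The bright line above could be enforced as a simple guard before any kind is built. The following is a minimal sketch, assuming sources are referred to by the identifiers listed above; HELD_OUT, is_held_out, and split_sources are hypothetical names. It covers only the exact identifiers named in this section, so "copernicus EFAS" and the open-ended "any reanalysis/objective-analysis grid" clause would still need a human check.

```python
# Source identifiers held out of the dex, copied from the list above.
# Hypothetical helper names; not an existing platform API.
HELD_OUT = frozenset({
    "cems-glofas",
    "cems-glofas-ra-streamflow-analysis",
    "open_meteo",
    "open_meteo_marine",
    "open_meteo_aqi",
    "open_meteo_flood",
    "era5_cloud",
})


def is_held_out(source_id: str) -> bool:
    """True if the source is model/forecast output named in the held-out list."""
    return source_id in HELD_OUT


def split_sources(source_ids):
    """Partition candidate sources into (not held out, held out), preserving order."""
    candidates = [s for s in source_ids if not is_held_out(s)]
    held_out = [s for s in source_ids if is_held_out(s)]
    return candidates, held_out
```

Passing this check does not make a source dex material by itself: continuous monitoring series and the research backlog are excluded for different reasons, covered in the next section.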

Not dex material (for reference)

  • Continuous monitoring time series with no discrete-event structure (igra_soundings, dscovr_solar_wind, usgs_water, noaa_tides, intermagnet, silso_sunspots, nmdb_cosmic_rays, radiation monitors safecast/radnet/radnet-EU) feed papers directly as series; they are not event lists. (Some, e.g. goes_xray, double as a flare-event source — dex the events, leave the raw series.)
  • The bulk AutoSense research tail (dataset-* EPA/DOE/USGS research datasets, datasets-*-cdls cropland rasters, hourly-precip-3240-*) is the curated research backlog (project_curated_sources): sort year-shaped ones into the Yeardex batch as they become relevant, leave the rest as the backlog they are. Nothing uploaded is wasted; not all of it is a dex kind.

Rule

When unsure how a phenomenon should be categorized (its own kind? merge into an existing slot? event vs year?), ask Mike — he holds the mental model. Build every kind to docs/dex-data-model.md.
