Dex Coverage Map
The finish line for "dex everything we already hold." Mike's rule (2026-06-16): organize every phenomenon already ingested into the platform into a clean Eventdex/Yeardex kind before adding new data sources. New sources are easy to add and pile up forever; the value is getting the data we already hold into per-phenomenon slots papers can draw from.
Built to the canonical model: docs/dex-data-model.md (one slot = one event or one year;
per-phenomenon parallel lists; measured reality only). See also docs/event-spine-framework.md,
docs/yeardex-framework.md, and memory feedback_organize_not_process,
feedback_measured_reality_only.
This is a burn-down checklist, not a vibe. Update it as kinds land.
Snapshot (2026-06-16)
- 11 Eventdex kinds dexed: entry 25,536 · eq 13,842 · fema 5,191 · gst 1,147 · gw 431 · neutrino 348 · tc 13,545 · tor 73,634 · uhecr 109 · vol 11,089. (Plus the cosmic_coincidences cross-match read-across, which is downstream, not a kind.)
- 2 Yeardex kinds dexed: landuse 16 year-slots · dairy 28 year-slots.
- Platform holds 367 active datasources with observations. Most are the AutoSense research tail (dataset-*, 5xxx-*, hourly-precip-*, datasets-*-cdls), which is mostly statistical/research: year-shaped at best, much of it not discrete events. The genuine remaining dex work is bounded: ~8 event kinds + a Yeardex batch, below.
Done
- eq: earthquakes (usgs_earthquake; emsc / isc / gfz_geofon are additional catalogs that could later merge into the same slots, cited).
- tor: tornadoes (spc_tornado_history, enriched with stormevents-csvfiles NCEI narratives).
- vol: volcanic eruptions (smithsonian_gvp; usgs_volcanoes monitoring adjacent).
- entry: atmospheric entries / fireballs (gmn_meteors, cneos_fireballs, meteorite_falls, sat:).
- gw: gravitational waves (gwosc_events).
- neutrino: high-energy neutrinos (icecube_neutrinos, IceCat-1).
- uhecr: ultra-high-energy cosmic rays (auger_uhecr).
- gst: geomagnetic storms (Dst-derived).
- tc: tropical cyclones (ibtracs_hurricanes, nhc_storms).
- fema: US disaster declarations (fema_disasters).
- landuse (Yeardex): US major land uses (usda_landuse, 1945–2017).
- dairy (Yeardex): US dairy supply (usda_dairy, 1998–2025).
Remaining — event-shaped, measured, ingested, NOT yet dexed
Ordered roughly largest/most-distinct phenomenon first. Each gets its own scope freeze (Brick A) before building; Mike rules the load-bearing categorization calls.
- lightning ← NEXT (first pick, Mike 2026-06-16): blitzortung_lightning (24M) + goes19_glm_flashes (17M) + goes18_glm_flashes (10M). The single largest undexed phenomenon; one strike/flash = one event. Heaviest build (~51M raw across three sensors). Brick A opens on the slot-granularity ruling: per-strike vs clustered flash-cell/storm-cell. Per-strike at 51M is a different storage profile than any existing kind (largest is tor, 73k); clustering to cells needs a space+time window rule. Mike's call; freeze before building.
- severe_wx (non-tornado): stormevents-csvfiles / spc_reports, covering hail / wind / flash flood / lightning report / winter. The deferred half of the tornado-enrich build; tornadoes already carved off into tor.
- flare: solar flares + CMEs (nasa_donki, goes_xray). Discrete space-weather events; distinct from the gst storm kind (driver vs effect).
- neo: near-Earth-object close approaches (nasa_neo 280k, jpl_sbdb).
- transient: astronomical transients / alerts (fink_transients).
- launch: orbital rocket launches (launch_library).
- nuforc: UFO sighting reports (nuforc 80k). Already used in the earthquake-lights paper; no kind of its own yet.
- alert: NWS alerts (nws_alerts 819k). Its own kind (currently only borrowed to enrich tor).
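To make the lightning slot-granularity question concrete, here is a minimal sketch of the arithmetic and of one possible space+time window rule: fixed-grid binning. The `Strike` type, `CELL_DEG`, `WINDOW_S`, and the function names are all hypothetical and chosen only for illustration; the actual rule is whatever gets frozen in Brick A.

```python
import math
from collections import defaultdict
from typing import NamedTuple

# Raw counts from the checklist, in millions of strikes/flashes per sensor.
RAW_MILLIONS = {
    "blitzortung_lightning": 24,
    "goes19_glm_flashes": 17,
    "goes18_glm_flashes": 10,
}
TOR_SLOTS = 73_634  # largest existing kind, from the snapshot


class Strike(NamedTuple):
    """Hypothetical minimal strike record (not the platform schema)."""
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds since epoch


# HYPOTHETICAL window sizes; placeholders, not a ruling.
CELL_DEG = 0.25   # spatial bin edge, degrees
WINDOW_S = 600.0  # temporal bin length, seconds


def cell_key(s: Strike) -> tuple[int, int, int]:
    """Bin a strike into a fixed (lat, lon, time) cell."""
    return (
        math.floor(s.lat / CELL_DEG),
        math.floor(s.lon / CELL_DEG),
        math.floor(s.t / WINDOW_S),
    )


def cluster(strikes):
    """Group strikes by cell; each group would be one candidate slot."""
    cells = defaultdict(list)
    for s in strikes:
        cells[cell_key(s)].append(s)
    return dict(cells)


total_raw = sum(RAW_MILLIONS.values()) * 1_000_000
print(total_raw)                     # 51000000
print(round(total_raw / TOR_SLOTS))  # 693, i.e. ~700x the largest kind

demo = [
    Strike(35.10, -97.40, 1000.0),
    Strike(35.12, -97.45, 1100.0),   # same cell as the first strike
    Strike(40.00, -100.00, 1000.0),  # a different cell
]
print(len(cluster(demo)))            # 2
```

Fixed bins are the simplest window rule but split a storm that straddles a bin edge into several cells; a density-based or sliding-window rule avoids that at the cost of more compute. Either way the per-strike vs per-cell choice changes the slot count by orders of magnitude, which is why it is the first ruling.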
Remaining — year-shaped, NOT yet dexed (Yeardex batch)
- ghg_inventory: US GHG inventory (ghg-inv, EPA GHG dataset-* family, 1990–latest).
- power_emissions: electric-power emissions by state (egrid-*, dataset-annual-u-s-electric-power-industry-...).
- edgar / carbon: global emissions grids (edgar, carbon-emissions); verify measured-vs-modeled before dexing.
- energy_stats: SEDS / RECS energy consumption (dataset-state-energy-data-system-seds, dataset-residential-energy-consumption-survey-recs-*).
- usda_ag: USDA agricultural 5xxx tables beyond dairy/landuse (subject-scoped per Mike; several are already year-shaped statistical series).
Held OUT — measured-reality bright line (NOT dexed, stay live as platform sources)
Per feedback_measured_reality_only: model/forecast output is OUT of the dex even when the source is
live. Do not dex:
- cems-glofas, cems-glofas-ra-streamflow-analysis, copernicus EFAS (flood forecast).
- open_meteo, open_meteo_marine, open_meteo_aqi, open_meteo_flood (weather/flood forecast).
- era5_cloud and any reanalysis/objective-analysis grid.
These remain ingested and queryable as platform sources; they are simply not eligible for an Eventdex/Yeardex kind.
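The bright line above could be enforced mechanically with a guard like the following. This is a sketch only: the set and function names are hypothetical, the source names are copied from the list above, and the "any reanalysis/objective-analysis grid" clause cannot be captured by a name list, so it still needs a human check.

```python
# Names copied from the held-out list; "copernicus EFAS" is kept as written
# there, and its actual datasource id may differ (assumption).
HELD_OUT_MODEL_SOURCES = frozenset({
    "cems-glofas",
    "cems-glofas-ra-streamflow-analysis",
    "copernicus EFAS",
    "open_meteo",
    "open_meteo_marine",
    "open_meteo_aqi",
    "open_meteo_flood",
    "era5_cloud",
})


def is_held_out(source_id: str) -> bool:
    """True if the source is model/forecast output named in the list."""
    return source_id in HELD_OUT_MODEL_SOURCES


def check_kind_sources(kind: str, source_ids: list[str]) -> None:
    """Raise if a proposed kind draws on any held-out source."""
    blocked = sorted(s for s in source_ids if is_held_out(s))
    if blocked:
        raise ValueError(f"kind {kind!r} uses held-out sources: {blocked}")


check_kind_sources("eq", ["usgs_earthquake"])  # passes silently
```

Passing this check does not make a source dex material by itself; continuous series and the research tail are excluded on other grounds.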
Not dex material (for reference)
- Continuous monitoring time series with no discrete-event structure (igra_soundings, dscovr_solar_wind, usgs_water, noaa_tides, intermagnet, silso_sunspots, nmdb_cosmic_rays, radiation monitors safecast / radnet / radnet-EU) feed papers directly as series; they are not event lists. (Some, e.g. goes_xray, double as a flare-event source: dex the events, leave the raw series.)
- The bulk AutoSense research tail (dataset-* EPA/DOE/USGS research datasets, datasets-*-cdls cropland rasters, hourly-precip-3240-*) is the curated research backlog (project_curated_sources): sort year-shaped ones into the Yeardex batch as they become relevant, leave the rest as the backlog they are. Nothing uploaded is wasted; not all of it is a dex kind.
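When triaging the 367 datasources, the research-tail name patterns above can be matched with the standard library's glob matcher. A minimal sketch, assuming source ids follow the patterns exactly as written; the constant, the function name, and the two made-up ids in the usage lines are illustrative only.

```python
from fnmatch import fnmatchcase

# Patterns copied from the research-tail bullet above.
RESEARCH_TAIL_PATTERNS = (
    "dataset-*",
    "datasets-*-cdls",
    "hourly-precip-3240-*",
)


def is_research_tail(source_id: str) -> bool:
    """True if the id matches any research-tail glob (case-sensitive)."""
    return any(fnmatchcase(source_id, p) for p in RESEARCH_TAIL_PATTERNS)


print(is_research_tail("dataset-state-energy-data-system-seds"))  # True
print(is_research_tail("datasets-example-cdls"))                  # True (made-up id)
print(is_research_tail("hourly-precip-3240-example"))             # True (made-up id)
print(is_research_tail("usgs_earthquake"))                        # False
```

A match only flags a source as backlog; whether it is year-shaped enough for the Yeardex batch remains a per-source judgment.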
Rule
When unsure how a phenomenon should be categorized (its own kind? merge into an existing slot?
event vs year?), ask Mike — he holds the mental model. Build every kind to
docs/dex-data-model.md.