Volcano Eventdex — Scope + Frozen Settings
Status: rule FROZEN 2026-06-12 (floor decided by Mike: no floor — full catalog, Uncertain included) · Owner: Mike + Claude (engine room)
Parent: docs/event-spine-framework.md (the Eventdex framework; this is kind #5 after tc, gst, eq, and tor)
This document scopes the volcanic-eruption kind of the event storehouse. Per the pre-registration discipline: the settings below get frozen before any backfill and are not tuned afterward. Like eq and tor, no detection rule needs inventing — the spine is an authoritative external catalog — but this kind has two properties no prior kind had: the events are long-lived (onset → end spans days to decades, and most cataloged eruptions have no recorded end), and the history is millennia deep (the catalog reaches the early Holocene via tephrochronology and ice cores).
All coverage claims below were verified against PG16 on 2026-06-12.
The spine
- Catalog: the Smithsonian Global Volcanism Program (GVP) Holocene eruption catalog, served via WFS GeoServer at webservices.volcano.si.edu, ingested live as `smithsonian_gvp` (metric `volcanic_eruption`).
- Slot ID (proposed): the GVP `eruption_number`, GVP's persistent per-eruption identifier, globally unique (no per-year scoping like SPC om numbers). Stability across catalog updates is asserted by GVP and must be verified during the Brick B audit (the tornado kind taught us not to take ID stability on faith).
- Per-row riches (verified in `extra_json`): volcano name + `vnum` (GVP volcano number), VEI (`ExplosivityIndexMax`, present on 79%), start date (in `timestamp_utc`, day precision where GVP has it; only 95 of 3,369 post-1900 starts sit at the Jan-1 year-only default), `end_year` (29%; most eruptions have no recorded end), activity type (Confirmed vs Uncertain Eruption), evidence method (observation, tephrochronology, ice core, …), activity area.
- Spot checks pass: Krakatau 1883 (VEI 6), Pinatubo 1991 (VEI 6), St. Helens 1980 (VEI 5), Hunga Tonga-Hunga Haʻapai 2021-12 (VEI 5) all present with correct start dates and VEI.
Local-copy defects (found in the Brick A survey, fix = Brick B reload)
- `maxFeatures: 5000` cap is binding. Every fetch returns exactly 5,000 eruption rows; GVP's full Holocene eruption catalog is ~10k records. The local copy holds 7,770 distinct eruptions only because the server's return order varied across ticks, i.e., coverage is incomplete and luck-sampled. Brick B must pull the full catalog via WFS `startIndex` paging and reload wholesale.
- BCE timestamps are broken. All 1,427 BCE eruptions carry fetch-time timestamps (the normalizer cannot represent BCE dates and falls back to "now"). The reload must key slot origin on the catalog's `start_year`/month/day fields, not on `timestamp_utc`.
- End-date precision dropped. The fetcher captures `EndDateYear` only; GVP serves `EndDateMonth`/`EndDateDay` too. The reload pulls all three.
- Row duplication. 119,971 rows for 7,770 eruptions (~15×: append per tick, same pattern as `spc_reports`). The spine extract deduplicates by `eruption_number`, latest fetch wins.
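The two reload rules above (deduplicate by `eruption_number` with the latest fetch winning, and key slot origin on the catalog date fields rather than `timestamp_utc`) can be sketched as follows. This is a minimal illustration, not the reload script itself: the field names `fetched_at`, `start_month`, and `start_day`, and the convention that a negative `start_year` means BCE, are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def dedupe_latest(rows: Iterable[dict]) -> Dict[int, dict]:
    """Keep one row per eruption_number: the row with the latest fetch time."""
    latest: Dict[int, dict] = {}
    for row in rows:
        key = row["eruption_number"]
        kept = latest.get(key)
        # Assumes fetched_at is an ISO-8601 UTC string in one fixed format,
        # so plain string comparison orders rows chronologically.
        if kept is None or row["fetched_at"] >= kept["fetched_at"]:
            latest[key] = row
    return latest


def origin_key(row: dict) -> Tuple[int, int, int]:
    """Slot origin from catalog date fields only (never from timestamp_utc).

    Assumed conventions: negative start_year = BCE; 0 = month/day unknown.
    """
    return (
        row["start_year"],
        row.get("start_month") or 0,
        row.get("start_day") or 0,
    )


# Illustrative rows only (not real catalog values).
rows = [
    {"eruption_number": 10, "fetched_at": "2026-06-01T00:00:00Z",
     "start_year": -500, "start_month": None, "start_day": None},
    {"eruption_number": 10, "fetched_at": "2026-06-12T00:00:00Z",
     "start_year": -500, "start_month": 3, "start_day": None},
    {"eruption_number": 11, "fetched_at": "2026-06-05T00:00:00Z",
     "start_year": 1980, "start_month": 5, "start_day": 18},
]
spine = dedupe_latest(rows)
```

Because the origin key is a plain integer tuple, BCE rows sort and compare without ever passing through a datetime type that cannot represent them.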
The live edge
- The spine updates itself fast: GVP adds new confirmed eruptions within days-to-weeks (Lewotobi's 2026 eruption, onset 2026-03-02, is already cataloged with an eruption number). The live sweep runs on catalog newcomers — no separate product needed.
- `usgs_volcanoes` alert feed (live, US volcanoes only, joins to the spine on `vnum`): alert-level history 2009 → present (53,668 rows; NORMAL/ADVISORY/WATCH/WARNING with color codes and observatory). Used as a context layer in dossiers, not a spine.
- No provisional tier (proposed). Unlike tornadoes (preliminary reports vs annual surveyed file), GVP eruption numbers appear fast enough that a provisional slot model buys little and adds a retirement mechanism. Considered and deferred; revisit if a major eruption ever goes uncataloged for >30 days.
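Running the live sweep on catalog newcomers reduces to a set difference: any cataloged `eruption_number` without a dossier is new. A minimal sketch, assuming the catalog IDs arrive newest-first and that the existing dossier IDs are available as a collection; the function name and the `probe` parameter (mirroring the newest-500 probe recorded under Brick C) are illustrative.

```python
from typing import Iterable, List, Sequence


def find_newcomers(
    catalog_newest_first: Sequence[int],
    dossier_ids: Iterable[int],
    probe: int = 500,
) -> List[int]:
    """Return eruption numbers among the newest `probe` catalog rows
    that do not yet have a dossier, preserving catalog order."""
    have = set(dossier_ids)
    return [n for n in catalog_newest_first[:probe] if n not in have]


# Illustrative IDs only.
newcomers = find_newcomers([5, 4, 3, 2, 1], dossier_ids={3, 1}, probe=3)
```

Keying on a missing dossier rather than on onset age means the rule does not depend on how quickly GVP catalogs an eruption.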
The decision: floor
Counts from the local spine (incomplete — full-catalog counts will be ~25% higher after the Brick B reload; relative proportions should hold):
| Floor | Slots (local) | Note |
|---|---|---|
| None (full catalog) | 7,770 | includes 809 Uncertain + 1,427 BCE paleo records |
| Confirmed only | 6,961 | drops Uncertain (the catalog's own flag) |
| CE era (year ≥ 1) | 6,343 | drops the deep-paleo tail |
| Since 1900 | 3,369 | the well-observed era |
| VEI 2+ | 4,196 | drops VEI 0–1 + the 1,618 VEI-unknowns |
| VEI 4+ ("large explosive") | 445 | Pinatubo-class and up |
Considerations:
- The whole catalog is smaller than the TC kind (13,544). Slot spam is not a failure mode here at any floor — every row is a vetted GVP catalog entry.
- VEI is unknown on 21% of eruptions; any VEI floor silently drops those.
- BCE/paleo slots will never accumulate sensor hits (no instrument layer reaches that era), but they are full slots like any other — complete spine metadata, an empty sensor section — and they are real measured events (tephra layers, ice cores) costing one JSON file each. ("Index-only" below means exactly this: a full slot whose sweep finds nothing because no sensor layer covers its era — never a lesser slot.)
- Uncertain Eruptions are the catalog's own honesty flag — the EFU analog. If included, the dossier carries the flag verbatim.
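The candidate floors in the table can be expressed as predicates over spine rows, which makes the VEI consideration concrete: a row with unknown VEI fails every VEI floor. This is a sketch only; the field names `vei` and `uncertain`, and a negative `start_year` for BCE, are assumptions, and the sample rows are invented for illustration.

```python
from typing import Callable, Dict, List

# Each floor is a keep-predicate over one eruption row.
FLOORS: Dict[str, Callable[[dict], bool]] = {
    "none": lambda e: True,
    "confirmed_only": lambda e: not e["uncertain"],
    "ce_era": lambda e: e["start_year"] >= 1,
    "since_1900": lambda e: e["start_year"] >= 1900,
    # Unknown VEI (None) fails any VEI floor: the silent drop noted above.
    "vei_2_plus": lambda e: e["vei"] is not None and e["vei"] >= 2,
    "vei_4_plus": lambda e: e["vei"] is not None and e["vei"] >= 4,
}


def floor_counts(eruptions: List[dict]) -> Dict[str, int]:
    """Count how many slots each candidate floor would keep."""
    return {
        name: sum(1 for e in eruptions if keep(e))
        for name, keep in FLOORS.items()
    }


# Invented rows, not catalog data.
sample = [
    {"start_year": 1991, "vei": 6, "uncertain": False},
    {"start_year": 1950, "vei": None, "uncertain": False},
    {"start_year": -5000, "vei": 3, "uncertain": False},
    {"start_year": 1800, "vei": 1, "uncertain": True},
]
counts = floor_counts(sample)
```

In the sample, the 1950 row survives every date and confirmation floor but disappears under both VEI floors, purely because its VEI is unknown.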
DECISION (Mike, 2026-06-12, FROZEN): no floor — every eruption in the GVP catalog gets a slot, Uncertain Eruptions included and flagged verbatim. The catalog is the floor, matching the tornado precedent. ~10k slots after the full-catalog reload, growing ~70/yr.
The sweep (per-kind settings, to freeze with the floor)
- Geometry: fixed point (the volcano edifice; eruptions don't move). One fix per event — simplest geometry of any kind yet.
- Radius: 100 km flat. Eruption phenomenology is local: precursor seismicity within tens of km of the edifice, volcanic lightning in the plume, ashfall warnings downwind.
- Window: [onset − 14 d, onset + 7 d], onset-anchored. Pre-window catches the precursor seismicity ramp (days-to-weeks in the literature); post-window catches the opening phase. The full eruption duration is NOT swept: an eruption ongoing since 1934 cannot be swept wholesale, and an onset-anchored window is the honest, bounded rule. Eruptions with year-only start precision get index-only slots (no sweep; a 21-day window around a fictitious Jan-1 anchor would be noise dressed as data).
- Sweep layers (verified live 2026-06-12, with local depth):
  - `usgs_earthquake`: seismicity near the edifice, 1973 → present, 203k rows. The classic precursor measurement, and the first sweep layer in any kind with real multi-decade depth: volcano dossiers back to 1973 can carry actual sensor hits.
  - `blitzortung_lightning`: volcanic lightning, ground network (2026-04-20+, 22.7M rows)
  - `goes18_glm_flashes` / `goes19_glm_flashes`: satellite optical flashes, Americas coverage (2026-04-22+)
  - `nws_alerts`: ashfall/volcano advisories, US only (2021+; known gap: no 2025 rows)
  - `igra_soundings`: profile sensor, launch-collapsed, plume-environment context (2024-12-31+)
  - `usgs_volcanoes`: alert-level history joined on `vnum` (2009+, US only; context layer in the dossier, not a spatial sweep)
- Pre-1973 slots are index-only by construction; 1973+ slots get seismicity depth; 2026+ gets the full lightning stack. The slots exist; the data grows into them.
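The frozen geometry, radius, and window combine into a simple hit test. The sketch below is an illustration under stated assumptions, not the production sweep: it uses a haversine great-circle distance on a spherical Earth, takes the precision labels `"day"`, `"month"`, `"year"` as strings, and leaves the choice of anchor day for month-precision onsets to the caller, since the source does not state it. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt
from typing import Optional, Tuple

PRE_WINDOW = timedelta(days=14)   # onset - 14 d
POST_WINDOW = timedelta(days=7)   # onset + 7 d
RADIUS_KM = 100.0                 # flat radius around the edifice
EARTH_RADIUS_KM = 6371.0088       # mean Earth radius (spherical model)

Window = Tuple[datetime, datetime]


def sweep_window(onset: datetime, precision: str, bce: bool = False) -> Optional[Window]:
    """Return the onset-anchored window, or None for index-only slots."""
    if bce or precision == "year":
        return None  # no sweep: year-only or BCE onset
    return onset - PRE_WINDOW, onset + POST_WINDOW


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def is_hit(volcano: Tuple[float, float], window: Optional[Window],
           obs: Tuple[float, float], obs_time: datetime) -> bool:
    """True if an observation falls inside both the window and the radius."""
    if window is None:
        return False
    start, end = window
    if not (start <= obs_time <= end):
        return False
    return haversine_km(*volcano, *obs) <= RADIUS_KM


onset = datetime(2024, 1, 15, tzinfo=timezone.utc)
window = sweep_window(onset, "day")
```

With the onset above, the window runs from 2024-01-01 to 2024-01-22 UTC; an observation half a degree of latitude away (about 56 km) inside that span counts as a hit, while one a full degree away (about 111 km) does not.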
Dossier shape
Same pattern as eq/tor: per-sensor aggregates + hits parquet for the spatially-swept
layers. Spine metadata in the slot: volcano name, vnum, VEI, activity type (confirmed/
uncertain), evidence method, start/end dates with precision flags, country/region,
alert-level series where US. Kind directory: data/event_storehouse/vol/.
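As a rough picture of the spine metadata listed above, the slot header could be modeled as a small dataclass serialized to JSON. Every field name and type here is a hypothetical choice for illustration; the source fixes only which pieces of information the slot carries, not how they are named or encoded.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VolSlotMeta:
    """Hypothetical shape for the spine-metadata part of a vol dossier."""
    eruption_number: int
    volcano_name: str
    vnum: int
    vei: Optional[int]              # None when VEI is unknown
    activity_type: str              # catalog flag, carried verbatim
    evidence_method: Optional[str]
    start_year: int                 # assumed: negative = BCE
    start_precision: str            # "day" | "month" | "year"
    end_year: Optional[int]         # None when no end is recorded
    end_precision: Optional[str]
    country: Optional[str]
    region: Optional[str]
    # Alert-level changes, populated only for US volcanoes.
    alert_levels: List[dict] = field(default_factory=list)


# Placeholder values, not a real catalog entry.
meta = VolSlotMeta(
    eruption_number=1, volcano_name="Example Peak", vnum=100001,
    vei=None, activity_type="Uncertain Eruption", evidence_method=None,
    start_year=-3000, start_precision="year",
    end_year=None, end_precision=None, country=None, region=None,
)
payload = json.dumps(asdict(meta), ensure_ascii=False)
```

Keeping unknowns as explicit nulls rather than omitting the keys lets an index-only slot carry the same shape as a fully swept one.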
Build bricks
- Brick A (freeze). DONE 2026-06-12: no floor (full GVP catalog, Uncertain included and flagged), slot-ID + sweep settings frozen above.
- Brick B (spine reload). DONE 2026-06-12: full-catalog pull via `scripts/reload_volcano_spine.py`: 11,089 eruptions (exactly the server's `numberMatched`) + 1,215 volcanoes, replacing 163,725 duplicated rows with 12,304 clean ones. Audit: all 7,770 old eruption IDs present in the new catalog (GVP eruption numbers are stable across pulls; the SPC failure mode does NOT apply here), 3,319 newly-gained eruptions; Krakatau 1883-05-20 / Pinatubo / Hunga Tonga / Tambora 1812 (VEI 7) spot-checked; 2,315 BCE rows clamped to year 1 with `bce` flag + authoritative `start_year` (9 carry real month/day from classical sources, e.g. Etna 44 BC); precision split 6,841 day / 180 month / 4,068 year; end-date month/day now captured (4,377 rows); country joined on 11,045 (44 eruptions reference volcanoes absent from the Holocene volcano list). A fifth defect found and fixed during the build: the normalizer clamped start days to 28, shifting eruptions that began on the 29th–31st. Live fetcher now sorts newest-first so newcomers always fit the 5,000 cap. Server gotcha (recorded for future GVP pulls): the Smithsonian WAF persistently 403s WFS requests with `startIndex` ≥ ~4000 and 403s Python HTTP clients on any paged request; the reload pages via curl + keyset pagination (`cql_filter=Eruption_Number > last`), with a disk page cache.
- Brick C (kind registration + live sweep). DONE 2026-06-12: `vol_sweep.py` KindConfig + 30-min scheduler job beside the other four kinds. Newcomer detection keys on `eruption_number` missing a dossier (newest-500 probe), NOT an onset window: GVP's cataloging lag was measured at ~3 months on build day (newest cataloged onset 94 days old), so any onset-lookback rule eventually misses the catalog frontier. A trailing 180-d onset window additionally refreshes accruing sweeps. First tick: 500 dossiers (428 swept / 72 index-only year-precision+BCE), 239 sensor hits; Ahyi seamount 2024 top card (21 quakes M4.0-5.0, nearest 20.0 km; submarine-eruption swarm). US alert-level series joined on `vnum`, collapsed to level changes. Implementation note: vol PG sessions run in UTC, because the year-1-clamped BCE timestamps shift into year 0 in a non-UTC session and break Python's datetime.
- Brick D (dossier backfill). DONE 2026-06-12: full catalog via `scripts/backfill_vol_dossiers.py`: 11,089 dossiers in 379 s (one BCE batch, century windows 1–1900, yearly 1900+; windowed by catalog `start_year`, BCE-safe). Audit reconciles exactly against the Brick B spine: precision 6,841 day / 180 month / 4,068 year; swept 7,012 = 7,021 day+month minus the 9 BCE classical-source rows (frozen rule applied); 4,077 index-only; 2,315 BCE; 1,171 Uncertain flagged. 140 dossiers carry sensor hits; Ahyi 2024 most (21 quakes). Verified non-bug: Pinatubo 1991 (GVP onset 1991-04-02, the precursor phase) sweeps clean because local `usgs_earthquake` historical depth is M4.5+ only (242 global rows in 1991) and the precursor swarm sat below that floor. Deep-history hits require M4.5+ swarms; 2021+ years hit more via the lower live-feed floor + alert/lightning layers.
- First report: deferred. Engine room, not paper mode.
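The keyset pagination recorded under Brick B (`cql_filter=Eruption_Number > last`) can be sketched as a loop that asks for the next page of IDs above the last one seen, instead of using a growing `startIndex`. This is an illustration under assumptions: the transport is injected as a callable (the actual reload shells out to curl), features are assumed to be flat dicts keyed by `Eruption_Number`, and the WFS parameter set, version, and `type_name` argument are plausible GeoServer choices rather than values taken from the reload script.

```python
from typing import Callable, Dict, List


def wfs_page_params(type_name: str, last_id: int, page_size: int) -> Dict[str, str]:
    """Query parameters for one keyset page (assumed GeoServer WFS 1.1.0 KVP)."""
    return {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,          # layer name: supplied by the caller
        "outputFormat": "application/json",
        "sortBy": "Eruption_Number",    # ascending, so the keyset is stable
        "maxFeatures": str(page_size),
        "cql_filter": f"Eruption_Number > {last_id}",
    }


def pull_all(fetch_page: Callable[[int, int], List[dict]],
             page_size: int = 1000) -> List[dict]:
    """Pull every feature by repeatedly requesting IDs above the last seen.

    fetch_page(last_id, page_size) must return up to page_size features
    with Eruption_Number > last_id, sorted ascending.
    """
    features: List[dict] = []
    last_id = 0
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            break
        features.extend(page)
        last_id = max(f["Eruption_Number"] for f in page)
        if len(page) < page_size:
            break  # short page: nothing left above last_id
    return features
```

Each request is a small first page from the server's point of view, so the offset never grows, which is why this pattern sidesteps a block on large `startIndex` values. Comparing the final count against the server's `numberMatched` gives a cheap completeness check.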