
Earthquake Eventdex — Scope + Frozen Settings

Status: rule FROZEN 2026-06-11 (floor decided by Mike; probe counts recorded below) · Owner: Mike + Claude (engine room) · Parent: docs/event-spine-framework.md (the Eventdex framework; this is kind #3 after tc and gst)

This document scopes the earthquake kind of the event storehouse. Per the pre-registration discipline: the settings below get frozen before any backfill and are not tuned afterward. Unlike the geomagnetic-storm kind, no detection rule needs inventing — the spine is an authoritative external catalog with native event IDs — so the freezable surface is smaller: the magnitude floor, the dedup rule, and the sweep geometry.

All coverage claims below were verified against PG16 on 2026-06-11.


The spine

  • Catalog: USGS ComCat, via the existing usgs_earthquake fetcher (live; local rows since 2021-04-12). Event IDs are ComCat preferred IDs (e.g. us7000srxn), carried in extra_json->>'event_id'. These are the addressable, revisitable IDs the slot model needs.
  • One spine, one catalog. emsc, isc, and gfz_geofon (all live since 2026-03-31) are corroboration sweep layers, not spines — same principle as Dst-vs-Kp in the GST scope: one rule, one catalog, no dual-definition ambiguity.
  • Dedup (frozen): the feed re-inserts events on revision, so one event = one slot via DISTINCT ON (extra_json->>'event_id') ... ORDER BY extra_json->>'event_id', id DESC (Postgres requires the DISTINCT ON key to lead the ORDER BY). Latest insert wins, which is exactly how magnitude revisions propagate into the slot. This is the platform's existing canonical quake-dedup pattern.
  • Provisional → definitive is native: ComCat's status field moves automatic → reviewed. The dossier stores it; re-sweeps upgrade in place. No second feed needed: the spine and the live edge are the same source, already polling.
  • Known wrinkle, recorded up front: ComCat occasionally merges or deletes events (duplicate detections consolidated under the preferred ID). The storehouse keeps any orphaned slot on disk (slots are never deleted by automation) and the index rebuild simply reflects what ComCat currently asserts on the next backfill pass. If an orphan matters, it gets investigated by hand.
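The frozen dedup rule can be mirrored in plain Python, which is handy for spot-checking a local pull against the SQL. A minimal sketch, assuming each feed row is a dict with a monotonically increasing `id` and the ComCat ID under `extra_json["event_id"]`; the row shape, the `dedup_latest` helper and the second event ID (`placeholder01`) are hypothetical.

```python
from typing import Iterable


def dedup_latest(rows: Iterable[dict]) -> dict[str, dict]:
    """One event = one slot: keep the highest-id (latest) insert per event_id.

    Mirrors: SELECT DISTINCT ON (extra_json->>'event_id') ...
             ORDER BY extra_json->>'event_id', id DESC
    """
    latest: dict[str, dict] = {}
    for row in rows:
        event_id = row["extra_json"]["event_id"]
        current = latest.get(event_id)
        # A later insert (larger id) carries the revised magnitude and status.
        if current is None or row["id"] > current["id"]:
            latest[event_id] = row
    return latest


rows = [
    {"id": 1, "extra_json": {"event_id": "us7000srxn", "status": "automatic"}},
    {"id": 2, "extra_json": {"event_id": "placeholder01", "status": "automatic"}},
    {"id": 3, "extra_json": {"event_id": "us7000srxn", "status": "reviewed"}},
]
slots = dedup_latest(rows)
print(len(slots), slots["us7000srxn"]["extra_json"]["status"])  # prints: 2 reviewed
```

The same "latest wins" step is what lets an automatic → reviewed status change (or a magnitude revision) land in the existing slot without any extra bookkeeping.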

History depth

ComCat is queryable to 1900, but global completeness is honest only from the NEIC PDE era: 1973 → present is the backfill window for any floor at M5.5 or above. (M7+ is arguably complete deeper; if we ever want the 1900–1972 great quakes, that is a separate, explicitly scoped extension — not part of this freeze.)

Spine pull: windowed requests against the ComCat fdsnws archive endpoint — the same endpoint and cache pattern the earthquake-lights workspace already used (comcat_cache/). Idempotent, yearly windows, setsid+nohup, gap-audited after completion (the Dst backfill's empty-200 lesson applies to any archive endpoint).
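A minimal sketch of the yearly windowing and gap audit, assuming the public USGS fdsnws event endpoint (https://earthquake.usgs.gov/fdsnws/event/1/query) and its standard `format`, `starttime`, `endtime`, `minmagnitude` and `orderby` parameters. The helper names, building only the global M ≥ 6.0 query, and the rule that a zero-event window is suspect (the empty-200 lesson) are illustrative assumptions, not the backfill script's code. Nothing here performs network I/O.

```python
from datetime import date
from urllib.parse import urlencode

FDSN_QUERY = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def yearly_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Contiguous windows, one per calendar year, the last one clipped to end."""
    windows = []
    lo = start
    while lo < end:
        hi = min(date(lo.year + 1, 1, 1), end)
        windows.append((lo, hi))
        lo = hi
    return windows


def window_url(lo: date, hi: date, min_mag: float) -> str:
    """Archive query URL for one window at one magnitude floor."""
    params = {
        "format": "geojson",
        "starttime": lo.isoformat(),
        "endtime": hi.isoformat(),
        "minmagnitude": min_mag,
        "orderby": "time-asc",
    }
    return f"{FDSN_QUERY}?{urlencode(params)}"


def audit(windows: list[tuple[date, date]], counts: dict) -> list[tuple[date, date]]:
    """Windows with no recorded count, or a count of zero, need a re-pull.

    An HTTP 200 with zero events is treated as suspect, not as 'no quakes'.
    """
    return [w for w in windows if counts.get(w, 0) == 0]


windows = yearly_windows(date(1973, 1, 1), date(2026, 6, 11))
print(len(windows))  # prints: 54 (1973 through 2026, the last window partial)
print(window_url(*windows[0], min_mag=6.0))
print(audit(windows[:2], {windows[0]: 1}))  # second window flagged
```

fdsnws treats both `starttime` and `endtime` as inclusive, so a quake exactly on a year boundary can come back in two windows; the event_id dedup absorbs that, which is part of what keeps the pull idempotent. The US boxes would add one bounded query per box per window (`minlatitude`, `maxlatitude`, `minlongitude`, `maxlongitude`). For those sparse boxes a zero count can be genuine, so a flagged window there means "re-check", not "error".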

The decision: magnitude floor

The floor decides whether this Eventdex has thousands of meaningful cards or hundreds of thousands of empty ones. Rates below are from our own deduplicated feed, Jan–May 2026 (annualized), cross-checked against NEIC long-term averages (in parentheses):

Floor   Our 2026 rate   (NEIC avg)    Slots, 1973→present (est.)
M6.5+   ~50/yr          (~50/yr)      ~2,600
M6.0+   ~115/yr         (~150/yr)     ~6,000–8,000
M5.5+   ~415/yr         (~450/yr)     ~22,000–25,000
M4.5+   ~5,000/yr       (~5,000/yr)   ~250,000 (rejected: slot spam)
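The slot column is annual rate × window length. A quick arithmetic check, assuming the backfill window is 1973-01-01 → 2026-06-11 (about 53.4 years) and taking the two rate columns as the low and high ends of each range:

```python
from datetime import date

# Backfill window length in years, 1973-01-01 -> 2026-06-11 (about 53.44)
YEARS = (date(2026, 6, 11) - date(1973, 1, 1)).days / 365.25

# floor -> (our 2026 annualized rate, NEIC long-term average), from the table
RATES = {
    "M6.5+": (50, 50),
    "M6.0+": (115, 150),
    "M5.5+": (415, 450),
    "M4.5+": (5_000, 5_000),
}

for floor, (ours, neic) in RATES.items():
    lo, hi = sorted((ours * YEARS, neic * YEARS))
    print(f"{floor}: {lo:,.0f} to {hi:,.0f} slots")
```

At 5,000/yr the M4.5+ row comes out nearer 267,000, which only strengthens the rejection.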

DECISION (Mike, 2026-06-11, FROZEN): M6.0 global, plus a US lower floor of M4.5. Probe counts from the ComCat count endpoint, 1973-01-01 → 2026-06-11, pulled before freeze:

Component   Definition                                             Events
Global      M ≥ 6.0, worldwide                                     7,458
CONUS       4.5 ≤ M < 6.0, lat 24.5 to 49.5, lon −125 to −66.5     1,938
Alaska      4.5 ≤ M < 6.0, lat 51 to 72, lon −170 to −129          4,194
Hawaii      4.5 ≤ M < 6.0, lat 18.5 to 22.5, lon −160.5 to −154.5  233
Total                                                              13,823
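The component definitions above reduce to a single membership test. A minimal sketch, assuming the box bounds are inclusive and longitudes follow ComCat's −180 to 180 convention; `in_floor`, `us_box` and the box tuple layout are hypothetical names, not the spine script's code.

```python
from typing import Optional

# (name, lat_min, lat_max, lon_min, lon_max), copied from the frozen probe table
US_BOXES = (
    ("CONUS", 24.5, 49.5, -125.0, -66.5),
    ("Alaska", 51.0, 72.0, -170.0, -129.0),
    ("Hawaii", 18.5, 22.5, -160.5, -154.5),
)
GLOBAL_FLOOR = 6.0
US_FLOOR = 4.5


def us_box(lat: float, lon: float) -> Optional[str]:
    """Name of the US box containing the point, else None (bounds inclusive)."""
    for name, lat_lo, lat_hi, lon_lo, lon_hi in US_BOXES:
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
            return name
    return None


def in_floor(mag: float, lat: float, lon: float) -> bool:
    """M >= 6.0 anywhere, or 4.5 <= M < 6.0 inside one of the three boxes."""
    if mag >= GLOBAL_FLOOR:
        return True
    return mag >= US_FLOOR and us_box(lat, lon) is not None


print(in_floor(4.8, 35.7, -117.5))   # prints: True  (CONUS box)
print(in_floor(5.9, -20.0, -175.0))  # prints: False (under global floor, no box)
print(in_floor(6.2, -20.0, -175.0))  # prints: True  (global floor)
```

As written, the Alaska box stops at lon −170, so events further west along the Aleutian arc (including anything across the antimeridian) enter only through the global M6.0 floor.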

The US lower floor is defined by the three bounding boxes above — box-only, no contributor-network clause (simpler, and freezable exactly). Noted honestly at freeze time: Alaska dominates the US add, and Aleutian events sit far from gauge density, so many of those cards will be index-thin. That is the frozen rule working as stated, not a defect. Total catalog size ≈ the 13,544-storm TC Eventdex; ongoing rate ≈ 260 events/yr (≈ 140 global + 120 US).
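The ongoing-rate figures follow from the probe counts divided by the probe span. A quick check, assuming the span is 1973-01-01 → 2026-06-11 as stated for the probe:

```python
from datetime import date

years = (date(2026, 6, 11) - date(1973, 1, 1)).days / 365.25  # about 53.44
global_events = 7_458
us_events = 1_938 + 4_194 + 233  # CONUS + Alaska + Hawaii = 6,365

print(round(global_events / years))                # prints: 140
print(round(us_events / years))                    # prints: 119
print(round((global_events + us_events) / years))  # prints: 259
```

Those round to the ≈ 140 global + 120 US ≈ 260 events/yr quoted above.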

Original rationale (pre-decision):

  • M6.0 global lands the catalog at TC scale (~8k slots vs 13.5k storms) — every event big enough that a sweep layer plausibly registers it.
  • The US lower floor is where our sensors actually live: usgs_water stream gauges and wells (hydroseismic responses are real and documented), noaa_tides coastal gauges — both US-dense. A California M4.8 within 200 km of fifty gauges is a richer dossier than an M6.2 in the empty South Pacific. Estimated additional volume: low hundreds per year — exact count to be pulled from the ComCat count endpoint at the probe step, before freezing.
  • Whatever floors are chosen, they freeze with this doc. A famous quake under the floor does not get grandfathered in; a boring one above it does not get dropped.

(The contributor-network variant considered in the draft was dropped at freeze time in favor of the box-only definition above.)

The sweep (per-kind settings)

  • Geometry: point event — single origin (lat, lon, time), not a track. One sweep per event, radius tiered by magnitude (frozen):
    • M < 6.5 → 500 km
    • 6.5 ≤ M < 7.5 → 1,500 km
    • M ≥ 7.5 → 5,000 km (tsunami/basin scale)
  • Window: [origin − 6 h, origin + 48 h]. Pre-window establishes the sensor baseline; post-window catches tsunami arrivals (hours across a basin), seiches, and hydroseismic well/gauge responses that persist for hours to days.
  • Sweep layers (verified live 2026-06-11, with local depth):
    • noaa_tides — tsunami / seiche signatures (since 2025-09-15)
    • usgs_water — hydroseismic gauge and well responses (since 2025-09-16)
    • intermagnet — co-seismic magnetic anomalies, near-field (since 2026-04-12)
    • emsc / isc / gfz_geofon — independent catalog corroboration (since 2026-03-31)
    • igra_soundings — profile sensor, launch-collapsed, mostly context (since 2024-12-31)
  • Consequence, stated up front: pre-2025 events are index-only by construction (spine metadata, no sensor hits) — exactly like the pre-2024 TC dossiers and pre-2025 GST events. The slots exist; strategic fills come later.
  • Aftershocks are not clustered. One cataloged event = one slot. Mainshock/aftershock association is an interpretive (model-adjacent) step; the Eventdex stores what the catalog measured. A famous sequence simply appears as many cards close in space and time — a query, not a data structure.
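The per-event sweep geometry can be sketched as: pick the frozen radius tier, open the [origin − 6 h, origin + 48 h] window, and keep sensor observations inside both. A minimal sketch, assuming observations arrive as (sensor, time, lat, lon) tuples and distance is haversine on a spherical Earth (R = 6,371 km); the tuple shape, helper names and the example event and gauges are placeholders, not eq_sweep.py's KindConfig.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0   # spherical-Earth assumption
PRE = timedelta(hours=6)   # baseline before origin
POST = timedelta(hours=48) # tsunami / seiche / hydroseismic tail


def sweep_radius_km(mag: float) -> float:
    """Frozen tiers: M<6.5 -> 500 km, 6.5<=M<7.5 -> 1,500 km, M>=7.5 -> 5,000 km."""
    if mag < 6.5:
        return 500.0
    if mag < 7.5:
        return 1500.0
    return 5000.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def sweep(event: dict, observations: list[tuple]) -> list[tuple]:
    """Return (sensor, time, distance_km) hits inside the window and radius."""
    radius = sweep_radius_km(event["mag"])
    start, end = event["time"] - PRE, event["time"] + POST
    hits = []
    for sensor, t, lat, lon in observations:
        if not (start <= t <= end):
            continue  # outside [origin - 6 h, origin + 48 h]
        d = haversine_km(event["lat"], event["lon"], lat, lon)
        if d <= radius:
            hits.append((sensor, t, round(d, 1)))
    return hits


origin = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
event = {"mag": 6.8, "lat": 35.0, "lon": -120.0, "time": origin}  # placeholder
obs = [
    ("gauge_a", origin + timedelta(hours=2), 36.0, -121.0),  # ~140 km, in window
    ("gauge_b", origin - timedelta(hours=8), 35.1, -120.1),  # before the window
    ("gauge_c", origin + timedelta(hours=1), 21.3, -157.9),  # far beyond 1,500 km
]
print(sweep(event, obs))  # only gauge_a survives
```

One sweep per event and no clustering: a sequence of aftershocks would simply run this once per cataloged event.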

Dossier shape

Same pattern as gst: per-sensor, per-metric aggregates, plus a hits parquet only for the spatially swept layers (the TC pattern: hit rows are meaningful when distance matters). Spine metadata in the slot: magnitude, depth_km, place, status, tsunami flag, sig, network, ComCat URL. Kind directory: data/event_storehouse/eq/, event_id = ComCat preferred ID verbatim.
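A minimal sketch of the aggregate half of the dossier, assuming hit rows arrive as (sensor, metric, value) triples and each slot lives at data/event_storehouse/eq/<event_id>. The count/min/max/mean aggregate set, the `slot_dir` and `aggregate` helpers and the metric names are illustrative assumptions, not the storehouse schema; the hits parquet itself is not written here.

```python
from collections import defaultdict
from pathlib import Path
from statistics import mean

KIND_DIR = Path("data/event_storehouse/eq")


def slot_dir(event_id: str) -> Path:
    """Slot directory: the ComCat preferred ID used verbatim."""
    return KIND_DIR / event_id


def aggregate(hits):
    """Per-sensor, per-metric aggregates from (sensor, metric, value) rows."""
    groups = defaultdict(list)
    for sensor, metric, value in hits:
        groups[(sensor, metric)].append(value)
    return {
        key: {"n": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for key, vals in groups.items()
    }


hits = [  # hypothetical metric names and values
    ("noaa_tides", "water_level_anomaly_m", 0.12),
    ("noaa_tides", "water_level_anomaly_m", 0.30),
    ("usgs_water", "well_level_change_m", -0.05),
]
agg = aggregate(hits)
print(slot_dir("us7000srxn").as_posix())  # prints: data/event_storehouse/eq/us7000srxn
print(agg[("noaa_tides", "water_level_anomaly_m")]["max"])  # prints: 0.3
```

Keying on (sensor, metric) keeps the aggregate file compact for every event, while the distance-bearing hit rows stay in the parquet only for layers where distance matters.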

Build bricks

  • Brick A — freeze. DONE 2026-06-11: floor decided (M6.0 global + M4.5 US boxes), probe counts recorded above, doc FROZEN.
  • Brick B — spine backfill. DONE 2026-06-11: windowed ComCat archive pull 1973 → present at the frozen floor (scripts/backfill_eq_spine.py), 12,379 inserted, 0 failed windows, gap audit clean (one floor-crossing magnitude revision found and patched; periodic reconciliation pass is an open follow-up).
  • Brick C — kind registration + live sweep. DONE 2026-06-11: eq_sweep.py KindConfig + scheduler job beside storm_sweep/gst_sweep, 30-min tick, refreshing on revision.
  • Brick D — dossier backfill. DONE 2026-06-11: scripts/backfill_eq_dossiers.py, 13,839 dossiers in 456 s (1970s 1,541 / 1980s 2,784 / 1990s 2,583 / 2000s 2,544 / 2010s 2,720 / 2020s 1,667; tiers 500 km 11,533 / 1,500 km 2,072 / 5,000 km 234). 192 events carry sensor hits (the 2025+ tail), pre-2025 index-only as stated above.
  • First report: deferred. Engine room, not paper mode.