
Solar Flare + CME Eventdex — FROZEN scope (Brick A, 2026-06-18)

Mike's ruling (2026-06-18): solar flares and CMEs are two distinct phenomena → two kinds, not one lumped flare kind. A flare is a localized electromagnetic X-ray burst (classed A–X), measured by GOES XRS; a CME is a plasma cloud launched into the heliosphere, measured by coronagraph (LASCO C2/C3) as speed (km/s), angular width, and propagation direction. The two often co-occur, but each also frequently occurs without the other. This applies the foundational "each phenomenon = its own category" rule and matches the 2026-06-17 severe-weather split (where lumping was corrected to per-phenomenon kinds).

FROZEN — two new Eventdex kinds: flare and cme. Both single-source (nasa_donki), catalog-only (no terrestrial sensor sweep), built A+B+C+D in one pass (small enough for file-per-slot dossiers, unlike the severe-wx spine-parquet case).

Why a deep backfill (not the staged slice)

The staged nasa_donki rows are real live-fetcher output, but shallow and duplicated: the fetcher re-pulls a rolling 3-day window every tick, so the observations table holds only the recent tail, inflated ~88–110× by re-fetch:

  • donki_flare_class: 7,470 rows → 85 distinct flares (2025-09 → present, ~9 months).
  • donki_cme_speed: 50,716 rows → 462 distinct CMEs (2026-03 → present, ~3 months).
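
The ~88–110× inflation range follows directly from the two row counts above. A quick check, using only the numbers quoted in the bullets (the helper name is illustrative):

```python
# Staged row counts and distinct-event counts, copied from the bullets above.
STAGED = {
    "donki_flare_class": (7_470, 85),
    "donki_cme_speed": (50_716, 462),
}


def inflation_factor(rows: int, distinct: int) -> float:
    """Rows stored per distinct event (1.0 would mean no duplication)."""
    return rows / distinct


for series, (rows, distinct) in STAGED.items():
    print(f"{series}: {inflation_factor(rows, distinct):.1f}x")
# donki_flare_class: 87.9x
# donki_cme_speed: 109.8x
```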

Per the established convention (the FEMA, CNEOS, meteorite, and GMN spines were all net-new deep API pulls when the staged slice was thin), Brick B does a one-time DONKI API backfill from the catalog start (2010-04) to present, deduped to one slot per activityID. DONKI's startDate/endDate params take any range; pull in yearly chunks, or monthly chunks if the API caps the range. Auth: the live fetcher already authenticates against api.nasa.gov/DONKI; the keyless CCMC mirror (kauai.ccmc.gsfc.nasa.gov/DONKI/WS/get/{FLR,CME}) is the fallback if the keyed endpoint rate-limits a bulk pull. Expected spine size: flare ~5–10k, cme ~20–40k (2010–2026, spanning parts of solar cycles 24 and 25).
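
A minimal sketch of the two mechanical pieces of the backfill: cutting the 2010-04 → present range into yearly startDate/endDate windows, and collapsing fetched rows to one record per ID. The function names, the day-of-month chosen for the 2010-04 start, and the "last record wins" rule are assumptions for illustration; the HTTP fetch itself is deliberately left out.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator


def yearly_chunks(start: date, end: date) -> Iterator[tuple[date, date]]:
    """Yield inclusive (startDate, endDate) windows, one per calendar year."""
    cursor = start
    while cursor <= end:
        chunk_end = min(date(cursor.year, 12, 31), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


def dedup_by_id(records: Iterable[dict], id_key: str) -> dict[str, dict]:
    """Collapse repeated rows to one record per ID (assumption: last one wins)."""
    slots: dict[str, dict] = {}
    for record in records:
        slots[record[id_key]] = record
    return slots


# Assumed start day: the scope text only says "2010-04".
chunks = list(yearly_chunks(date(2010, 4, 1), date(2026, 6, 18)))
print(len(chunks), chunks[0], chunks[-1])
```

Each window would be passed as the startDate/endDate pair of one request; if a yearly window turns out to be capped, the same generator pattern can be narrowed to months.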

Kind 1 — flare

  • Slot = one DONKI FLR flrID (the activityID, e.g. 2026-06-13T12:40:00-FLR-001). One solar flare. Natural, stable, unique per the dedup above.
  • Value = flare class magnitude via FLARE_CLASS_MAP: the letter multiplier (A=0.1, B=1, C=10, M=100, X=1000) times the numeric part of the class, so X1.0 → 1000 and C2.5 → 25. Unit = the class string ("X1.0"). Timestamp = beginTime; peak_time, end_time, active_region, instruments in extra.
  • Geometry = heliographic, NOT terrestrial. source_location (e.g. S21W68) is a position on the Sun's disk. Earth latitude/longitude columns are NULL (precedent: cosmic kinds store RA/Dec, not a ground point). Heliographic coords kept in extra. → catalog-only, no sensor sweep (a flare has no Earth ground point; its only Earth detector is GOES XRS, which is the source instrument, not an independent layer).
  • Measured-reality: IN. A flare is a measured EM event (GOES XRS measured the X-ray flux); DONKI FLR is the catalog of those measurements. No model component in the FLR record.
  • kind_subtype = flare class letter (A/B/C/M/X) for downstream filtering.
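
The flare value and geometry rules above can be sketched as follows. The multipliers come from the scope text; the regexes, function names, output key names, and the sign convention (north and west positive) are assumptions for illustration, not the normalizer's actual code.

```python
import re
from typing import Optional

# Multipliers as given in the scope text.
FLARE_CLASS_MAP = {"A": 0.1, "B": 1.0, "C": 10.0, "M": 100.0, "X": 1000.0}

_CLASS_RE = re.compile(r"^([ABCMX])(\d+(?:\.\d+)?)$")
_LOCATION_RE = re.compile(r"^([NS])(\d{1,2})([EW])(\d{1,3})$")


def flare_value(class_string: str) -> tuple[float, str]:
    """Return (magnitude, class letter) for a class string such as "X1.0"."""
    match = _CLASS_RE.match(class_string.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized flare class: {class_string!r}")
    letter, number = match.group(1), float(match.group(2))
    return FLARE_CLASS_MAP[letter] * number, letter


def parse_source_location(location: Optional[str]) -> Optional[dict]:
    """Split e.g. "S21W68" into signed heliographic degrees, or None if absent."""
    match = _LOCATION_RE.match((location or "").strip().upper())
    if match is None:
        return None
    ns, lat, ew, lon = match.groups()
    return {
        # Assumed sign convention: north positive, west positive.
        "helio_lat_deg": int(lat) * (1 if ns == "N" else -1),
        "helio_lon_deg": int(lon) * (1 if ew == "W" else -1),
    }


print(flare_value("X1.0"))              # (1000.0, 'X')
print(parse_source_location("S21W68"))  # {'helio_lat_deg': -21, 'helio_lon_deg': 68}
```

The parsed heliographic position would go into extra, while the slot's Earth latitude/longitude stay NULL.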

Kind 2 — cme

  • Slot = one DONKI CME activityID (e.g. 2026-06-18T00:00:00-CME-001). One coronal mass ejection.
  • Value = speed km/s from the most-accurate coronagraph analysis (isMostAccurate, already selected by the live normalizer). half_angle, heliographic propagation latitude/longitude, source_location, active_region, instruments in extra.
  • Geometry = heliographic. Earth lat/lon NULL; coronagraph-derived direction in extra. → catalog-only, no sensor sweep.
  • Measured-reality — the load-bearing line for CME (applies the bright line, no new fork): the CME slot value is the measured coronagraph observation (speed, half-angle, direction, source location), which is IN by the bolide-energy / reentry-observed-decay precedent (measured properties of an event that physically happened). The DONKI CME record also carries a WSA-ENLIL prediction (estimatedShockArrivalTime, predicted_kp) — that is MODEL OUTPUT and is held in extra as clearly-flagged model_context, never the slot's measured value (precedent: bolide measured energy IN, TIP-predicted impact point OUT; reentry observed decay IN, TIP impact prediction OUT). is_earth_directed is ENLIL-derived → kept but flagged model_context, not treated as a measured fact.
  • kind_subtype = earth_directed / not_earth_directed (from the flagged ENLIL field, labeled as such) for downstream filtering.
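
One way to keep the measured-vs-model line explicit in code is sketched below. The input is a hypothetical, already-normalized record: the "analyses" nesting, the output key names, and the sample values are assumptions for illustration; only the field names quoted in the bullets above (isMostAccurate, estimatedShockArrivalTime, predicted_kp, is_earth_directed) come from this document. Returning a subtype of None when no ENLIL flag exists is also an assumption, since the scope only names two subtypes.

```python
from typing import Any, Optional

# Fields the scope text identifies as WSA-ENLIL model output.
MODEL_FIELDS = ("estimatedShockArrivalTime", "predicted_kp", "is_earth_directed")


def build_cme_slot(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Build a CME slot: measured coronagraph value in, model output flagged."""
    analysis = next(
        (a for a in record.get("analyses", []) if a.get("isMostAccurate")), None
    )
    if analysis is None:
        return None  # no most-accurate coronagraph analysis to take a value from

    model_context = {name: record.get(name) for name in MODEL_FIELDS}
    model_context["is_model_output"] = True
    model_context["model"] = "WSA-ENLIL"

    directed = model_context["is_earth_directed"]
    if directed is None:
        subtype = None  # assumption: no ENLIL flag means no subtype
    else:
        subtype = "earth_directed" if directed else "not_earth_directed"

    return {
        "slot_id": record["activityID"],
        "kind": "cme",
        "value": analysis["speed"],  # measured speed, km/s
        "unit": "km/s",
        "latitude": None,   # Earth lat/lon stay NULL (heliographic geometry)
        "longitude": None,
        "kind_subtype": subtype,  # ENLIL-derived, so labeled as such downstream
        "extra": {
            "half_angle": analysis.get("half_angle"),
            "source_location": record.get("source_location"),
            "model_context": model_context,
        },
    }


# Made-up values, for illustration only.
sample = {
    "activityID": "2026-06-18T00:00:00-CME-001",
    "analyses": [
        {"isMostAccurate": False, "speed": 400.0},
        {"isMostAccurate": True, "speed": 850.0, "half_angle": 30.0},
    ],
    "is_earth_directed": True,
}
slot = build_cme_slot(sample)
print(slot["value"], slot["kind_subtype"])
```

Keeping every model field under a single model_context key means a unit test can assert that no ENLIL-derived key ever appears at the slot's top level.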

Cross-match (deferred-v2): flare ↔ cme

DONKI's linkedEvents already pairs a flare with the CME it launched (the FLR record points to its CME-001). That is the natural flare↔cme association. Per the data model it is a read-across, not a merge: each flare and each CME keeps its own slot; the linkage is a cited cross-match (cross_match = deferred-v2-flare-cme), the heliophysics analogue of cosmic-messenger v2 and the FEMA-terrestrial read-across. Not built in this pass.
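
Although the cross-match is not built in this pass, the read-across shape can be sketched. The assumption here is that each FLR record carries linkedEvents as a list of objects with an activityID key (or nothing at all when there is no link), and that CME IDs can be recognized by the "-CME-" marker seen in the example IDs above; the output key names are hypothetical.

```python
from typing import Iterable


def flare_cme_links(flare_records: Iterable[dict]) -> list[dict]:
    """List flare -> CME associations without merging either slot."""
    links = []
    for flare in flare_records:
        for linked in flare.get("linkedEvents") or []:
            linked_id = linked.get("activityID", "")
            if "-CME-" in linked_id:
                links.append(
                    {
                        "flare_slot": flare["flrID"],
                        "cme_slot": linked_id,
                        "cross_match": "deferred-v2-flare-cme",
                    }
                )
    return links


# Made-up records, for illustration only.
flares = [
    {"flrID": "2026-06-13T12:40:00-FLR-001",
     "linkedEvents": [{"activityID": "2026-06-13T13:00:00-CME-001"}]},
    {"flrID": "2026-06-14T02:10:00-FLR-001", "linkedEvents": None},
]
print(flare_cme_links(flares))
```

The function only emits link rows; neither the flare nor the CME record is modified, which is the read-across, not merge, property.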

Optional v2 enrichment (logged, not built)

Cite the measured GOES X-ray peak flux (goes_xray, 58M-row series we already hold) per flare slot — the independent instrument reading behind the DONKI class. Multi-source cited-slot per the universal rule; deferred because DONKI FLR already carries the GOES-derived class.

Storage

Both kinds are small (≤~40k slots each) → standard file-per-slot JSON dossiers in data/event_storehouse/{flare,cme}/ (no severe-wx-style shared-index bloat; tor at 73k is file-per-slot and fine). Full A+B+C+D this pass.
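
A file-per-slot layout needs a filename for each activityID, and the IDs contain colons, which are not safe on every filesystem. The sketch below assumes a simple colon-to-hyphen substitution and a .json suffix; both are illustrative choices, not the storehouse's confirmed convention.

```python
import json
from pathlib import Path

KINDS = ("flare", "cme")


def dossier_path(root: Path, kind: str, activity_id: str) -> Path:
    """Map a slot to root/{kind}/{sanitized activityID}.json."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    safe_name = activity_id.replace(":", "-")  # assumed sanitization rule
    return root / kind / f"{safe_name}.json"


def write_dossier(root: Path, kind: str, slot: dict) -> Path:
    """Write one slot as one JSON file, creating the kind directory if needed."""
    path = dossier_path(root, kind, slot["slot_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(slot, indent=2, sort_keys=True), encoding="utf-8")
    return path


print(dossier_path(Path("data/event_storehouse"), "flare",
                   "2026-06-13T12:40:00-FLR-001").as_posix())
# data/event_storehouse/flare/2026-06-13T12-40-00-FLR-001.json
```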

Bricks

  • A — scope freeze (this doc).
  • B — spine: scripts/build_flare_spine.py + scripts/build_cme_spine.py (or one build_donki_spine.py with a --kind switch). Deep DONKI API backfill 2010→present, dedup to distinct activityID, write rosters data/{flare,cme}_spine_roster.parquet. New clean spine sources if the staged nasa_donki ones don't fit the one-slot model (decide at build: likely keep nasa_donki as the live source and add donki_flare / donki_cme spine-source rows, mirroring the FEMA pattern).
  • C — live edge + sweep registration: register both kinds in a flare_cme_sweep.py monitor module (FLARE_CONFIG / CME_CONFIG, radius_km=None, sensor_slugs=(); catalog-only, no sweep fired, GW/uhecr lesson applied up front). The existing nasa_donki live fetcher already pulls FLR + CME every tick; add newcomer-skip on activityID so the live edge fills the same slots without duplicating.
  • D — dossiers + tests: backfill per-slot dossiers for both kinds; 1-slot-per-event verification (file count == distinct activityID); unit tests for the dedup, the measured-vs-model CME split, and the coordinate handling (heliographic coords kept in extra, Earth lat/lon NULL).
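
Bricks C and D can be sketched together: a catalog-only config per kind, and a newcomer-skip that lets the live edge add only unseen activityIDs. The KindConfig class, its fires_sweep property, and the newcomers helper are hypothetical; only FLARE_CONFIG, CME_CONFIG, radius_km=None, and sensor_slugs=() come from the text above. Reading "newcomer-skip" as "skip any live record whose ID already has a slot" is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class KindConfig:
    kind: str
    source: str
    radius_km: Optional[float] = None    # None: no terrestrial search radius
    sensor_slugs: tuple[str, ...] = ()   # empty: no sensor layers to sweep

    @property
    def fires_sweep(self) -> bool:
        """A sweep needs both a radius and at least one sensor layer."""
        return self.radius_km is not None and bool(self.sensor_slugs)


FLARE_CONFIG = KindConfig(kind="flare", source="nasa_donki")
CME_CONFIG = KindConfig(kind="cme", source="nasa_donki")


def newcomers(live_records: Iterable[dict], known_ids: Iterable[str],
              id_key: str = "activityID") -> list[dict]:
    """Keep only live records whose ID has no slot yet (and dedup within the tick)."""
    seen = set(known_ids)
    fresh = []
    for record in live_records:
        record_id = record[id_key]
        if record_id in seen:
            continue
        seen.add(record_id)
        fresh.append(record)
    return fresh


tick = [{"activityID": "a"}, {"activityID": "a"}, {"activityID": "b"}]
print(FLARE_CONFIG.fires_sweep, newcomers(tick, known_ids={"a"}))
```

With this shape, the Brick D verification reduces to comparing the number of dossier files against the size of the set of known IDs.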

Predictions / sanity anchors

  • flare spine: dominated by C/M class (X-class rare); the 4 staged X-flares (incl. recent cycle-25 activity) must survive. Class distribution should track the solar cycle (cycle-25 max ~2024-2025).
  • cme spine: most CMEs not Earth-directed (staged 26/462 ≈ 6% earth-directed); fast halo CMEs (>1000 km/s) are the minority tail.
  • Both: the linked flare↔CME pairs (DONKI linkedEvents) should be non-empty for a meaningful fraction of M/X flares (eruptive flares launch CMEs).
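
The anchors above lend themselves to simple post-build checks. The helpers below are illustrative only (names and inputs are assumptions); the 26/462 figure is the staged count quoted above.

```python
from collections import Counter
from typing import Iterable


def earth_directed_fraction(n_directed: int, n_total: int) -> float:
    """Share of CME slots flagged earth_directed (ENLIL-derived flag)."""
    return n_directed / n_total


def missing_slots(expected_ids: Iterable[str], spine_ids: Iterable[str]) -> set[str]:
    """IDs that were staged (e.g. the X-flares) but are absent from the spine."""
    return set(expected_ids) - set(spine_ids)


def class_distribution(subtypes: Iterable[str]) -> Counter:
    """Count flare slots per class letter (A/B/C/M/X)."""
    return Counter(subtypes)


print(f"{earth_directed_fraction(26, 462):.1%}")  # 5.6%
```

A deep-backfill spine whose earth-directed share or class mix departs sharply from these staged figures would be a prompt to re-check the dedup, not necessarily an error, since the staged slice covers only a few recent months.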