Solar Flare + CME Eventdex — FROZEN scope (Brick A, 2026-06-18)
Mike's ruling (2026-06-18): solar flares and CMEs are two distinct phenomena → two kinds, not one
lumped flare kind. A flare is a localized electromagnetic X-ray burst (classed A–X), measured by
GOES XRS; a CME is a plasma cloud launched into the heliosphere, measured by coronagraph (LASCO
C2/C3) in km/s + angular width + propagation direction. They often co-occur but frequently don't.
This applies the foundational "each phenomenon = its own category" rule and matches the 2026-06-17
severe-weather split (where lumping was corrected to per-phenomenon kinds).
FROZEN — two new Eventdex kinds: flare and cme. Both single-source (nasa_donki),
catalog-only (no terrestrial sensor sweep), built A+B+C+D in one pass (small enough for file-per-slot
dossiers, unlike the severe-wx spine-parquet case).
Why a deep backfill (not the staged slice)
The staged nasa_donki rows are real live-fetcher output, but shallow and duplicated: the fetcher
re-pulls a rolling 3-day window every tick, so the observations table holds only the recent tail,
inflated ~88–110× by re-fetch:
- donki_flare_class: 7,470 rows → 85 distinct flares (2025-09 → present, ~9 months).
- donki_cme_speed: 50,716 rows → 462 distinct CMEs (2026-03 → present, ~3 months).
Per the established convention (FEMA, CNEOS, meteorite, GMN spines were all net-new deep API pulls
when the staged slice was thin), Brick B does a one-time DONKI API backfill from the catalog start
(2010-04) to present, deduped to one slot per activityID. DONKI's startDate/endDate params
take any range; pull in yearly (or monthly, if the API caps a range) chunks. Auth: the live fetcher
already authenticates against api.nasa.gov/DONKI; the keyless CCMC mirror
(kauai.ccmc.gsfc.nasa.gov/DONKI/WS/get/{FLR,CME}) is the fallback if the key rate-limits a bulk
pull. Expected spine size: flare ~5–10k, cme ~20–40k (2010–2026, two solar cycles).
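A minimal sketch of the chunked pull and the per-activityID dedup described above. It only builds the request URLs and collapses records; it does not perform any HTTP calls. The `https://` scheme, the query-string layout, the helper names, and the "last-seen record wins" policy are all assumptions for illustration, not confirmed behaviour of the live fetcher or of DONKI.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Keyless CCMC mirror named in the text; the https scheme is an assumption.
CCMC_BASE = "https://kauai.ccmc.gsfc.nasa.gov/DONKI/WS/get"


def yearly_chunks(start: date, end: date):
    """Yield (chunk_start, chunk_end) pairs covering [start, end],
    split at calendar-year boundaries."""
    cur = start
    while cur <= end:
        chunk_end = min(date(cur.year, 12, 31), end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)


def chunk_url(endpoint: str, chunk_start: date, chunk_end: date) -> str:
    """Build one request URL for endpoint 'FLR' or 'CME' (hypothetical helper)."""
    query = urlencode({
        "startDate": chunk_start.isoformat(),
        "endDate": chunk_end.isoformat(),
    })
    return f"{CCMC_BASE}/{endpoint}?{query}"


def dedup_by_activity_id(records, id_field="activityID"):
    """Collapse re-fetched rows to one record per ID.
    Assumption: the last-seen copy wins, since later fetches may carry updates."""
    slots = {}
    for rec in records:
        slots[rec[id_field]] = rec
    return slots
```

For flares, `id_field="flrID"` would be passed, since the text identifies flrID as the flare's activityID. If the API caps a range, the same generator shape works with month boundaries instead of year boundaries.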
Kind 1 — flare
- Slot = one DONKI FLR flrID (the activityID, e.g. 2026-06-13T12:40:00-FLR-001). One solar flare. Natural, stable, unique per the dedup above.
- Value = flare class magnitude via FLARE_CLASS_MAP (A=0.1, B=1, C=10, M=100, X=1000 × the decimal), unit = the class string ("X1.0"). Timestamp = beginTime; peak_time, end_time, active_region, instruments in extra.
- Geometry = heliographic, NOT terrestrial. source_location (e.g. S21W68) is a position on the Sun's disk. Earth latitude/longitude columns are NULL (precedent: cosmic kinds store RA/Dec, not a ground point). Heliographic coords kept in extra. → catalog-only, no sensor sweep (a flare has no Earth ground point; its only Earth detector is GOES XRS, which is the source instrument, not an independent layer).
- Measured-reality: IN. A flare is a measured EM event (GOES XRS measured the X-ray flux); DONKI FLR is the catalog of those measurements. No model component in the FLR record.
- kind_subtype = flare class letter (A/B/C/M/X) for downstream filtering.
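The flare slot rules above could be sketched roughly as follows. FLARE_CLASS_MAP and its multipliers come from the text; the slot dictionary layout, the function names, the DONKI input keys (`classType`, `sourceLocation`, `activeRegionNum`, and so on), and the sign convention for the heliographic coordinates (north and west positive) are assumptions for illustration.

```python
import re

# Multipliers from the scope text: class letter x the decimal part.
FLARE_CLASS_MAP = {"A": 0.1, "B": 1.0, "C": 10.0, "M": 100.0, "X": 1000.0}

_CLASS_RE = re.compile(r"^([ABCMX])(\d+(?:\.\d+)?)$")
_LOC_RE = re.compile(r"^([NS])(\d{1,2})([EW])(\d{1,3})$")


def flare_magnitude(class_type: str):
    """Return (magnitude, class_letter) for a class string such as 'X1.0'."""
    m = _CLASS_RE.match(class_type.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised flare class: {class_type!r}")
    letter, decimal = m.group(1), float(m.group(2))
    return FLARE_CLASS_MAP[letter] * decimal, letter


def parse_source_location(loc: str):
    """Parse e.g. 'S21W68' into heliographic degrees; None if unparseable.
    Assumption: north and west are positive."""
    m = _LOC_RE.match(loc.strip().upper())
    if m is None:
        return None
    ns, lat, ew, lon = m.groups()
    return {
        "helio_lat_deg": int(lat) * (1 if ns == "N" else -1),
        "helio_lon_deg": int(lon) * (1 if ew == "W" else -1),
    }


def flare_slot(rec: dict) -> dict:
    """Build one hypothetical slot dict from one deduped FLR record."""
    value, letter = flare_magnitude(rec["classType"])
    return {
        "slot_id": rec["flrID"],
        "kind": "flare",
        "kind_subtype": letter,
        "value": value,
        "unit": rec["classType"],
        "timestamp": rec["beginTime"],
        "latitude": None,   # Earth columns stay NULL: no ground point
        "longitude": None,
        "extra": {
            "peak_time": rec.get("peakTime"),
            "end_time": rec.get("endTime"),
            "active_region": rec.get("activeRegionNum"),
            "instruments": rec.get("instruments"),
            "source_location": rec.get("sourceLocation"),
            "heliographic": parse_source_location(rec.get("sourceLocation") or ""),
        },
    }
```

Keeping the parsed heliographic coordinates under `extra` rather than in the Earth latitude/longitude columns is what keeps the kind catalog-only: nothing downstream can mistake a solar-disk position for a ground point.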
Kind 2 — cme
- Slot = one DONKI CME activityID (e.g. 2026-06-18T00:00:00-CME-001). One coronal mass ejection.
- Value = speed km/s from the most-accurate coronagraph analysis (isMostAccurate, already selected by the live normalizer). half_angle, heliographic propagation latitude/longitude, source_location, active_region, instruments in extra.
- Geometry = heliographic. Earth lat/lon NULL; coronagraph-derived direction in extra. → catalog-only, no sensor sweep.
- Measured-reality — the load-bearing line for CME (applies the bright line, no new fork): the CME slot value is the measured coronagraph observation (speed, half-angle, direction, source location), which is IN by the bolide-energy / reentry-observed-decay precedent (measured properties of an event that physically happened). The DONKI CME record also carries a WSA-ENLIL prediction (estimatedShockArrivalTime, predicted_kp) — that is MODEL OUTPUT and is held in extra as clearly-flagged model_context, never the slot's measured value (precedent: bolide measured energy IN, TIP-predicted impact point OUT; reentry observed decay IN, TIP impact prediction OUT). is_earth_directed is ENLIL-derived → kept but flagged model_context, not treated as a measured fact.
- kind_subtype = earth_directed / not_earth_directed (from the flagged ENLIL field, labeled as such) for downstream filtering.
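The measured-vs-model split for CME slots might look something like this sketch. It assumes the live normalizer has already flattened the most-accurate analysis into a single dict using the field names quoted in the text; the slot layout, the `propagation_*` key names, the `is_model_output` flag, and the choice of `None` as the subtype when the ENLIL field is absent are hypothetical.

```python
# Measured coronagraph fields that go into extra (names as quoted in the text).
MEASURED_EXTRA = {
    "half_angle": "half_angle",
    "latitude": "propagation_latitude",    # heliographic, not Earth
    "longitude": "propagation_longitude",  # heliographic, not Earth
    "source_location": "source_location",
    "active_region": "active_region",
    "instruments": "instruments",
}

# WSA-ENLIL derived fields: model output, never the slot's measured value.
MODEL_FIELDS = ("estimatedShockArrivalTime", "predicted_kp", "is_earth_directed")


def cme_slot(rec: dict) -> dict:
    """Build one hypothetical CME slot from one normalized, deduped record."""
    extra = {dst: rec.get(src) for src, dst in MEASURED_EXTRA.items()}
    model_context = {name: rec.get(name) for name in MODEL_FIELDS}
    model_context["source"] = "WSA-ENLIL"
    model_context["is_model_output"] = True
    extra["model_context"] = model_context

    earth = rec.get("is_earth_directed")
    if earth is None:
        subtype = None  # assumption: no ENLIL run means no subtype
    else:
        subtype = "earth_directed" if earth else "not_earth_directed"

    return {
        "slot_id": rec["activityID"],
        "kind": "cme",
        "kind_subtype": subtype,   # derived from the flagged ENLIL field
        "value": rec["speed"],     # measured coronagraph speed
        "unit": "km/s",
        "latitude": None,          # Earth columns stay NULL
        "longitude": None,
        "extra": extra,
    }
```

Nesting every ENLIL field under one `model_context` key makes the unit test for the split straightforward: no model field name may appear at the top level of the slot or directly in `extra`.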
Cross-match (deferred-v2): flare ↔ cme
DONKI's linkedEvents already pairs a flare with the CME it launched (the FLR record points to its
CME-001). That is the natural flare↔cme association. Per the data model it is a read-across, not
a merge: each flare and each CME keeps its own slot; the linkage is a cited cross-match
(cross_match = deferred-v2-flare-cme), the heliophysics analogue of cosmic-messenger v2 and the
FEMA-terrestrial read-across. Not built in this pass.
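For when v2 is picked up, a rough sketch of the read-across: it extracts flare-to-CME pairs from linkedEvents without touching either slot. The assumptions here are that linkedEvents is a list of objects carrying an `activityID` key (or null when empty), that CME IDs can be recognised by the `-CME-` substring seen in the example IDs, and that a flat list of link dicts is an acceptable output shape.

```python
CROSS_MATCH_TAG = "deferred-v2-flare-cme"  # tag quoted in the scope text


def flare_cme_links(flr_records):
    """Return one link dict per (flare, linked CME) pair.
    Slots are not merged; this only cites the association."""
    links = []
    for flr in flr_records:
        # 'or []' covers both a missing key and an explicit null.
        for event in flr.get("linkedEvents") or []:
            linked_id = event.get("activityID", "")
            if "-CME-" in linked_id:
                links.append({
                    "flare": flr["flrID"],
                    "cme": linked_id,
                    "cross_match": CROSS_MATCH_TAG,
                })
    return links
```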
Optional v2 enrichment (logged, not built)
Cite the measured GOES X-ray peak flux (goes_xray, 58M-row series we already hold) per flare
slot — the independent instrument reading behind the DONKI class. Multi-source cited-slot per the
universal rule; deferred because DONKI FLR already carries the GOES-derived class.
Storage
Both kinds are small (≤~40k slots each) → standard file-per-slot JSON dossiers in
data/event_storehouse/{flare,cme}/ (no severe-wx-style shared-index bloat; tor at 73k is
file-per-slot and fine). Full A+B+C+D this pass.
Bricks
- A — scope freeze (this doc).
- B — spine: scripts/build_flare_spine.py + scripts/build_cme_spine.py (or one build_donki_spine.py with a --kind switch). Deep DONKI API backfill 2010→present, dedup to distinct activityID, write rosters data/{flare,cme}_spine_roster.parquet. New clean spine sources if the staged nasa_donki ones don't fit the one-slot model (decide at build: likely keep nasa_donki as the live source and add donki_flare/donki_cme spine-source rows, mirroring the FEMA pattern).
- C — live edge + sweep registration: register both kinds in a flare_cme_sweep.py monitor module (FLARE_CONFIG/CME_CONFIG, radius_km=None, sensor_slugs=() — catalog-only, no sweep fired, GW/uhecr lesson applied up front). The existing nasa_donki live fetcher already pulls FLR + CME every tick; add newcomer-skip on activityID so the live edge fills the same slots without duplicating.
- D — dossiers + tests: backfill per-slot dossiers for both kinds; 1-slot-per-event verification (file count == distinct activityID); unit tests for the dedup, the measured-vs-model CME split, and the heliographic-coords / NULL-Earth-lat-lon handling.
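Bricks C and D could be sketched along these lines: a catalog-only config that never fires a sweep, a newcomer-skip that writes a dossier only when the slot does not yet exist, and the file-count check. The `SweepConfig` class, the dossier filename scheme (colons stripped from the activityID), and reading "newcomer-skip" as "do not create a second file for a known ID" are assumptions; the real monitor module may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class SweepConfig:
    """Hypothetical per-kind registration record."""
    kind: str
    radius_km: Optional[float] = None
    sensor_slugs: Tuple[str, ...] = ()

    @property
    def fires_sweep(self) -> bool:
        # Catalog-only kinds have no radius and no sensors, so no sweep.
        return self.radius_km is not None and len(self.sensor_slugs) > 0


FLARE_CONFIG = SweepConfig(kind="flare")
CME_CONFIG = SweepConfig(kind="cme")


def dossier_path(root, kind: str, activity_id: str) -> Path:
    """File-per-slot path; colons are dropped to keep filenames portable."""
    return Path(root) / kind / (activity_id.replace(":", "") + ".json")


def write_if_new(root, kind: str, slot: dict) -> bool:
    """Newcomer-skip: return False and write nothing if the slot exists."""
    path = dossier_path(root, kind, slot["slot_id"])
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(slot))
    return True


def verify_one_slot_per_event(root, kind: str, activity_ids) -> bool:
    """Brick D check: dossier file count == distinct activityID count."""
    files = list((Path(root) / kind).glob("*.json"))
    return len(files) == len(set(activity_ids))
```

In use, `root` would be data/event_storehouse, so a flare dossier lands under data/event_storehouse/flare/. A tick that re-pulls the rolling 3-day window then calls `write_if_new` for every record and only genuinely new IDs produce files.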
Predictions / sanity anchors
- flare spine: dominated by C/M class (X-class rare); the 4 staged X-flares (incl. recent cycle-25 activity) must survive. Class distribution should track the solar cycle (cycle-25 max ~2024-2025).
- cme spine: most CMEs not Earth-directed (staged 26/462 ≈ 6% earth-directed); fast halo CMEs (>1000 km/s) are the minority tail.
- Both: the linked flare↔CME pairs (DONKI linkedEvents) should be non-empty for a meaningful fraction of M/X flares (eruptive flares launch CMEs).
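These anchors could be turned into a quick post-build report, sketched below. The function name, the report keys, and the idea of feeding it plain lists of kind_subtype values are assumptions; no pass/fail thresholds are set, since the text gives expectations rather than limits.

```python
from collections import Counter


def sanity_report(flare_subtypes, cme_subtypes) -> dict:
    """Summarise the spines against the anchors: X-flare count,
    combined C+M share, and the earth-directed CME share."""
    flares = Counter(flare_subtypes)
    cmes = Counter(cme_subtypes)
    n_flares = sum(flares.values())
    n_cmes = sum(cmes.values())
    return {
        "x_flare_count": flares["X"],
        "cm_share": (flares["C"] + flares["M"]) / n_flares if n_flares else None,
        "earth_directed_share": (
            cmes["earth_directed"] / n_cmes if n_cmes else None
        ),
    }
```

Fed the staged CME counts from the text (26 earth-directed out of 462), the earth-directed share comes out at about 0.056, which matches the ≈ 6% anchor.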