Lightning Eventdex Kind — Scope + Frozen Settings (Brick A)
Status: FROZEN 2026-06-17 · Owner: Mike + Claude (engine room) ·
Kind: lightning (12th Eventdex kind) · Parent: docs/event-spine-framework.md,
docs/dex-data-model.md, docs/dex-coverage-map.md (lightning = the locked first pick).
The largest measured-reality phenomenon TerraPulse holds (~52.5M raw detections), organized into lightning-cluster event slots. Built to the dex data model: one slot = one event (here, one lightning cluster); per-phenomenon parallel list; measured reality only.
The slot — LOAD-BEARING, RULED BY MIKE 2026-06-17: lightning cluster, flash-counted, ≥30 floor
One slot = one lightning cluster (the proper meteorological term, verified 2026-06-17), not one strike and not one flash. A single strike/flash is the wrong unit to compare against an eruption or an earthquake (the way a single seismic wiggle is the wrong unit for a quake); the cluster is the event, and the strikes are the raw measured series that compose it. The flash count, peak rate, and footprint are the cluster's parameter-bundle fields (like a tornado's EF rating and track), not its identity. The raw strokes/flashes stay queryable in PostgreSQL — nothing is discarded.
Terminology (verified 2026-06-17). A flash is the consensus discharge unit. Above it the
meteorological hierarchy is cell (smallest convective unit, tens of km, ~30 min) → multicell
cluster (several cells, hours) → mesoscale convective system / MCS (hundreds of km, hours).
The grouping operation we run produces a cluster at whatever scale the lightning organises into
— a lone bolt is a singleton cluster, a squall line is a large cluster — so "cluster" is the honest
scale-agnostic name; each slot's extent and duration fields say where on the cell→cluster→MCS
ladder it actually sits. ("Episode" is a NOAA Storm-Events database convention, NOT the right
meteorological term; "cell" is wrong at the large end. Rejected.)
Grouping = natural single-link, no storm-segmentation model (Option A, Mike 2026-06-17). We do
NOT impose a radar-style storm-cell segmentation (that is processing, not organising). We take the
clusters single-link clustering naturally yields, accept that a continuous squall line legitimately
chains into one large cluster (its extent/duration fields disclose this so a paper can subdivide
if it ever needs to), and keep the unit honest.
≥30-flash floor for a full dossier slot (Mike 2026-06-17). A cluster earns a full dossier only if its total flash activity (consensus Blitzortung flashes + summed GLM flash counts) is ≥ 30. This keeps the slot count meaningful (a 1–2 flash cluster is not a storm) and in range: validation on a peak-season day (2026-06-11) projects to roughly 150k slots over the ~2-month window at this floor (versus ~1.1M with no floor). Sub-floor clusters are NOT dossiered — EXCEPT rogue candidates (see rogue preservation): an isolated/singleton sub-floor cluster is kept as a flagged lightweight sidecar record so a bolt-from-the-blue is never lost. All raw detections remain in PostgreSQL regardless.
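A minimal sketch of the gate described above, not the code in scripts/build_lightning_spine.py. The function names, the return labels, and the is_rogue_candidate input are hypothetical; only the ≥30 threshold and the three per-sensor counts come from this doc. The sum is a gate value only, not a blended intensity.

```python
from dataclasses import dataclass

FLASH_FLOOR = 30  # frozen floor from this doc


@dataclass
class ClusterCounts:
    blitzortung_flash_count: int
    glm19_flash_count: int
    glm18_flash_count: int
    is_rogue_candidate: bool = False  # hypothetical input: isolated/singleton flag


def total_flash_activity(c: ClusterCounts) -> int:
    """Gate value: consensus Blitzortung flashes plus summed GLM flash counts."""
    return c.blitzortung_flash_count + c.glm19_flash_count + c.glm18_flash_count


def disposition(c: ClusterCounts) -> str:
    """Return where the cluster is recorded (labels are illustrative)."""
    if total_flash_activity(c) >= FLASH_FLOOR:
        return "dossier"
    if c.is_rogue_candidate:
        return "rogue_sidecar"  # flagged lightweight record
    return "raw_only"  # raw detections stay in PostgreSQL either way
```

For example, a cluster with 10 + 15 + 5 flashes reaches the floor exactly and is dossiered, while a lone isolated flash goes to the sidecar.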
Counting flashes — the CONSENSUS layer (agency-backed, cited)
The cluster's intensity is counted in flashes, the scientifically-agreed unit of one discharge, using the established grouping criteria (verified 2026-06-17):
- Blitzortung (ground VLF/LF point detections): group detections into flashes by the long-standing ground-network standard, strokes within 10 km and 1 s of the first, ≤0.5 s between consecutive strokes (Cummins/Murphy-lineage; "the most used criteria for over twenty years", ScienceDirect S0169809519313304). Yields blitzortung_flash_count; the raw count is blitzortung_stroke_count.
- GOES GLM (gridded 0.25°×1-min flash counts): flashes are already grouped upstream by NOAA's Lightning Cluster-Filter Algorithm (16.5 km / 330 ms, GOES-R GLM ATBD v3.0). We sum the grid-minute counts; no regrouping. Yields glm19_flash_count, glm18_flash_count.
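The Blitzortung criteria above can be sketched as a greedy pass over time-sorted strokes. This is an illustrative reading of the 10 km / 1 s / 0.5 s rule, not the production code: the function names, the (t_seconds, lat, lon) tuple layout, and the choice to attach a stroke to the most recent matching flash are all assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def group_strokes_into_flashes(strokes, max_km=10.0, max_total_s=1.0, max_gap_s=0.5):
    """Group (t_seconds, lat, lon) strokes into flashes.

    A stroke joins an open flash if it is within max_km and max_total_s of
    that flash's first stroke and within max_gap_s of its latest stroke.
    """
    flashes = []  # each flash is a time-ordered list of strokes
    for stroke in sorted(strokes):
        t, lat, lon = stroke
        home = None
        # Flashes are ordered by first-stroke time, so scan newest first and
        # stop once a flash started more than max_total_s ago.
        for flash in reversed(flashes):
            t0, lat0, lon0 = flash[0]
            if t - t0 > max_total_s:
                break
            if t - flash[-1][0] <= max_gap_s and haversine_km(lat0, lon0, lat, lon) <= max_km:
                home = flash
                break
        if home is None:
            flashes.append([stroke])
        else:
            home.append(stroke)
    return flashes


strokes = [
    (0.0, 40.00, -100.0),
    (0.3, 40.01, -100.0),  # ~1 km from the first stroke, 0.3 s later
    (0.6, 40.02, -100.0),
    (5.0, 40.00, -100.0),  # same place, but more than 1 s after the first flash began
    (5.1, 41.00, -100.0),  # close in time, but ~111 km away
]
flashes = group_strokes_into_flashes(strokes)
stroke_count = sum(len(f) for f in flashes)  # raw count, cf. blitzortung_stroke_count
flash_count = len(flashes)                   # grouped count, cf. blitzortung_flash_count
```

Here five strokes group into three flashes, which is the distinction the two Blitzortung count fields record.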
Honest non-blend: a GLM optical flash (total lightning, includes intracloud) is NOT the same unit as a Blitzortung VLF flash (mostly cloud-to-ground). We do not sum them into one blended "true" count. Each sensor's flash count is reported and cited separately on the slot; cross-sensor agreement is a recorded property, not a merge.
Building clusters — the DECLARED layer (no agency consensus, pre-registered)
There is no NOAA or community standard for grouping flashes into a cluster (consensus stops at the flash). The cluster boundary is therefore a declared organizing rule, pre-registered here like the tornado-match tolerance, and explicitly labeled as a choice, not a standard:
- A flash/grid-count joins a cluster if it is within D km and T minutes of any flash already in the cluster (transitive chaining = spatiotemporal single-link clustering, minPts=1).
- FROZEN default: D = 15 km, T = 15 min (mid-range of lightning-nowcasting practice).
- Pre-registered sensitivity: rerun at (10 km/10 min) and (20 km/20 min); report cluster-count and headline stability. A finding that survives all three is reportable; one that flips is flagged.
- Clustering runs on the pooled flash set across all three sensors, so one storm seen by ground and space becomes one cluster with per-sensor counts.
- Single-link chaining is accepted, not fought (Option A): a continuous squall line legitimately becomes one large cluster; its extent/duration fields disclose its scale (cell → multicell → MCS). We impose no storm-segmentation model.
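The declared rule can be illustrated with a small union-find sketch. This is a pairwise reference version for reading the rule, assuming points as (t_minutes, lat, lon) tuples; it is not the vectorised grid-snap + 3D connected-components build used in Brick B, and all names here are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def single_link_clusters(points, d_km=15.0, t_min=15.0):
    """Label each (t_minutes, lat, lon) point with a cluster id.

    Two points are linked if they are within d_km and t_min of each other;
    clusters are the connected components of those links (minPts=1, so an
    unlinked point is its own cluster).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(len(points)), key=lambda i: points[i][0])
    for a, i in enumerate(order):
        ti, lati, loni = points[i]
        for j in order[a + 1:]:
            tj, latj, lonj = points[j]
            if tj - ti > t_min:
                break  # later points are even further away in time
            if haversine_km(lati, loni, latj, lonj) <= d_km:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


points = [
    (0.0, 40.0, -100.0),    # A
    (10.0, 40.1, -100.0),   # B: ~11 km and 10 min from A
    (20.0, 40.2, -100.0),   # C: linked to B, reaches A only by chaining
    (0.0, 45.0, -100.0),    # D: isolated singleton
    (100.0, 40.0, -100.0),  # E: same place as A, 100 min later
]
labels = single_link_clusters(points)                    # frozen default 15/15
n_default = len(set(labels))
n_tight = len(set(single_link_clusters(points, 10, 10)))  # sensitivity rerun
n_loose = len(set(single_link_clusters(points, 20, 20)))  # sensitivity rerun
```

In this toy set A, B and C chain into one cluster at 15/15 and 20/20 even though A and C are not directly linked, while the 10/10 rerun splits every point apart. That kind of count instability is what the sensitivity report is meant to surface.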
Multi-sensor cited slot + cross-sensor flag
Per the universal multi-source-cited-slot rule: one cluster, every contributing sensor cited. The slot
records sensors (which of the three contributed) and cross_sensor (ground_only / space_only /
both). A cluster over Europe is Blitzortung-only; a cluster over the US can carry all three. Sensor
provenance is honest per cluster; no artificial geographic cut (measured reality, no spiking).
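One way the two provenance fields could be derived from the per-sensor counts is sketched below. The function name is hypothetical, and using the source slugs as the entries of sensors is an assumption; the doc only fixes the three cross_sensor values.

```python
def sensor_provenance(blitzortung_flash_count, glm19_flash_count, glm18_flash_count):
    """Return (sensors, cross_sensor) for one cluster from its per-sensor counts."""
    counts = {
        "blitzortung_lightning": blitzortung_flash_count,  # ground VLF
        "goes19_glm_flashes": glm19_flash_count,           # space, GOES-East
        "goes18_glm_flashes": glm18_flash_count,           # space, GOES-West
    }
    sensors = [slug for slug, n in counts.items() if n > 0]
    if not sensors:
        raise ValueError("a cluster must contain at least one detection")

    ground = counts["blitzortung_lightning"] > 0
    space = counts["goes19_glm_flashes"] > 0 or counts["goes18_glm_flashes"] > 0
    if ground and space:
        cross_sensor = "both"
    elif ground:
        cross_sensor = "ground_only"
    else:
        cross_sensor = "space_only"
    return sensors, cross_sensor
```

The counts stay separate throughout; the flag records agreement between sensors without merging them.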
Measured-reality — IN
Every detection is a sensor recording a real electrical discharge that physically happened (ground VLF time-of-arrival, or GLM optical). No model output, no forecast. Squarely IN under the bright line ([[feedback_measured_reality_only]]).
The three layers (verified in PG16, 2026-06-17)
| Layer | Source slug | Shape | Coverage |
|---|---|---|---|
| Ground VLF | blitzortung_lightning (24.8M) | point detections, sub-second, lat/lon precise; extra_json has pol, n_stations, quality | global, 2026-04-20 → present |
| GOES-East | goes19_glm_flashes (17.7M) | gridded 0.25°×1-min flash count (val); NOAA LCFA upstream | Americas, 2026-04-22 → present |
| GOES-West | goes18_glm_flashes (10.0M) | gridded 0.25°×1-min flash count; NOAA LCFA upstream | Americas, 2026-04-22 → present |
Coverage note: all three are live streaming feeds, ~2 months deep and growing daily, unlike the historical catalogs (eq/tor/vol). The lightning dex grows with the feed; there is no coverage floor (take all available coverage), which is separate from the ≥30-flash dossier floor. It will never have a 1950 backfill because the feed does not.
What a cluster slot holds (the parameter bundle)
- kind: "lightning", slot id, provisional flag.
- Time: start_utc, peak_utc (minute of max flash rate), end_utc, duration_min.
- Location: centroid_lat/centroid_lon, footprint (bbox + area_km2), extent_km (max span; discloses cell vs MCS scale), total_flash_activity (the ≥30 gate value).
- Intensity, per sensor (never blended): blitzortung_flash_count, blitzortung_stroke_count, glm19_flash_count, glm18_flash_count, peak_flash_rate_per_min, glm_peak_grid_minute_count (busiest 0.25°×1-min grid cell; observed up to 118/175 flashes).
- Polarity: NOT AVAILABLE. Verified 2026-06-17 that Blitzortung extra_json.pol is 0 for all 24.8M rows (field not populated in this feed). No CG polarity breakdown; do not promise one. (The mds/mcg fields are signal metrics, not calibrated peak current; value is null.)
- sensors, cross_sensor; sources cited (3 sources + duckdb refs).
- Rogue preservation (see section below): n_flashes, isolation_km, rogue_flags (e.g. singleton/isolated/high_intensity), all spatial-only candidates, never certified.
Rogue preservation — the tail must not be organized away (Mike 2026-06-17)
Rogue lightning is real and is exactly the kind of thing a paper wants: superbolts (rare, hundreds-of-times-normal energy, the rogue-wave analog), megaflashes (single flashes >100 km; WMO record 829 km), and bolts from the blue/gray (strikes in clear air far from the parent storm). The cluster model must preserve and surface this tail, not average it into a cluster.
Nothing is dropped: the window groups, it does not filter. Single-link clustering with
minPts=1 means a strike with no neighbor becomes its own one-flash cluster; the D/T window only
decides grouping, never retention. Raw strokes also persist in PostgreSQL regardless. So no choice of
D/T ever deletes a strike.
The real risk is hiding a rogue inside a cluster aggregate, not dropping it. Guard against it:
- Keep singleton clusters as first-class slots (a lone strike is a valid event, not noise).
- Carry extremity fields on every cluster so an outlier surfaces: peak_flash_rate_per_min, glm_peak_grid_minute_count, area_km2, n_flashes, and an isolation_km (distance to the nearest other cluster).
- Flag spatial rogue candidates honestly (candidates, NOT certified): rogue_flags may include singleton, isolated (far from any neighbor, e.g. isolation_km above a declared threshold = bolt-from-the-blue candidate), high_intensity (extreme peak grid-minute count).
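A hedged sketch of how rogue_flags might be assigned from the extremity fields. The two threshold values below are placeholders chosen for illustration only; this doc does not freeze an isolation or intensity threshold, and the function name is hypothetical.

```python
# Placeholder thresholds, NOT frozen values from this doc.
ISOLATION_THRESHOLD_KM = 50.0
HIGH_INTENSITY_GRID_MINUTE_COUNT = 100


def rogue_flags(n_flashes, isolation_km, glm_peak_grid_minute_count,
                isolation_threshold_km=ISOLATION_THRESHOLD_KM,
                intensity_threshold=HIGH_INTENSITY_GRID_MINUTE_COUNT):
    """Return spatial rogue-candidate flags for one cluster (candidates only)."""
    flags = []
    if n_flashes == 1:
        flags.append("singleton")
    if isolation_km > isolation_threshold_km:
        flags.append("isolated")  # bolt-from-the-blue candidate
    if glm_peak_grid_minute_count >= intensity_threshold:
        flags.append("high_intensity")
    return flags
```

A lone flash 120 km from any other cluster would carry singleton and isolated; neither flag says anything about energy or extent.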
Honest limit — what these two feeds CANNOT certify (do not overclaim): superbolts by energy
(Blitzortung has no calibrated current — pol dead, mds/mcg uncalibrated; GLM feed is gridded
counts, not per-flash energy) and megaflashes by extent (flash-grouping caps at ~10 km; GLM is
gridded). We flag spatial rogue candidates only. Certifying energy/extent rogues needs a
different source (GLM L2 flash-level energy/area, or a superbolt network like WWLLN) — logged as a
future-source add in docs/dex-coverage-map.md, not part of this kind.
Provisional / live-edge model (pre-declared)
Like tor: a cluster whose latest flash is within T of the data's trailing edge is provisional
(it may still be growing); it is finalized once it has been quiet for longer than T. A live
sweep (Brick C) maintains provisional clusters from the rolling window.
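The provisional test reduces to one comparison against the trailing edge of the data. A minimal sketch, assuming timezone-aware UTC datetimes and the frozen T = 15 min; the function name is hypothetical and Brick C is not yet built.

```python
from datetime import datetime, timedelta, timezone

T = timedelta(minutes=15)  # frozen clustering window


def is_provisional(last_flash_utc, data_edge_utc, t=T):
    """True while the cluster could still grow: its latest flash is within t
    of the newest detection in the data (the trailing edge)."""
    return data_edge_utc - last_flash_utc <= t


edge = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
still_open = is_provisional(edge - timedelta(minutes=10), edge)  # quiet for 10 min
finalized = not is_provisional(edge - timedelta(minutes=16), edge)  # quiet for 16 min
```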
Build bricks
- Brick A (freeze). THIS DOC: cluster-slot/flash-counted ruling, consensus flash counting (ground 10 km/1 s, GLM 16.5 km/330 ms), declared cluster rule (15 km/15 min + sensitivity), multi-sensor cited merge + cross-sensor flag, measured-reality IN, three verified layers, slot fields, provisional model, no coverage floor (live feed).
- Brick B (spine, 54f4879). Pooled 52.5M detections, clustered at 15/15 (vectorised grid-snap + 3D connected-components), consensus flash counting (Blitzortung 10/1, GLM summed), ≥30 floor → 133,165 full dossier slots + 3,813-candidate rogue sidecar. Median qualifying cluster 19 km / 82 min; chaining tail bounded (2 clusters >72 h, max 5.3 d, all disclosed). 114 s. scripts/build_lightning_spine.py, src/.../monitor/lightning_cells.py.
- Brick D (dossiers + tests, 54f4879 + 2eff92d). 133,165 dossiers written in Brick B; 7 unit tests (clustering separation/co-location/singletons/temporal gate, flash count), suite 251. Remaining minor: peak_utc per-minute fill (currently null).
- Brick C (live sweep). Provisional clusters from the rolling window, finalized after T quiet; dedup against the static spine. NOT yet built; the spine is a snapshot as of 2026-06-17 (re-run build_lightning_spine.py to refresh until the sweep lands).
Expectation
A clean, well-organized lightning event list that papers can read against the other kinds (eruptions, earthquakes, severe weather, space weather). The value is the organizing; specific cross-matches are separate downstream read-acrosses.