Lightning Eventdex Kind — Scope + Frozen Settings (Brick A)

Status: FROZEN 2026-06-17 · Owner: Mike + Claude (engine room) · Kind: lightning (12th Eventdex kind) · Parent: docs/event-spine-framework.md, docs/dex-data-model.md, docs/dex-coverage-map.md (lightning = the locked first pick).

The largest measured-reality phenomenon TerraPulse holds (~52.5M raw detections), organized into lightning-cluster event slots. Built to the dex data model: one slot = one event (here, one lightning cluster); per-phenomenon parallel list; measured reality only.

The slot — LOAD-BEARING, RULED BY MIKE 2026-06-17: lightning cluster, flash-counted, ≥30 floor

One slot = one lightning cluster (the proper meteorological term, verified 2026-06-17), not one strike and not one flash. A single strike/flash is the wrong unit to compare against an eruption or an earthquake (the way a single seismic wiggle is the wrong unit for a quake); the cluster is the event, and the strikes are the raw measured series that compose it. The flash count, peak rate, and footprint are the cluster's parameter-bundle fields (like a tornado's EF rating and track), not its identity. The raw strokes/flashes stay queryable in PostgreSQL — nothing is discarded.

Terminology (verified 2026-06-17). A flash is the consensus discharge unit. Above it the meteorological hierarchy is cell (smallest convective unit, tens of km, ~30 min) → multicell cluster (several cells, hours) → mesoscale convective system / MCS (hundreds of km, hours). The grouping operation we run produces a cluster at whatever scale the lightning organises into — a lone bolt is a singleton cluster, a squall line is a large cluster — so "cluster" is the honest scale-agnostic name; each slot's extent and duration fields say where on the cell→cluster→MCS ladder it actually sits. ("Episode" is a NOAA Storm-Events database convention, NOT the right meteorological term; "cell" is wrong at the large end. Rejected.)

Grouping = natural single-link, no storm-segmentation model (Option A, Mike 2026-06-17). We do NOT impose a radar-style storm-cell segmentation (that is processing, not organizing). We take the clusters that single-link clustering naturally yields, accept that a continuous squall line legitimately chains into one large cluster (its extent/duration fields disclose this, so a paper can subdivide it if it ever needs to), and keep the unit honest.

≥30-flash floor for a full dossier slot (Mike 2026-06-17). A cluster earns a full dossier only if its total flash activity (consensus Blitzortung flashes + summed GLM flash counts) is ≥ 30. This keeps the slot count meaningful (a 1–2 flash cluster is not a storm) and in range: validation on a peak-season day (2026-06-11) projects to roughly 150k slots over the ~2-month window at this floor (versus ~1.1M with no floor). Sub-floor clusters are NOT dossiered — EXCEPT rogue candidates (see rogue preservation): an isolated/singleton sub-floor cluster is kept as a flagged lightweight sidecar record so a bolt-from-the-blue is never lost. All raw detections remain in PostgreSQL regardless.
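The three-way outcome of the floor (full dossier, rogue sidecar, or raw-only) is a small decision rule. A minimal sketch, assuming a hypothetical ClusterActivity record and a rogue_candidate flag already set by the rogue-flagging pass; the names are illustrative, not the spine's actual types. Note that the summed value is used only as the gate and is not reported as a blended count.

```python
from dataclasses import dataclass, field
from typing import List

FLASH_FLOOR = 30  # frozen ≥30-flash floor for a full dossier slot


@dataclass
class ClusterActivity:
    blitzortung_flash_count: int  # consensus flashes (10 km / 1 s grouping)
    # Hypothetical: GLM grid-minute `val`s from both satellites inside the cluster.
    glm_grid_minute_vals: List[int] = field(default_factory=list)
    rogue_candidate: bool = False  # hypothetical: set by the rogue-flagging pass


def total_flash_activity(c: ClusterActivity) -> int:
    # Gate value only; the per-sensor counts stay separate on the slot itself.
    return c.blitzortung_flash_count + sum(c.glm_grid_minute_vals)


def route_cluster(c: ClusterActivity) -> str:
    """Decide where a cluster lands; raw detections stay in PostgreSQL in every case."""
    if total_flash_activity(c) >= FLASH_FLOOR:
        return "dossier"
    if c.rogue_candidate:
        return "rogue_sidecar"
    return "raw_only"
```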

Counting flashes — the CONSENSUS layer (agency-backed, cited)

The cluster's intensity is counted in flashes, the scientifically agreed unit of a single discharge, using the established grouping criteria (verified 2026-06-17):

Blitzortung (ground VLF/LF point detections): group detections into flashes using the long-standing ground-network standard, under which a stroke joins a flash if it lies within 10 km and 1 s of the flash's first stroke, with ≤0.5 s between consecutive strokes (Cummins/Murphy lineage; "the most used criteria for over twenty years", ScienceDirect S0169809519313304). Yields blitzortung_flash_count; the raw count is blitzortung_stroke_count.
GOES GLM (gridded 0.25°×1-min flash counts): flashes are already grouped upstream by NOAA's Lightning Cluster-Filter Algorithm (16.5 km / 330 ms, GOES-R GLM ATBD v3.0). We sum the grid-minute counts; no regrouping. Yields glm19_flash_count, glm18_flash_count.
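A minimal sketch of the Blitzortung stroke-to-flash grouping, assuming strokes arrive as (seconds, lat, lon) tuples and distance is great-circle (haversine) to the flash's first stroke. Assigning a stroke to the nearest eligible open flash is an assumed tie-break, not something taken from the cited paper. GLM needs no equivalent step, since LCFA already grouped its flashes upstream and its grid-minute counts are simply summed.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Flash:
    t0: float      # time of the first stroke, seconds
    lat0: float    # location of the first stroke
    lon0: float
    t_last: float  # time of the most recent stroke
    n_strokes: int = 1


def group_strokes_into_flashes(strokes, max_km=10.0, max_span_s=1.0, max_gap_s=0.5):
    """strokes: iterable of (t_seconds, lat, lon). Returns flashes in creation order."""
    flashes, open_flashes = [], []
    for t, lat, lon in sorted(strokes):
        # Strokes are time-ordered, so a flash that fails either time test now never reopens.
        open_flashes = [f for f in open_flashes
                        if t - f.t_last <= max_gap_s and t - f.t0 <= max_span_s]
        best, best_d = None, None
        for f in open_flashes:
            d = haversine_km(f.lat0, f.lon0, lat, lon)  # distance to the FIRST stroke
            if d <= max_km and (best_d is None or d < best_d):
                best, best_d = f, d
        if best is None:
            new = Flash(t, lat, lon, t)
            flashes.append(new)
            open_flashes.append(new)
        else:
            best.t_last = t
            best.n_strokes += 1
    return flashes
```

With this shape, blitzortung_stroke_count is the number of input strokes and blitzortung_flash_count is the length of the returned list.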

Honest non-blend: a GLM optical flash (total lightning, includes intracloud) is NOT the same unit as a Blitzortung VLF flash (mostly cloud-to-ground). We do not sum them into one blended "true" count. Each sensor's flash count is reported and cited separately on the slot; cross-sensor agreement is a recorded property, not a merge.

Building clusters — the DECLARED layer (no agency consensus, pre-registered)

There is no NOAA or community standard for grouping flashes into a cluster (consensus stops at the flash). The cluster boundary is therefore a declared organizing rule, pre-registered here like the tornado-match tolerance, and explicitly labeled as a choice, not a standard:

A flash/grid-count joins a cluster if it is within D km and T minutes of any flash already in the cluster (transitive chaining = spatiotemporal single-link clustering, minPts=1).
FROZEN default: D = 15 km, T = 15 min (mid-range of lightning-nowcasting practice).
Pre-registered sensitivity: rerun at (10 km/10 min) and (20 km/20 min); report cluster-count and headline stability. A finding that survives all three is reportable; one that flips is flagged.
Clustering runs on the pooled flash set across all three sensors, so one storm seen by ground and space becomes one cluster with per-sensor counts.
Single-link chaining is accepted, not fought (Option A): a continuous squall line legitimately becomes one large cluster; its extent/duration fields disclose its scale (cell → multicell → MCS). We impose no storm-segmentation model.
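The declared rule can be sketched as an exact spatiotemporal single-link pass: sort by time, keep a sliding window of earlier flashes within T minutes, and union any pair within D km. Union-find makes the chaining transitive, and minPts=1 falls out naturally because an unmatched flash keeps its own label. Assumptions: points as (minutes, lat, lon) tuples and haversine distance. This is a reference sketch only; Brick B records that the shipped build uses a grid-snap + connected-components approach instead. The sensitivity_counts helper runs the pre-registered trio.

```python
import math
from collections import deque


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_link_clusters(points, d_km=15.0, t_min=15.0):
    """points: list of (t_minutes, lat, lon). Returns a 0-based cluster label per input point."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    uf = UnionFind(len(points))
    window = deque()  # earlier points, in time order, within t_min of the current one
    for i in order:
        t, lat, lon = points[i]
        while window and t - points[window[0]][0] > t_min:
            window.popleft()
        for j in window:
            if haversine_km(lat, lon, points[j][1], points[j][2]) <= d_km:
                uf.union(i, j)
        window.append(i)
    labels = {}
    return [labels.setdefault(uf.find(i), len(labels)) for i in range(len(points))]


PRE_REGISTERED = [(10.0, 10.0), (15.0, 15.0), (20.0, 20.0)]


def sensitivity_counts(points):
    """Cluster count at each pre-registered (D km, T min) setting."""
    return {(d, t): len(set(single_link_clusters(points, d, t))) for d, t in PRE_REGISTERED}
```

A chain of flashes 10 minutes and ~11 km apart links at 15/15 even though its ends are 20 minutes apart, which is exactly the squall-line chaining accepted under Option A.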

Multi-sensor cited slot + cross-sensor flag

Per the universal multi-source-cited-slot rule: one cluster, every contributing sensor cited. The slot records sensors (which of the three sources contributed) and cross_sensor (ground_only / space_only / both). A cluster over Europe is Blitzortung-only; a cluster over the US can carry all three. Sensor provenance is honest per cluster, with no artificial geographic cut (measured reality, no spiking).
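Deriving sensors and cross_sensor from the per-sensor counts is a direct lookup, and it leaves those counts untouched (no blending). A minimal sketch, assuming the counts are keyed by the source slugs from the layer table; that keying is an illustrative assumption.

```python
from typing import Dict, List, Tuple

GROUND = {"blitzortung_lightning"}
SPACE = {"goes19_glm_flashes", "goes18_glm_flashes"}


def sensor_provenance(per_sensor_counts: Dict[str, int]) -> Tuple[List[str], str]:
    """Return (sensors, cross_sensor) for one cluster from its per-sensor flash counts."""
    sensors = sorted(s for s, n in per_sensor_counts.items() if n > 0)
    ground = any(s in GROUND for s in sensors)
    space = any(s in SPACE for s in sensors)
    if ground and space:
        return sensors, "both"
    if ground:
        return sensors, "ground_only"
    if space:
        return sensors, "space_only"
    raise ValueError("cluster has no contributing sensor")
```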

Measured-reality — IN

Every detection is a sensor recording a real electrical discharge that physically happened (ground VLF time-of-arrival, or GLM optical). No model output, no forecast. Squarely IN under the bright line ([[feedback_measured_reality_only]]).

The three layers (verified in PG16, 2026-06-17)

Layer	Source slug	Shape	Coverage
Ground VLF	`blitzortung_lightning` (24.8M)	point detections, sub-second, lat/lon precise; `extra_json` has `pol`, `n_stations`, quality	global, 2026-04-20 → present
GOES-East	`goes19_glm_flashes` (17.7M)	gridded 0.25°×1-min flash count (`val`); NOAA LCFA upstream	Americas, 2026-04-22 → present
GOES-West	`goes18_glm_flashes` (10.0M)	gridded 0.25°×1-min flash count; NOAA LCFA upstream	Americas, 2026-04-22 → present

Coverage note: all three are live streaming feeds, ~2 months deep and growing daily, unlike the historical catalogs (eq/tor/vol). The lightning dex grows with the feed and has no start-date cutoff (take all available coverage; this is separate from the ≥30-flash slot floor). It will never have a 1950 backfill, because the feeds do not reach back that far.

What a cluster slot holds (the parameter bundle)

kind: "lightning", slot id, provisional flag.
Time: start_utc, peak_utc (minute of max flash rate), end_utc, duration_min.
Location: centroid_lat/centroid_lon, footprint (bbox + area_km2), extent_km (max span; discloses cell vs MCS scale).
Intensity, per sensor (never blended): blitzortung_flash_count, blitzortung_stroke_count, glm19_flash_count, glm18_flash_count, peak_flash_rate_per_min, glm_peak_grid_minute_count (busiest 0.25°×1-min cell; observed up to 118/175 flashes), total_flash_activity (the ≥30 gate value: Blitzortung flashes + summed GLM counts, used only for the floor, not reported as a blended count).
Polarity: NOT AVAILABLE. Verified 2026-06-17 that Blitzortung extra_json.pol is 0 for all 24.8M rows (the field is not populated in this feed), so there is no CG polarity breakdown; do not promise one. (The mds/mcg fields are signal metrics, not calibrated peak current, so any peak-current value is null.)
sensors, cross_sensor; sources cited (3 sources + duckdb refs).
Rogue preservation (see section below): n_flashes, isolation_km, rogue_flags (e.g. singleton / isolated / high_intensity), all spatial-only candidates, never certified.
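The time fields, including peak_utc (listed under Brick D as not yet filled), can be derived from a per-minute flash series. A minimal sketch, assuming one sensor's series as a mapping of UTC minute to flash count; GLM grid-minute counts already have that shape, and individual Blitzortung flash timestamps can be binned into it. Which sensor's series feeds peak_flash_rate_per_min is not specified in this doc, and ties going to the earliest minute is an assumed convention.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_series(flash_times):
    """Bin timezone-aware flash timestamps into UTC minutes -> flash count."""
    return Counter(t.replace(second=0, microsecond=0) for t in flash_times)


def time_fields(per_minute):
    """per_minute: mapping of UTC minute -> flash count for one cluster (one sensor)."""
    minutes = sorted(m for m, n in per_minute.items() if n > 0)
    if not minutes:
        raise ValueError("empty cluster")
    # max() returns the first maximal item, so ties resolve to the earliest minute.
    peak = max(minutes, key=lambda m: per_minute[m])
    start, end = minutes[0], minutes[-1]
    return {
        "start_utc": start,
        "peak_utc": peak,
        "end_utc": end,
        "duration_min": (end - start).total_seconds() / 60.0,  # minute resolution
        "peak_flash_rate_per_min": per_minute[peak],
    }
```

Because everything is binned to the minute, a cluster confined to one minute reports duration_min = 0; that matches GLM's 1-minute resolution.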

Rogue preservation — the tail must not be organized away (Mike 2026-06-17)

Rogue lightning is real and is exactly the kind of thing a paper wants: superbolts (rare, hundreds-of-times-normal energy, the rogue-wave analog), megaflashes (single flashes >100 km; WMO record 829 km), and bolts from the blue/gray (strikes in clear air far from the parent storm). The cluster model must preserve and surface this tail, not average it into a cluster.

Nothing is dropped: the window groups, it does not filter. Single-link clustering with minPts=1 means a strike with no neighbor becomes its own one-flash cluster; the D/T window only decides grouping, never retention. Raw strokes also persist in PostgreSQL regardless. So no choice of D/T ever deletes a strike.

The real risk is hiding a rogue inside a cluster aggregate, not dropping it. Guard against it:

Keep singleton clusters as first-class records (a lone strike is a valid event, not noise); below the ≥30 floor they live in the flagged rogue sidecar rather than as full dossier slots.
Carry extremity fields on every cluster so an outlier surfaces: peak_flash_rate_per_min, glm_peak_grid_minute_count, area_km2, n_flashes, and an isolation_km (distance to the nearest other cluster).
Flag spatial rogue candidates honestly (candidates, NOT certified): rogue_flags may include singleton, isolated (far from any neighbor, e.g. isolation_km above a declared threshold = bolt-from-the-blue candidate), high_intensity (extreme peak grid-minute count).
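The flagging pass can be sketched as follows. Several choices here are assumptions, not declared values: isolation_km is taken as centroid-to-centroid distance to the nearest other cluster whose time span overlaps, and both thresholds are hypothetical placeholders standing in for the declared threshold this doc refers to. Centroid distance can understate isolation next to a very large MCS, so treat it as a first cut.

```python
import math
from dataclasses import dataclass

ISOLATION_KM_THRESHOLD = 50.0     # hypothetical placeholder, not the declared value
HIGH_INTENSITY_GRID_MINUTE = 100  # hypothetical placeholder, not the declared value


@dataclass
class ClusterSummary:
    cluster_id: int
    centroid_lat: float
    centroid_lon: float
    start_min: float  # minutes since an arbitrary epoch
    end_min: float
    n_flashes: int
    glm_peak_grid_minute_count: int = 0


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rogue_flags(clusters):
    """Return {cluster_id: {"isolation_km": float, "rogue_flags": [...]}} (candidates only)."""
    out = {}
    for c in clusters:
        # Only clusters active at the same time count as neighbors (assumption).
        others = [o for o in clusters if o.cluster_id != c.cluster_id
                  and o.start_min <= c.end_min and c.start_min <= o.end_min]
        iso = min((haversine_km(c.centroid_lat, c.centroid_lon, o.centroid_lat, o.centroid_lon)
                   for o in others), default=math.inf)
        flags = []
        if c.n_flashes == 1:
            flags.append("singleton")
        if iso > ISOLATION_KM_THRESHOLD:
            flags.append("isolated")  # bolt-from-the-blue candidate
        if c.glm_peak_grid_minute_count >= HIGH_INTENSITY_GRID_MINUTE:
            flags.append("high_intensity")
        out[c.cluster_id] = {"isolation_km": iso, "rogue_flags": flags}
    return out
```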

Honest limit — what these two feeds CANNOT certify (do not overclaim): superbolts by energy (Blitzortung has no calibrated current — pol dead, mds/mcg uncalibrated; GLM feed is gridded counts, not per-flash energy) and megaflashes by extent (flash-grouping caps at ~10 km; GLM is gridded). We flag spatial rogue candidates only. Certifying energy/extent rogues needs a different source (GLM L2 flash-level energy/area, or a superbolt network like WWLLN) — logged as a future-source add in docs/dex-coverage-map.md, not part of this kind.

Provisional / live-edge model (pre-declared)

Like tor: a cluster whose latest flash is within T of the data's trailing edge is provisional (it may still be growing); it is finalized once it has been quiet for longer than T. A live sweep (Brick C) maintains provisional clusters from the rolling window.
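The live-edge test follows from the clustering rule: a new flash can only chain to a cluster if it falls within T of one of its flashes, so once the feed's trailing edge is more than T past a cluster's latest flash, nothing arriving later can join it. That reasoning assumes data lands roughly in time order; late-arriving detections would break it. A minimal sketch, assuming timezone-aware datetimes and a hypothetical mapping of cluster id to latest flash time:

```python
from datetime import datetime, timedelta, timezone

T_QUIET = timedelta(minutes=15)  # T from the frozen 15 km / 15 min rule


def is_provisional(latest_flash_utc, trailing_edge_utc, t_quiet=T_QUIET):
    """True while the cluster could still grow: its latest flash is within T of the edge."""
    return trailing_edge_utc - latest_flash_utc <= t_quiet


def split_live_edge(latest_by_cluster, trailing_edge_utc, t_quiet=T_QUIET):
    """Partition cluster ids into (finalized, provisional) sets."""
    provisional = {cid for cid, latest in latest_by_cluster.items()
                   if is_provisional(latest, trailing_edge_utc, t_quiet)}
    return set(latest_by_cluster) - provisional, provisional
```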

Build bricks

Brick A — freeze. THIS DOC: cluster-slot/flash-counted ruling, consensus flash counting (ground 10 km/1 s, GLM 16.5 km/330 ms), declared cluster rule (15 km/15 min + sensitivity), ≥30-flash floor with rogue sidecar, multi-sensor cited merge + cross-sensor flag, measured-reality IN, three verified layers, slot fields, provisional model, no start-date cutoff (live feed).
Brick B — spine (54f4879). Pooled 52.5M detections, clustered at 15/15 (vectorised grid-snap + 3D connected-components), consensus flash counting (Blitzortung 10/1, GLM summed), ≥30 floor → 133,165 full dossier slots + a 3,813-candidate rogue sidecar. Median qualifying cluster 19 km / 82 min; chaining tail bounded (2 clusters >72 h, max 5.3 d, all disclosed). Build runtime 114 s. scripts/build_lightning_spine.py, src/.../monitor/lightning_cells.py.
Brick D — dossiers + tests (54f4879 + 2eff92d). 133,165 dossiers written in Brick B; 7 unit tests (clustering separation/co-location/singletons/temporal gate, flash count), bringing the suite to 251. Remaining minor item: peak_utc per-minute fill (currently null).
Brick C — live sweep. Provisional clusters from the rolling window, finalized after T quiet; dedup against the static spine. NOT yet built — spine is a snapshot as of 2026-06-17 (re-run build_lightning_spine.py to refresh until the sweep lands).
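Brick B's "grid-snap + 3D connected-components" step can be pictured with the following sketch. It is plain Python for readability rather than vectorised, and it assumes coordinates already projected to planar km (the projection the build script uses is not stated here). Snapping to bins of side D km × T min guarantees that any pair within D and T lands in the same or an adjacent bin, so 26-connected components of occupied bins never split a true single-link cluster; they can, however, over-merge pairs up to about 2D apart. Whether the script adds a pairwise refinement is not recorded in this doc.

```python
import math
from itertools import product


def grid_snap_components(points, d_km=15.0, dt_min=15.0):
    """points: list of (x_km, y_km, t_min) in a planar projection. Returns a label per point."""
    keys = [(math.floor(x / d_km), math.floor(y / d_km), math.floor(t / dt_min))
            for x, y, t in points]
    bin_ids = {}
    for k in keys:
        bin_ids.setdefault(k, len(bin_ids))  # one id per occupied bin
    parent = list(range(len(bin_ids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # 26-connectivity: union each occupied bin with every occupied bin in its 3x3x3 block.
    for (bx, by, bt), i in bin_ids.items():
        for dx, dy, dt in product((-1, 0, 1), repeat=3):
            j = bin_ids.get((bx + dx, by + dy, bt + dt))
            if j is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    labels = {}
    return [labels.setdefault(find(bin_ids[k]), len(labels)) for k in keys]
```

The second test case below shows the over-merge: two flashes 29 km apart share a label here, while exact single-link at 15 km would keep them separate.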

Expectation

A clean, well-organized lightning event list that papers can read against the other kinds (eruptions, earthquakes, severe weather, space weather). The value is the organizing; specific cross-matches are separate downstream read-acrosses.