Pre-registration — "Tornado Alley is moving east" (DP-002 V1)
Status: FROZEN 2026-06-18. Methods locked before any extraction code is written.
Workspace: workspaces/tornado-alley-east/. Promoted from docs/dream-papers.md DP-002.
Headline question
At what velocity and in which direction has the centroid of US significant-tornado (EF2+) activity moved, decade by decade, since 1950 — and is any eastward shift an expansion into the Southeast or a decline in the Great Plains?
Why EF2+ is the headline population (load-bearing, Mike's call 2026-06-18)
US tornado reporting is dominated by a weak-tornado detection explosion. From 1950–75 to 2000–25 the recorded counts move as:
- EF0: 4,151 → 17,332 (~4x) — detection inflation (spotters, radar, smartphones, dashcams)
- EF1: 6,637 → 10,526 — modest
- EF2: 4,573 → 2,706 — falls
- EF3–5: all fall
Strong tornadoes were essentially never missed, even in 1950, so EF2+ counts are honest across the whole record. Leading with EF2+ means the trajectory reflects where tornadoes are, not where observers grew. EF1+ and All-tornado trajectories are reported as sensitivity, never as the headline.
Data
- Source:
spc_tornado_history(SPC Tornado History 1950–2025), source_id2754b2b9-1c98-4bcd-8ae2-d2dbe15a9ba2, 73,458 rows. Already dexed as thetorEventdex kind. - Per-tornado fields (from
extra_json):ef_scale,length_miles,width_yards,year,state,fatalities,injuries,end_lat,end_lon. Start lat/lon = the observation geometry (geography(Point,4326)).
Populations (frozen)
- Primary: EF2+ (
ef_scale >= 2), ≈ 13,404. - Sensitivity: EF1+ (
ef_scale >= 1) ≈ 38,583; All (ef_scale >= 0) ≈ 71,912. - Excluded from every population:
ef_scale = -9(unknown rating, 1,546, all modern). Cannot classify; noted as a limitation.
Spatial cleaning (frozen)
- Tornado location = start point (standard convention; both endpoints retained for QA).
- CONUS only: keep
lat ∈ [24, 50],lon ∈ [-125, -66]. Drops AK/HI/PR and bad/zero coords. - Drop null/zero coordinates.
Geographic measures (frozen)
- Annual centroid = path-length-weighted geographic mean of start coordinates. Path-length weighting upweights long-track tornadoes, which are the least subject to reporting bias. Unweighted (count) centroid reported as sensitivity.
- Annual variance ellipse via 2D Gaussian fit: semi-major axis, semi-minor axis, rotation angle. Tracks the spread and orientation of activity, not just its center.
- Sub-regional share series to decompose the headline shift:
- Great Plains = TX, OK, KS, NE.
- Southeast/"Dixie Alley" = AR, LA, MS, AL, TN, GA, KY, MO.
- Track each region's fractional share of EF2+ activity per year. This is what distinguishes "expansion east" (Southeast share up, Plains flat) from "Plains decline" (Plains share down, absolute Southeast flat).
Trajectory analysis (frozen)
- Centroid (lat, lon) per year, 1950–2025.
- Linear trend (Δlat/yr, Δlon/yr) over the full record and rolling 30-year windows.
- Change-point test on the annual longitude centroid (the eastward component): pre-register
a single-change-point test (PELT / binary segmentation via
ruptures, or Bayesian change point). Report location and credible/confidence interval, or "no change-point detected." - Compare the measured Δlon/yr against the 10–20 km/yr figure circulated in popular press.
Uncertainty (frozen)
- Primary: cluster bootstrap by convective day. Tornadoes within an outbreak day are spatially clustered and not independent; resampling individual tornadoes understates the centroid CI. Resample whole convective-days (≥1000 draws) → CI on each annual centroid and on the trend slopes.
- Sensitivity: tornado-level bootstrap (the naive version) reported alongside, so the independence cost is explicit.
Pre-registered predictions
- H1 (longitude): the EF2+ centroid moves east (lon becomes less negative) over 1950–2025. Test sign, magnitude, and whether it clears the popular-press 10–20 km/yr claim.
- H2 (latitude): no strong directional prior; report the trend and CI.
- H0: net centroid displacement is within the cluster-bootstrap CI (no detectable shift).
- Decomposition prediction: if the eastward shift is real, the Southeast share rises while Great-Plains absolute EF2+ counts hold roughly flat (expansion), versus Plains share falling with flat Southeast (decline). The paper reports which pattern the data show; both are publishable findings.
Outputs (Riley)
data/tornadoes.parquet— per-tornado: year, ef_scale, start_lat, start_lon, length_miles, state, region (plains/southeast/other), population flags (ef2/ef1/all).data/centroids.parquet— per year × population: weighted & unweighted centroid (lat, lon), ellipse (a, b, θ), N, cluster-bootstrap CI.data/results.json— full-record & rolling trend slopes (Δlat/yr, Δlon/yr) per population; change-point result; Plains-vs-Southeast share series; popular-claim comparison; all N values.
Sasha pre-flight (controls already baked in)
- Reporting bias → EF2+ headline + path-length weighting (addressed; EF1+/All shown as the bias-sensitivity ladder).
- Garden-of-forking-paths → populations, weighting, change-point method, predictions all frozen here before extraction.
- Sample independence → cluster bootstrap by convective day is the primary uncertainty.
- Overreach → "expansion vs decline" decomposition is mandatory; no bare "moving east" claim.
- Population-density correction is an optional sensitivity only; EF2+ reporting is argued near-complete throughout, so it is not in the headline path.
Honesty / limitations to carry into the draft
- SPC pre-~1953 coverage is thin; show the early-years CI honestly (do not smooth it away).
- EF (Enhanced Fujita) ratings are retrospectively mapped from F-scale for pre-2007 events by
SPC; we use SPC's published
ef_scaleas-is and note the rating-system change. ef_scale = -9unknowns are modern; their exclusion slightly reduces recent N.