
Pre-registration — "Tornado Alley is moving east" (DP-002 V1)

Status: FROZEN 2026-06-18. Methods locked before any extraction code is written. Workspace: workspaces/tornado-alley-east/. Promoted from docs/dream-papers.md DP-002.

Headline question

At what velocity and in which direction has the centroid of US significant-tornado (EF2+) activity moved, decade by decade, since 1950 — and is any eastward shift an expansion into the Southeast or a decline in the Great Plains?

Why EF2+ is the headline population (load-bearing, Mike's call 2026-06-18)

US tornado reporting is dominated by a weak-tornado detection explosion. From 1950–75 to 2000–25 the recorded counts move as:

  • EF0: 4,151 → 17,332 (~4x) — detection inflation (spotters, radar, smartphones, dashcams)
  • EF1: 6,637 → 10,526 — modest
  • EF2: 4,573 → 2,706 — falls
  • EF3–5: all fall

Strong tornadoes were essentially never missed, even in 1950, so EF2+ counts are honest across the whole record. Leading with EF2+ means the trajectory reflects where tornadoes are, not where observers grew. EF1+ and All-tornado trajectories are reported as sensitivity, never as the headline.

Data

  • Source: spc_tornado_history (SPC Tornado History 1950–2025), source_id 2754b2b9-1c98-4bcd-8ae2-d2dbe15a9ba2, 73,458 rows. Already dexed as the tor Eventdex kind.
  • Per-tornado fields (from extra_json): ef_scale, length_miles, width_yards, year, state, fatalities, injuries, end_lat, end_lon. Start lat/lon = the observation geometry (geography(Point,4326)).

Populations (frozen)

  • Primary: EF2+ (ef_scale >= 2), ≈ 13,404.
  • Sensitivity: EF1+ (ef_scale >= 1) ≈ 38,583; All (ef_scale >= 0) ≈ 71,912.
  • Excluded from every population: ef_scale = -9 (unknown rating, 1,546, all modern). Cannot classify; noted as a limitation.
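
A minimal sketch of the frozen population rules. It assumes ef_scale has already been parsed from extra_json as an integer. population_flags and its keys are hypothetical names, chosen to mirror the ef2/ef1/all flags listed under Outputs. The last lines check the row arithmetic quoted above: All equals total rows minus the unknown ratings.

```python
from typing import Dict, Optional

UNKNOWN_RATING = -9  # SPC's code for an unknown rating


def population_flags(ef_scale: int) -> Optional[Dict[str, bool]]:
    """Frozen population memberships for one tornado, or None if it is excluded."""
    if ef_scale == UNKNOWN_RATING:
        return None  # unclassifiable: dropped from EF2+, EF1+ and All alike
    return {
        "ef2": ef_scale >= 2,  # primary population
        "ef1": ef_scale >= 1,  # sensitivity
        "all": ef_scale >= 0,  # sensitivity
    }


# Row-count bookkeeping with the figures quoted above.
TOTAL_ROWS, UNKNOWN_ROWS = 73_458, 1_546
assert TOTAL_ROWS - UNKNOWN_ROWS == 71_912
```

For example, an EF0 returns {"ef2": False, "ef1": False, "all": True}, and a -9 returns None.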

Spatial cleaning (frozen)

  • Tornado location = start point (standard convention; both endpoints retained for QA).
  • CONUS only: keep lat ∈ [24, 50], lon ∈ [-125, -66]. This drops AK/HI/PR and bad or zero coordinates.
  • Drop null coordinates explicitly.
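
The cleaning rules reduce to a single predicate on the start point. This is a sketch that assumes start_lat and start_lon have already been pulled out of the geography(Point,4326) column as plain floats (lat = y, lon = x). keep_start_point is a hypothetical helper name.

```python
import math
from typing import Optional

# Frozen CONUS bounding box, in degrees.
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def keep_start_point(lat: Optional[float], lon: Optional[float]) -> bool:
    """True if a tornado's start point survives the frozen spatial cleaning."""
    if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
        return False  # null or missing coordinates
    if lat == 0.0 or lon == 0.0:
        return False  # zero placeholders (the box would reject them too)
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
```

A central Oklahoma start point such as (35.2, -97.4) is kept. Honolulu (21.3, -157.8) and San Juan (18.4, -66.1) fail the latitude bound.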

Geographic measures (frozen)

  1. Annual centroid = path-length-weighted geographic mean of start coordinates. Path-length weighting upweights long-track tornadoes, which are the least subject to reporting bias. Unweighted (count) centroid reported as sensitivity.
  2. Annual variance ellipse via 2D Gaussian fit: semi-major axis, semi-minor axis, rotation angle. Tracks the spread and orientation of activity, not just its center.
  3. Sub-regional share series to decompose the headline shift:
    • Great Plains = TX, OK, KS, NE.
    • Southeast/"Dixie Alley" = AR, LA, MS, AL, TN, GA, KY, MO.
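    • Track each region's fractional share of EF2+ activity per year, alongside its absolute EF2+ count. Shares are relative, so they cannot separate the two stories on their own. Together, shares and counts distinguish "expansion east" (Southeast share up, Plains absolute counts flat) from "Plains decline" (Plains share down, Southeast absolute counts flat).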
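
The three measures above can be sketched in stdlib-only Python. The sketch assumes per-tornado start_lat, start_lon, length_miles and state are already cleaned, and all function names are hypothetical. For the ellipse, it takes the maximum-likelihood 2D Gaussian fit to be the weighted mean plus the weighted covariance. It reports 1-sigma semi-axes and the major-axis angle in raw degrees, with no map projection; that is an assumption, not a frozen choice. Regional shares here are count-based. A path-length-weighted share would sum length_miles instead of counting rows.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

PLAINS = {"TX", "OK", "KS", "NE"}
SOUTHEAST = {"AR", "LA", "MS", "AL", "TN", "GA", "KY", "MO"}


def weighted_centroid(lats: Sequence[float], lons: Sequence[float],
                      weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean (lat, lon); pass all-ones weights for the unweighted centroid."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lat = sum(w * y for w, y in zip(weights, lats)) / total
    lon = sum(w * x for w, x in zip(weights, lons)) / total
    return lat, lon


def variance_ellipse(lats: Sequence[float], lons: Sequence[float],
                     weights: Sequence[float]) -> Tuple[float, float, float]:
    """1-sigma ellipse (a, b, theta_deg) of a weighted 2D Gaussian, in degrees.

    theta is the major-axis angle, counterclockwise from due east (x = lon).
    """
    total = sum(weights)
    clat, clon = weighted_centroid(lats, lons, weights)
    sxx = sum(w * (x - clon) ** 2 for w, x in zip(weights, lons)) / total
    syy = sum(w * (y - clat) ** 2 for w, y in zip(weights, lats)) / total
    sxy = sum(w * (x - clon) * (y - clat)
              for w, x, y in zip(weights, lons, lats)) / total
    # Eigenvalues of the 2x2 covariance matrix, in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    a = math.sqrt(half_trace + radius)            # semi-major axis
    b = math.sqrt(max(half_trace - radius, 0.0))  # semi-minor; clamp rounding
    theta = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return a, b, theta


def region_of(state: str) -> str:
    if state in PLAINS:
        return "plains"
    if state in SOUTHEAST:
        return "southeast"
    return "other"


def region_series(rows: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, Tuple[int, float]]]:
    """(year, state) for EF2+ tornadoes -> {year: {region: (count, share)}}."""
    counts = defaultdict(Counter)
    for year, state in rows:
        counts[year][region_of(state)] += 1
    out = {}
    for year, c in sorted(counts.items()):
        n = sum(c.values())
        out[year] = {r: (c[r], c[r] / n) for r in ("plains", "southeast", "other")}
    return out
```

Passing length_miles as weights gives the headline centroid, and passing all ones gives the count-centroid sensitivity. A zero-length track contributes nothing to the weighted version.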

Trajectory analysis (frozen)

  • Centroid (lat, lon) per year, 1950–2025.
  • Linear trend (Δlat/yr, Δlon/yr) over the full record and rolling 30-year windows.
  • Change-point test on the annual longitude centroid (the eastward component): pre-register a single-change-point test (PELT / binary segmentation via ruptures, or Bayesian change point). Report location and credible/confidence interval, or "no change-point detected."
  • Compare the measured Δlon/yr against the 10–20 km/yr figure circulated in popular press, converting degrees of longitude to km at the centroid latitude (1° lon ≈ 111 km × cos(lat)).
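
The trajectory steps can be sketched in stdlib-only Python, assuming one centroid per consecutive year with no gaps. ols_slope, rolling_slopes, lon_slope_km_per_yr and single_change_point are hypothetical names. The change-point function is a hand-rolled least-squares split in the mean, which uses the same criterion as the first step of binary segmentation with an L2 cost. It is not the ruptures API. It returns only the best split and its cost reduction, so the "no change-point detected" decision still needs the pre-registered penalty or test. The km conversion uses a spherical approximation.

```python
import math
from typing import Dict, Sequence, Tuple

KM_PER_DEG = 111.32  # approx. km per degree of latitude (and of longitude at the equator)


def ols_slope(years: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values on years (units per year)."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    return sxy / sxx


def rolling_slopes(years: Sequence[int], values: Sequence[float],
                   window: int = 30) -> Dict[int, float]:
    """OLS slope in each run of `window` consecutive years, keyed by start year."""
    return {
        years[i]: ols_slope(years[i:i + window], values[i:i + window])
        for i in range(len(years) - window + 1)
    }


def lon_slope_km_per_yr(dlon_deg_per_yr: float, mean_lat_deg: float) -> float:
    """Convert an eastward drift in deg/yr to km/yr: 1 deg lon ~ 111.32 * cos(lat) km."""
    return dlon_deg_per_yr * KM_PER_DEG * math.cos(math.radians(mean_lat_deg))


def single_change_point(values: Sequence[float], min_size: int = 2) -> Tuple[int, float]:
    """Best single split in the mean under an L2 cost: (split index, cost reduction)."""
    if len(values) < 2 * min_size:
        raise ValueError("series too short for a split")

    def cost(seg: Sequence[float]) -> float:
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    full = cost(values)
    best_k, best_cost = -1, math.inf
    for k in range(min_size, len(values) - min_size + 1):
        c = cost(values[:k]) + cost(values[k:])  # segments [0, k) and [k, n)
        if c < best_cost:
            best_k, best_cost = k, c
    return best_k, full - best_cost
```

For scale, a drift of 0.1°/yr at 37°N is about 8.9 km/yr (0.1 × 111.32 × cos 37°), so clearing 10 km/yr at that latitude takes roughly 0.11°/yr. The 37°N figure is only an illustrative latitude, not a measured centroid.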

Uncertainty (frozen)

  • Primary: cluster bootstrap by convective day. Tornadoes within an outbreak day are spatially clustered and not independent; resampling individual tornadoes understates the centroid CI. Resample whole convective-days (≥1000 draws) → CI on each annual centroid and on the trend slopes.
  • Sensitivity: tornado-level bootstrap (the naive version) reported alongside, so the independence cost is explicit.
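
A stdlib-only sketch of the primary cluster bootstrap for one annual weighted centroid. It assumes each tornado already carries a convective-day label; deriving that label needs the event date and time, which the field list above does not include. cluster_bootstrap_centroid is a hypothetical name, and the interval is a simple percentile interval.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Interval = Tuple[float, float]


def _percentile_interval(draws: List[float], alpha: float) -> Interval:
    """Simple percentile interval from the sorted bootstrap draws."""
    xs = sorted(draws)
    last = len(xs) - 1
    return xs[int(alpha / 2 * last)], xs[int((1 - alpha / 2) * last)]


def cluster_bootstrap_centroid(
    days: Sequence[Hashable],
    lats: Sequence[float],
    lons: Sequence[float],
    weights: Sequence[float],
    n_draws: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[Interval, Interval]:
    """Percentile CIs on the weighted (lat, lon) centroid, resampling whole days."""
    clusters = defaultdict(list)  # day label -> row indices of that day's tornadoes
    for i, day in enumerate(days):
        clusters[day].append(i)
    groups = list(clusters.values())
    rng = random.Random(seed)
    lat_draws, lon_draws = [], []
    for _ in range(n_draws):
        # Draw as many days as observed, with replacement, keeping each day intact.
        rows = [i for g in rng.choices(groups, k=len(groups)) for i in g]
        w = sum(weights[i] for i in rows)
        if w <= 0:
            continue  # every drawn tornado has zero weight: no centroid defined
        lat_draws.append(sum(weights[i] * lats[i] for i in rows) / w)
        lon_draws.append(sum(weights[i] * lons[i] for i in rows) / w)
    return _percentile_interval(lat_draws, alpha), _percentile_interval(lon_draws, alpha)
```

Passing days=range(len(lats)) gives every tornado its own cluster, which reproduces the naive tornado-level bootstrap. The same function therefore covers both the primary and the sensitivity run.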

Pre-registered predictions

  • H1 (longitude): the EF2+ centroid moves east (lon becomes less negative) over 1950–2025. Test sign, magnitude, and whether it clears the popular-press 10–20 km/yr claim.
  • H2 (latitude): no strong directional prior; report the trend and CI.
  • H0: no detectable shift; the cluster-bootstrap CI on net centroid displacement includes zero.
  • Decomposition prediction: if the eastward shift is real, the Southeast share rises while Great Plains absolute EF2+ counts hold roughly flat (expansion), versus the Plains share falling while Southeast absolute counts hold roughly flat (decline). The paper reports which pattern the data show; both are publishable findings.
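
One hypothetical way to make the decomposition prediction a mechanical check. It assumes cluster-bootstrap CIs on the trend slopes of each region's share and absolute EF2+ count. The pre-registration does not define "roughly flat" numerically. Here a slope counts as flat when its CI includes zero, which is a lenient reading, since a wide CI also passes as flat. direction and matched_patterns are illustrative names, not the frozen rule.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bootstrap CI on a trend slope


def direction(ci: Interval) -> str:
    """'up' or 'down' if the CI excludes zero, otherwise 'flat'."""
    lo, hi = ci
    if lo > 0:
        return "up"
    if hi < 0:
        return "down"
    return "flat"


def matched_patterns(se_share: Interval, plains_count: Interval,
                     plains_share: Interval, se_count: Interval) -> List[str]:
    """Which pre-registered patterns the slope CIs satisfy (possibly none or both)."""
    found = []
    if direction(se_share) == "up" and direction(plains_count) == "flat":
        found.append("expansion")
    if direction(plains_share) == "down" and direction(se_count) == "flat":
        found.append("decline")
    return found
```

Returning a list rather than a single label keeps the "neither" and "both" outcomes visible instead of silently picking one.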

Outputs (Riley)

  • data/tornadoes.parquet — per-tornado: year, ef_scale, start_lat, start_lon, length_miles, state, region (plains/southeast/other), population flags (ef2/ef1/all).
  • data/centroids.parquet — per year × population: weighted & unweighted centroid (lat, lon), ellipse (a, b, θ), N, cluster-bootstrap CI.
  • data/results.json — full-record & rolling trend slopes (Δlat/yr, Δlon/yr) per population; change-point result; Plains-vs-Southeast share series; popular-claim comparison; all N values.

Sasha pre-flight (controls already baked in)

  • Reporting bias → EF2+ headline + path-length weighting (addressed; EF1+/All shown as the bias-sensitivity ladder).
  • Garden-of-forking-paths → populations, weighting, change-point method, predictions all frozen here before extraction.
  • Sample independence → cluster bootstrap by convective day is the primary uncertainty.
  • Overreach → "expansion vs decline" decomposition is mandatory; no bare "moving east" claim.
  • Population-density correction is an optional sensitivity only; EF2+ reporting is argued near-complete throughout, so it is not in the headline path.

Honesty / limitations to carry into the draft

  • SPC pre-~1953 coverage is thin; show the early-years CI honestly (do not smooth it away).
  • SPC's ef_scale carries the original F-scale ratings for events before February 2007 and EF (Enhanced Fujita) ratings thereafter, treated as one integer scale; we use SPC's published ef_scale as-is and note the rating-system change.
  • ef_scale = -9 unknowns are modern; their exclusion slightly reduces recent N.