Two Competing Responses Hidden in the 10 m WSPR Anticorrelation: Station-Pair Validation Reveals a Short-Path versus Long-Path Sign Flip
Fig. 1: 10m short vs long
Fig. 2: fisher forest
Fig. 3: full vs filtered r
Fig. 4: pair counts by band
Fig. 5: paper figure1
Fig. 6: paper figure2
WSPR Station-Pair Validation: 10 m Has Two Competing Cycle Responses
Author: PMA (Elise) / TerraPulse Lab
Status: Complete
Created: 2026-04-06
GitHub Issue: #104
Builds on: #87 (WSPR solar cycle modulation), wspr-21year-census
Hypothesis
The wspr-solar-cycle-modulation paper (#87) reported a strongly negative
Pearson correlation between monthly mean SNR and sunspot number (SSN) on the
10 m and 12 m bands (both Bonferroni-significant
across 21 years and 10.94 billion spots) and argued that this anticorrelation
was a station-population selection effect rather than a genuine ionospheric
response. The proposed remediation was to filter the analysis to a stable set
of TX-RX pairs that are present in both solar minimum and solar maximum, and
re-run the correlation. If the hypothesis is right, the negative sign should
disappear.
This workspace executes that test.
Data Sources
| Source | Span | Records | Role |
|---|---|---|---|
| WSPRnet raw spots (wspr_raw_YYYYMM.parquet) | Nov 2004 to Mar 2026 | 10.94 B | Response |
| SILSO sunspot number | 2004 to 2026 | ~258 monthly means | Predictor |
The WSPR archive lives on /mnt/ursa/data/terrapulse/wspr/raw as 258 monthly
Parquet files (about 214 GB total). Each spot has tx_sign, rx_sign,
band, snr, distance, and a UTC timestamp.
Methodology
Two-pass extractor
A naive single pass that loads every WSPR file into RAM is impossible: a
single year of raw spots is on the order of 30 GB. The extract.py driver
runs two streaming passes, each in a fresh subprocess per file so the parent
never holds more than the small intermediate output of one month.
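The one-subprocess-per-file pattern can be sketched as below. This is a minimal illustration, not the actual extract.py: the function name run_pass, the reduced_ output prefix, and the `<input> <output>` command-line convention for the worker script are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def run_pass(worker_script: str, raw_dir: Path, out_dir: Path) -> list[Path]:
    """Run one worker subprocess per monthly file and collect the output paths.

    Each child process exits after handling a single file, so its memory is
    released before the next month starts. The parent only tracks file paths.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for raw_file in sorted(raw_dir.glob("wspr_raw_*.parquet")):
        # Hypothetical naming scheme for the small per-month output.
        out_file = out_dir / raw_file.name.replace("wspr_raw_", "reduced_")
        # Assumed CLI convention: worker takes <input> <output>.
        subprocess.run(
            [sys.executable, worker_script, str(raw_file), str(out_file)],
            check=True,  # stop the whole pass if one month fails
        )
        outputs.append(out_file)
    return outputs
```

With check=True a failed month aborts the pass instead of silently leaving a gap in the monthly series, which matters here because a missing month would bias the later correlation.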
Pass 1 (pair_one_pass1.py): for each monthly file, group by
(tx_sign, rx_sign, band) and emit a small Parquet with the spot count and
the solar-cycle period tag (min24, max24, min25, max25, or other).
After all 258 files have been reduced, sum spot counts across files and
identify qualifying pairs: pairs that have at least 100 total spots over
the 21-year window AND are present in both a solar minimum window (cycle 24
or cycle 25 minimum) AND a solar maximum window (cycle 24 or cycle 25
maximum). This filter forces the analysis to use only stations whose
participation does not shift between phases.
Pass 2 (pair_one_pass2.py): for each monthly file, re-scan and emit
two aggregations: a full-population per-band monthly mean SNR (so we can
reproduce the original #87 correlation), and a per-pair monthly mean SNR
filtered by a semi-join against the qualifying-pair table.
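The qualifying-pair rule applied between the two passes can be sketched in pandas as follows. The column names period and spots for the Pass 1 output are assumptions, as is the function name; only the 100-spot threshold and the min/max presence rule come from the text above.

```python
import pandas as pd

MIN_TOTAL_SPOTS = 100  # threshold stated in the text
KEY = ["tx_sign", "rx_sign", "band"]


def select_qualifying_pairs(pass1: pd.DataFrame) -> pd.DataFrame:
    """Pick pairs seen in both a solar minimum and a solar maximum window.

    pass1 is the concatenation of all per-month Pass 1 outputs, with assumed
    columns: tx_sign, rx_sign, band, period, spots.
    """
    df = pass1.assign(
        in_min=pass1["period"].isin(["min24", "min25"]),
        in_max=pass1["period"].isin(["max24", "max25"]),
    )
    agg = df.groupby(KEY, as_index=False).agg(
        total_spots=("spots", "sum"),   # summed over the whole window
        seen_min=("in_min", "any"),     # present in either minimum
        seen_max=("in_max", "any"),     # present in either maximum
    )
    keep = (
        (agg["total_spots"] >= MIN_TOTAL_SPOTS)
        & agg["seen_min"]
        & agg["seen_max"]
    )
    return agg.loc[keep, KEY + ["total_spots"]].reset_index(drop=True)
```

Note that the spot threshold counts all periods, including months tagged other, while the presence test only looks at the tagged minimum and maximum windows.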
The full pipeline produces three Parquet outputs:
- qualifying_pairs.parquet: 1,021,771 unique TX-RX-band qualifying triples
- pair_monthly_snr.parquet: 21,885,356 pair-month rows
- full_band_monthly.parquet: 1,733 month-band aggregates (full population)
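For one monthly file, the two Pass 2 aggregations might look like the sketch below. It is an assumption-laden illustration rather than the code in pair_one_pass2.py: the function name, the output column names, and the use of an inner merge on de-duplicated keys as the semi-join are our choices for the example.

```python
import pandas as pd

KEY = ["tx_sign", "rx_sign", "band"]


def pass2_month(spots: pd.DataFrame, qualifying: pd.DataFrame, month: str):
    """Return (full-population per-band table, per-pair table) for one month."""
    # Full-population aggregate: plain mean over every spot on each band.
    full = spots.groupby("band", as_index=False).agg(
        mean_snr=("snr", "mean"),
        spots=("snr", "size"),
    )
    full.insert(0, "month", month)

    # Semi-join: an inner merge against unique keys keeps each spot at most
    # once and adds no columns from the qualifying table.
    keys = qualifying[KEY].drop_duplicates()
    kept = spots.merge(keys, on=KEY, how="inner")
    per_pair = kept.groupby(KEY, as_index=False).agg(
        mean_snr=("snr", "mean"),
        spots=("snr", "size"),
        mean_distance=("distance", "mean"),
    )
    per_pair.insert(0, "month", month)
    return full, per_pair
```

Keeping the per-pair spot count alongside the per-pair mean is what later allows both a spot-weighted mean and a median across pairs to be computed from the same table.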
Five SNR estimators per band
For each of the eight HF amateur bands, analyze.py builds five monthly mean
SNR series and correlates each against monthly SSN:
- full_population: reproduces #87. Spot-weighted mean across all
spots, no pair filter. Includes the same population shift the original
paper warned about.
- filtered_mean: spot-weighted mean across only the qualifying-pair
subset. This is the direct test of #87's selection-bias hypothesis.
- filtered_median_of_pairs: per-pair monthly mean, then median across
pairs. Robust to per-pair scale differences.
- short_path_filtered: spot-weighted mean among qualifying pairs whose
long-run average distance is < 1500 km (576,770 pairs). These are
ground-wave, NVIS, and short-skip paths that should be insensitive to MUF.
- long_path_filtered: spot-weighted mean among qualifying pairs whose
long-run average distance is >= 5000 km (160,413 pairs). These are
trans-oceanic and trans-continental DX paths that depend critically on MUF.
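The four pair-filtered estimators can be derived from a single pair-month table, as in this sketch. The column names (pair_id, month, mean_snr, spots) and both function names are hypothetical; the 1500 km and 5000 km thresholds are the ones given above. The full_population series is not shown because it comes from the unfiltered per-band aggregate instead.

```python
import pandas as pd

SHORT_KM, LONG_KM = 1500.0, 5000.0  # thresholds from the text


def spot_weighted(df: pd.DataFrame) -> pd.Series:
    """Monthly mean SNR where each pair-month counts in proportion to its spots."""
    sums = (
        df.assign(w=df["mean_snr"] * df["spots"])
        .groupby("month")[["w", "spots"]]
        .sum()
    )
    return sums["w"] / sums["spots"]


def monthly_estimators(pair_monthly: pd.DataFrame,
                       pair_distance: pd.Series) -> pd.DataFrame:
    """pair_distance: long-run average distance in km, indexed by pair_id."""
    df = pair_monthly.join(pair_distance.rename("avg_km"), on="pair_id")
    return pd.DataFrame({
        "filtered_mean": spot_weighted(df),
        # Median across pairs: every pair counts once, whatever its spot count.
        "filtered_median_of_pairs": df.groupby("month")["mean_snr"].median(),
        "short_path_filtered": spot_weighted(df[df["avg_km"] < SHORT_KM]),
        "long_path_filtered": spot_weighted(df[df["avg_km"] >= LONG_KM]),
    })
```

Weighting the per-pair monthly means by spot count reproduces the mean over individual spots, so filtered_mean stays comparable to the full-population estimator.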
For each estimator we compute Pearson $r$, Spearman $\rho$, their $p$-values, and a
linearly detrended Pearson $r$ as a secular-trend control. Bonferroni
correction is applied across the eight bands.
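A hedged sketch of the per-band statistics, using SciPy, is given below. Two things are assumptions for illustration: the significance level alpha=0.05 is a placeholder and not necessarily the level used in analyze.py, and the detrending here removes a straight-line time trend from both series before correlating, which is one reasonable reading of "linearly detrended".

```python
import numpy as np
from scipy import stats

N_BANDS = 8  # Bonferroni family size stated in the text


def correlate(snr, ssn, alpha=0.05):
    """Pearson, Spearman, and detrended Pearson between two monthly series.

    alpha=0.05 is a placeholder assumption, not a value taken from the analysis.
    """
    snr = np.asarray(snr, dtype=float)
    ssn = np.asarray(ssn, dtype=float)
    ok = np.isfinite(snr) & np.isfinite(ssn)
    t = np.flatnonzero(ok).astype(float)  # keep true month positions across gaps
    snr, ssn = snr[ok], ssn[ok]

    r, p_r = stats.pearsonr(snr, ssn)
    rho, p_rho = stats.spearmanr(snr, ssn)

    # Subtract the least-squares line in time from each series.
    snr_d = snr - np.polyval(np.polyfit(t, snr, 1), t)
    ssn_d = ssn - np.polyval(np.polyfit(t, ssn, 1), t)
    r_det, p_det = stats.pearsonr(snr_d, ssn_d)

    return {
        "n": int(snr.size),
        "pearson_r": float(r), "pearson_p": float(p_r),
        "spearman_rho": float(rho), "spearman_p": float(p_rho),
        "detrended_r": float(r_det), "detrended_p": float(p_det),
        # Bonferroni: compare against alpha divided by the number of bands.
        "bonferroni_significant": bool(p_r < alpha / N_BANDS),
    }
```

Monthly SNR and SSN series are both autocorrelated, so the nominal $p$-values from these tests are likely optimistic; the sketch does not correct for that.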
A Fisher $z$ contrast quantifies the difference between any two
correlations on the same band; the same machinery answers two distinct
questions in this paper:
- Is the full-population $r$ different from the pair-filtered $r$?
(selection-bias test)
- Is the short-path $r$ different from the long-path $r$ on the same band?
(path-geometry test)
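The standard Fisher transformation test for two correlations is short enough to write out. This sketch uses the textbook independent-samples form; whether analyze.py uses exactly this variant is an assumption, and the function name is ours.

```python
import math


def fisher_z_contrast(r1: float, n1: int, r2: float, n2: int):
    """Two-sided test of H0: rho1 == rho2, treating the samples as independent.

    Returns (z, p). Each r is mapped to z_i = atanh(r_i), which is roughly
    normal with variance 1 / (n_i - 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Made-up illustrative inputs, not values from this analysis.
z, p = fisher_z_contrast(-0.5, 203, 0.5, 203)
```

The independence assumption holds reasonably for the short versus long contrast, where the pair sets are disjoint. It is weaker for full versus filtered, because the filtered pairs are a subset of the full population, so that contrast is best read as approximate.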
Findings
Result 1: 12 m IS selection bias
| Estimator | ||||
|---|---|---|---|---|
| Full population | 210 | |||
| Pair-filtered (mean) | 200 | |||
| Pair-filtered (median) | 200 |
Filtering to qualifying pairs kills the negative correlation on 12 m.
The Fisher contrast between full and filtered is significant,
so the two correlations are statistically distinct.
The pair-filtered correlation is no longer Bonferroni-significant. On 12 m,
the original #87 anticorrelation was a selection artifact, exactly as the
paper hypothesized.
Result 2: 10 m is NOT (mostly) selection bias
| Estimator | ||||
|---|---|---|---|---|
| Full population | 217 | |||
| Pair-filtered (mean) | 210 | |||
| Pair-filtered (median) | 210 |
On 10 m, the negative correlation persists under pair filtering. The
Fisher contrast is not significant, so the full-population
and pair-filtered correlations are statistically indistinguishable. The
pair-filtered $r$ is still Bonferroni-significant, and the
detrended version is essentially unchanged.
This is a clean refutation of the original "all selection bias" hypothesis
for 10 m. Same station pairs, same measurement protocol, the negative sign
survives.
Result 3: split 10 m by path length and the sign flips
This is the headline finding. Using the same pair-filtered population,
we partition by long-run average path distance:
| Subset | |||||
|---|---|---|---|---|---|
| 10 m short (<1500 km) | 9,996 | 209 | |||
| 10 m long (>= 5000 km) | 1,971 | 203 |
The 10 m band has two competing cycle responses.
- Long paths follow textbook MUF physics: trans-continental and
trans-oceanic 10 m contacts are much more reliable at solar maximum, and
their mean SNR rises by several dB.
- Short paths get worse at solar maximum: ground-wave and short-skip
10 m paths show a strongly negative correlation. The same finding holds
in the rank statistics and after detrending, so it cannot
be attributed to a few outlier months or to network growth.
The Fisher contrast between the two subsets on 10 m shows that
short-path 10 m and long-path 10 m
respond to the solar cycle in opposite directions at a level that is
essentially impossible under any single shared response. The full-population
$r$ that #87 reported was the average of these two opposite signals.
Result 4: 12 m shows the same path-length asymmetry, in a milder form
| Subset | |||
|---|---|---|---|
| 12 m short (<1500 km) | (subset of 3,300) | ||
| 12 m long (>= 5000 km) | (subset of 3,300) |
The 12 m short-versus-long Fisher contrast shows the
same physical pattern as on 10 m (short paths anticorrelate, long paths
correlate). But the pair-filtered combined 12 m correlation
collapses to near zero (n.s.), so 12 m's full-population
anticorrelation is a mixture of the short/long contrast plus a
contributing selection effect from station-population shifts.
What is happening physically?
Long-path 10 m needs F-layer refraction at very high frequency. This requires
MUF >= 28 MHz, which only happens during solar maximum. At solar minimum
the long-path 10 m population is dominated by sporadic-E (single-hop, not
trans-oceanic), meteor scatter, and the rare path-of-the-day; at solar
maximum, conventional F2 propagation opens hundreds of paths. The
surviving long-path qualifying pairs are by construction the persistent
ones (the Northwest Europe to Caribbean and US-to-Pacific links that show
up in both phases), and on those paths the SNR rises with SSN. Textbook.
Short-path 10 m is a different beast. Below 1500 km the dominant
propagation modes are line-of-sight, ground wave, sporadic-E, and
near-vertical incidence skywave (NVIS). These modes are largely insensitive
to F-layer MUF. Yet the spot-weighted mean SNR on short 10 m paths drops by
about 2 dB from solar minimum to solar maximum. Possible
explanations:
- Co-channel interference and band crowding: at solar maximum, hundreds
of additional WSPR transmitters fire up on 10 m chasing F-layer DX. They
share the band with the short-path stations. Co-channel collisions raise
the local noise floor at the receiver and degrade decoded SNR even on
short paths that do not themselves use F-layer.
- External noise from solar activity: during solar maximum the geomagnetic
environment is more disturbed, which raises the HF noise floor (galactic
and atmospheric component variations, ionospheric absorption events).
Short paths cannot escape this floor as easily as long paths can stack
transmit power against the same noise.
- Multipath / fading on sporadic-E enhanced openings: at solar maximum
there are more sporadic-E events on short paths, and weak
single-hop contacts decode at lower SNR than ground-wave contacts.
- D-layer absorption: at solar maximum, increased D-layer ionization
raises HF absorption, especially in daytime. Short-path 10 m paths that
rely on near-vertical incidence skywave see higher absorption losses.
We cannot pick a single mechanism from this dataset. What we can say is
that the negative 10 m signal in #87 was not purely a measurement artifact
of who decided to log spots; the same pairs that decoded short-path
contacts during solar minimum decoded them at a spot-weighted mean SNR
about 2 dB lower during solar maximum.
Result 5: positive bands behave better, with one new finding
Filtering to qualifying pairs strengthens the positive correlations on
40 m, 30 m, 20 m, 17 m, and 15 m, in some cases substantially:
| Band | full | filtered | short | long |
|---|---|---|---|---|
| 40 m | ||||
| 30 m | ||||
| 20 m | ||||
| 17 m | ||||
| 15 m |
Long-path correlations are uniformly the strongest on these bands, which is
consistent with MUF physics: more cycle sensitivity is expected on the paths
that are MUF-limited. The 17 m and 15 m bands, which were not
Bonferroni-significant in their full-population form (#87), become strongly
positive once a fixed station population is enforced, particularly on
long paths. So the pair-filter not only resolves #87's 12 m anomaly, it also
recovers the expected positive ionospheric signal on 15 m and 17 m that the
raw-population analysis was unable to detect.
What does this mean for #87?
The original #87 paper proposed a single mechanism (population shift toward
weak long-path attempts at solar maximum) to explain both the 10 m and 12 m
anticorrelations. This workspace shows the situation is more nuanced:
- 12 m is consistent with #87's selection-bias story. Filtering to a
fixed pair population takes the correlation from Bonferroni-significantly
negative to near zero (n.s.).
- 10 m is partly selection bias and partly a real, opposite-sign
short-path effect. Filtering changes the correlation only slightly,
but the residual signal is itself the average of two
oppositely signed populations: negative on short paths and positive on
long paths.
The methodological consequence for any future WSPR-based ionospheric study
on the TerraPulse platform: always condition on path geometry, not just
on station identity. Mean SNR aggregated across all path lengths is a
mixture of two physical regimes that respond to the solar cycle in opposite
directions on bands at the upper edge of the HF range.
Limitations
- Pair classification by long-run average distance, not per-month
distance. A few qualifying pairs may have changed or relocated
antennas during the window; the long-run average should be largely
robust to this.
- No partition by hemisphere or time of day. Daytime vs nighttime 10 m
propagation is qualitatively different and would warrant a follow-up.
- No mechanism resolution. We can show the short-path 10 m
anticorrelation is real, but we cannot distinguish among the four
candidate physical explanations (band crowding, geomagnetic noise,
multipath, D-layer absorption) with this dataset alone. A resolution
would need station-by-station noise-floor measurements aligned with the
geomagnetic record.
- The "medium" path bin (1500-5000 km) is excluded from the short/long
contrast. We do not report it because it spans both regimes. About 28%
of qualifying pairs fall in this bin and are present in the
pair-filtered aggregate but not in either short or long.
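The size of the excluded medium bin follows directly from the pair counts quoted earlier, as this short check shows. The variable names are ours; the three counts are the ones reported in this document.

```python
TOTAL_QUALIFYING = 1_021_771  # qualifying_pairs.parquet
SHORT_PAIRS = 576_770         # long-run average distance < 1500 km
LONG_PAIRS = 160_413          # long-run average distance >= 5000 km

# Whatever is neither short nor long falls in the 1500-5000 km bin.
medium_pairs = TOTAL_QUALIFYING - SHORT_PAIRS - LONG_PAIRS
medium_share = medium_pairs / TOTAL_QUALIFYING

print(medium_pairs, f"{medium_share:.1%}")  # 284588 27.9%
```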
References
- WSPR solar cycle modulation paper (#87, this repository,
workspaces/wspr-solar-cycle-modulation/paper/paper.pdf)
- WSPR 21-year census (workspaces/wspr-21year-census/)
- Davies, Ionospheric Radio (1990)
- Hunsucker and Hargreaves, The High-Latitude Ionosphere (2003)
- Taylor, "The WSPR protocol," QST (Oct 2010)
- Frissell et al., GRL 49, e2022GL097879 (2022)
- WDC-SILSO, Royal Observatory of Belgium, https://www.sidc.be/SILSO/
Author: PMA
Published: 2026-04-06 · Updated: 2026-04-07
Data files: full_band_monthly.parquet, pair_monthly_snr.parquet, qualifying_pairs.parquet, results.json
Scripts: analyze.py, extract.py, make_figure.py, make_figure_v2.py, make_plotly.py, pair_one_pass1.py, pair_one_pass2.py