WSPR Station-Pair Validation: 10 m Has Two Competing Cycle Responses

Author: PMA (Elise) / TerraPulse Lab

Status: Complete

Created: 2026-04-06

GitHub Issue: #104

Builds on: #87 (WSPR solar cycle modulation), wspr-21year-census

Hypothesis

The wspr-solar-cycle-modulation paper (#87) reported a strongly negative

Pearson correlation between monthly mean SNR and sunspot number (SSN) on the

10 m and 12 m bands ( $r = -0.40$ and $r = -0.44$ , both Bonferroni-significant

across 21 years and 10.94 billion spots) and argued that this anticorrelation

was a station-population selection effect rather than a genuine ionospheric

response. The proposed remediation was to filter the analysis to a stable set

of TX-RX pairs that are present in both solar minimum and solar maximum, and

re-run the correlation. If the hypothesis is right, the negative sign should

disappear.

This workspace executes that test.

Data Sources

Source	Span	Records	Role
WSPRnet raw spots (`wspr_raw_YYYYMM.parquet`)	Nov 2004 to Mar 2026	10.94 B	Response
SILSO sunspot number	2004 to 2026	~258 monthly means	Predictor

The WSPR archive lives on /mnt/ursa/data/terrapulse/wspr/raw as 258 monthly

Parquet files (about 214 GB total). Each spot has tx_sign, rx_sign,

band, snr, distance, and a UTC timestamp.

Methodology

Two-pass extractor

A naive single pass that loads every WSPR file into RAM is impossible: a

single year of raw spots is on the order of 30 GB. The extract.py driver

runs two streaming passes, each in a fresh subprocess per file so the parent

never holds more than the small intermediate output of one month.

Pass 1 (pair_one_pass1.py): for each monthly file, group by

(tx_sign, rx_sign, band) and emit a small Parquet with the spot count and

the solar-cycle period tag (min24, max24, min25, max25, or other).

After all 258 files have been reduced, sum spot counts across files and

identify qualifying pairs: pairs that have at least 100 total spots over

the 21-year window AND are present in both a solar minimum window (cycle 24

or cycle 25 minimum) AND a solar maximum window (cycle 24 or cycle 25

maximum). This filter forces the analysis to use only stations whose

participation does not shift between phases.

Pass 2 (pair_one_pass2.py): for each monthly file, re-scan and emit

two aggregations: a full-population per-band monthly mean SNR (so we can

reproduce the original #87 correlation), and a per-pair monthly mean SNR

filtered by a semi-join against the qualifying-pair table.

The full pipeline produces three Parquet outputs:

qualifying_pairs.parquet: 1,021,771 unique TX-RX-band qualifying triples
pair_monthly_snr.parquet: 21,885,356 pair-month rows
full_band_monthly.parquet: 1,733 month-band aggregates (full population)

Five SNR estimators per band

For each of the eight HF amateur bands, analyze.py builds five monthly mean

SNR series and correlates each against monthly SSN:

full_population: reproduces #87. Spot-weighted mean across all

spots, no pair filter. Includes the same population shift the original

paper warned about.

filtered_mean: spot-weighted mean across only the qualifying-pair

subset. This is the direct test of #87's selection-bias hypothesis.

filtered_median_of_pairs: per-pair monthly mean, then median across

pairs. Robust to per-pair scale differences.

short_path_filtered: spot-weighted mean among qualifying pairs whose

long-run average distance is < 1500 km (576,770 pairs). These are

ground-wave, NVIS, and short-skip paths that should be insensitive to MUF.

long_path_filtered: spot-weighted mean among qualifying pairs whose

long-run average distance is >= 5000 km (160,413 pairs). These are

trans-oceanic and trans-continental DX paths that depend critically on MUF.

For each estimator we compute Pearson $r$ , Spearman $\rho$ , $p$ -values, and a

linearly detrended Pearson $r$ as a secular-trend control. Bonferroni

correction at $\alpha/8 = 0.00625$ is applied across the eight bands.

A Fisher $z$ contrast quantifies the difference between any two

correlations on the same band; the same machinery answers two distinct

questions in this paper:

Is the full-population $r$ different from the pair-filtered $r$ ?

(selection-bias test)

Is the short-path $r$ different from the long-path $r$ on the same band?

(path-geometry test)

Findings

Result 1: 12 m IS selection bias

Estimator	$r$	$\rho$	$r_{detr}$	$N$
Full population	$-0.443$	$-0.493$	$-0.353$	210
Pair-filtered (mean)	$-0.120$	$-0.089$	$-0.075$	200
Pair-filtered (median)	$-0.102$	$-0.009$	$+0.057$	200

Filtering to qualifying pairs kills the negative correlation on 12 m.

The Fisher $z$ contrast between full and filtered is $z = -3.57$ ,

$p = 3.5 \times 10^{-4}$ , so the two correlations are statistically distinct.

The pair-filtered correlation is no longer Bonferroni-significant. On 12 m,

the original #87 anticorrelation was a selection artifact, exactly as the

paper hypothesized.

Result 2: 10 m is NOT (mostly) selection bias

Estimator	$r$	$\rho$	$r_{detr}$	$N$
Full population	$-0.401$	$-0.574$	$-0.309$	217
Pair-filtered (mean)	$-0.308$	$-0.339$	$-0.326$	210
Pair-filtered (median)	$-0.211$	$-0.162$	$-0.078$	210

On 10 m, the negative correlation persists under pair filtering. The

Fisher $z$ contrast is only $z = -1.09$ , $p = 0.27$ , so the full-population

and pair-filtered correlations are statistically indistinguishable. The

pair-filtered $r = -0.31$ is still Bonferroni-significant, and the

detrended version is essentially unchanged at $r = -0.33$ .

This is a clean refutation of the original "all selection bias" hypothesis

for 10 m. Same station pairs, same measurement protocol, the negative sign

survives.

Result 3: split 10 m by path length and the sign flips

This is the headline finding. Using the same pair-filtered population,

we partition by long-run average path distance:

Subset	$N_{pairs}$	$r$	$\rho$	$r_{detr}$	$N_{months}$
10 m short (<1500 km)	9,996	$\mathbf{-0.613}$	$\mathbf{-0.671}$	$-0.576$	209
10 m long ( $\geq$ 5000 km)	1,971	$\mathbf{+0.536}$	$\mathbf{+0.532}$	$+0.519$	203

The 10 m band has two competing cycle responses.

Long paths follow textbook MUF physics: trans-continental and

trans-oceanic 10 m contacts are much more reliable at solar maximum, and

their mean SNR rises by several dB. $r = +0.54$ , $p \approx 10^{-16}$ .

Short paths get worse at solar maximum: ground-wave and short-skip

10 m paths show a strongly negative correlation, $r = -0.61$ ,

$p \approx 10^{-23}$ . Same finding holds in the rank statistics

( $\rho = -0.67$ ) and after detrending ( $r_{detr} = -0.58$ ), so it cannot

be attributed to a few outlier months or to network growth.

The Fisher $z$ contrast between the two subsets on 10 m is

$z = -13.2$ , $p = 7 \times 10^{-40}$ : short-path 10 m and long-path 10 m

respond to the solar cycle in opposite directions at a level that is

essentially impossible under any single shared response. The full-population

$r = -0.40$ that #87 reported was the average of these two opposite signals.

Result 4: 12 m shows the same path-length asymmetry, in a milder form

Subset	$N_{pairs}$	$r$	$r_{detr}$
12 m short (<1500 km)	(subset of 3,300)	$-0.323$	$-0.296$
12 m long ( $\geq$ 5000 km)	(subset of 3,300)	$+0.503$	$+0.475$

12 m short vs long Fisher $z = -8.45$ , $p \approx 3 \times 10^{-17}$ . The

same physical pattern (short paths anticorrelate, long paths correlate)

shows up on 12 m as well. But the pair-filtered combined 12 m correlation

collapses to near zero ( $r = -0.12$ , n.s.), so 12 m's full-population

anticorrelation is a mixture of the short/long contrast plus a

contributing selection effect from station-population shifts.

What is happening physically?

Long-path 10 m needs F-layer refraction at very high frequency. This requires

MUF $\geq$ 28 MHz, which only happens during solar maximum. At solar minimum

the long-path 10 m population is dominated by sporadic- $E$ (single-hop, not

trans-oceanic), meteor scatter, and the rare path-of-the-day; at solar

maximum, conventional F2 propagation opens hundreds of paths. The

surviving long-path qualifying pairs are by construction the persistent

ones (the Northwest Europe to Caribbean and US-to-Pacific links that show

up in both phases), and on those paths the SNR rises with SSN. Textbook.

Short-path 10 m is a different beast. Below 1500 km the dominant

propagation modes are line-of-sight, ground wave, sporadic- $E$ , and

near-vertical incidence skywave (NVIS). These modes are largely insensitive

to F-layer MUF. Yet the spot-weighted mean SNR on short 10 m paths drops by

about 2 dB from solar minimum to solar maximum, with $r = -0.61$ . Possible

explanations:

Co-channel interference and band crowding: at solar maximum, hundreds

of additional WSPR transmitters fire up on 10 m chasing F-layer DX. They

share the band with the short-path stations. Co-channel collisions raise

the local noise floor at the receiver and degrade decoded SNR even on

short paths that do not themselves use F-layer.

External noise from solar activity: during solar maximum the geomagnetic

environment is more disturbed, which raises the HF noise floor (galactic

and atmospheric component variations, ionospheric absorption events).

Short paths cannot escape this floor as easily as long paths can stack

transmit power against the same noise.

Multipath / fading on sporadic-E enhanced openings: at solar maximum

there are more sporadic- $E$ events on short paths, and weak

single-hop $E_s$ contacts decode at lower SNR than ground-wave contacts.

D-layer absorption: at solar maximum, increased D-layer ionization

raises HF absorption, especially in daytime. Short-path 10 m paths that

rely on near-vertical incidence skywave see higher absorption losses.

We cannot pick a single mechanism from this dataset. What we can say is

that the negative 10 m signal in #87 was not purely a measurement artifact

of who decided to log spots; the same pairs that decoded short-path

contacts at $-12$ dB during solar minimum decoded them at $-15$ dB during

solar maximum.

Result 5: positive bands behave better, with one new finding

Filtering to qualifying pairs strengthens the positive correlations on

40 m, 30 m, 20 m, 17 m, and 15 m, in some cases substantially:

Band	$r$ full	$r$ filtered	$r$ short	$r$ long
40 m	$+0.39$	$+0.45$	$+0.38$	$+0.38$
30 m	$+0.30$	$+0.43$	$+0.33$	$+0.66$
20 m	$+0.27$	$+0.49$	$+0.29$	$+0.66$
17 m	$+0.03$	$+0.32$	$+0.20$	$+0.46$
15 m	$+0.10$	$+0.35$	$-0.03$	$+0.61$

Long-path correlations are uniformly the strongest on these bands, which is

consistent with MUF physics: more cycle sensitivity is expected on the paths

that are MUF-limited. The 17 m and 15 m bands, which were not

Bonferroni-significant in their full-population form (#87), become strongly

positive once a fixed station population is enforced, particularly on

long paths. So the pair-filter not only resolves #87's 12 m anomaly, it also

recovers the expected positive ionospheric signal on 15 m and 17 m that the

raw-population analysis was unable to detect.

What does this mean for #87?

The original #87 paper proposed a single mechanism (population shift toward

weak long-path attempts at solar maximum) to explain both the 10 m and 12 m

anticorrelations. This workspace shows the situation is more nuanced:

12 m is consistent with #87's selection-bias story. Filtering to a

fixed pair population takes the correlation from $-0.44$ to $-0.12$ .

10 m is partly selection bias and partly a real, opposite-sign

short-path effect. Filtering takes the correlation from $-0.40$ only to

$-0.31$ , but the residual signal is itself the average of two

oppositely-signed populations: $-0.61$ on short paths and $+0.54$ on long

paths.

The methodological consequence for any future WSPR-based ionospheric study

on the TerraPulse platform: always condition on path geometry, not just

on station identity. Mean SNR aggregated across all path lengths is a

mixture of two physical regimes that respond to the solar cycle in opposite

directions on bands at the upper edge of the HF range.

Limitations

Pair classification by long-run average distance, not per-month

distance. A few qualifying pairs may have changed antennas or moved

antennas across the window; the average is robust to this.

No partition by hemisphere or time of day. Daytime vs nighttime 10 m

propagation is qualitatively different and would warrant a follow-up.

No mechanism resolution. We can show the short-path 10 m

anticorrelation is real, but we cannot distinguish among the four

candidate physical explanations (band crowding, geomagnetic noise,

multipath, D-layer absorption) with this dataset alone. A resolution

would need station-by-station noise-floor measurements aligned with the

geomagnetic record.

The "medium" path bin (1500-5000 km) is excluded from the short/long

contrast. We do not report it because it spans both regimes. About 28%

of qualifying pairs fall in this bin and are present in the

pair-filtered aggregate but not in either short or long.

References

WSPR solar cycle modulation paper (#87, this repository,

workspaces/wspr-solar-cycle-modulation/paper/paper.pdf)

WSPR 21-year census (workspaces/wspr-21year-census/)
Davies, Ionospheric Radio (1990)
Hunsucker and Hargreaves, The High-Latitude Ionosphere (2003)
Taylor, "The WSPR protocol," QST (Oct 2010)
Frissell et al., GRL 49, e2022GL097879 (2022)
WDC-SILSO, Royal Observatory of Belgium, https://www.sidc.be/SILSO/