First light for a thermosphere sensor made of falling satellites

Dek: Every decaying Starlink is a tiny barometer for the upper atmosphere. Pooled into one instrument and aimed at a moderate storm, they hint at the expected drag spike but cannot prove it against their own scatter. This is a working monitor's first reading, waiting for a storm big enough to speak.

Abstract

When a geomagnetic storm heats the upper atmosphere, the thermosphere expands, air density at satellite altitude rises, and low-orbiting spacecraft feel more drag. We tested whether that textbook chain is visible at fleet scale in the orbital decay of Starlink satellites recorded in TerraPulse. From 75 days of CelesTrak orbital elements (2026-03-27 to 2026-06-09, 10,591 Starlinks, 610,214 object-days) we built a per-object daily decay rate and aggregated the non-thrusting population into a single fleet-median time series.

This pilot is exploratory, not confirmatory. The storm we pre-registered to test, the 2026-03-22 event (Dst -101 nT), turned out to be unobservable: the Starlink catalog in TerraPulse begins 2026-03-27, five days after that storm, with no pre-storm baseline. We therefore re-pointed the test at the deepest storm fully inside our coverage, 2026-04-18 (Dst -88 nT).

Around that storm the fleet-median decay rate rose from 3.0 to 5.6 x 10^-6 rev/day^2, a factor of about 1.9. The rise is in the predicted direction and survives resampling of objects and a fixed-cohort survivorship check. It does not survive the tests that matter. The same positive shift appears on a quiet control date (+1.4 x 10^-6) and in maneuvering satellites that should not respond to drag (+1.7 x 10^-6), so the storm-specific excess is only about 1.2 x 10^-6, roughly 0.3 standard deviations of the instrument's background scatter. The storm response lands at the 85th percentile of quiet-day responses (z = 0.84, shuffle p = 0.19). A continuous correlation between daily decay rate and storm intensity is weakly positive in a linear test (Pearson r = 0.40) but is contradicted by the rank test (Spearman r = 0.14, p = 0.29) and is carried by a handful of storm days.

We report this as a non-detection with a working instrument. We could not detect the thermospheric drag response of a moderate storm against a 75-day baseline, and we show why: the one deep storm in the era is unobservable, the fleet-median is dominated by near-static shell satellites at the TLE noise floor, and the effective sample is about 20 to 27 independent days, not thousands of satellites. The method is sound and the standing monitor will gain power when a strong storm lands inside coverage.

1. Why ask this

The thermosphere is the only part of the climate system that breathes on a timescale of hours. A solar flare or a geomagnetic storm dumps energy into the upper atmosphere, it puffs outward, and the air that low-Earth-orbit satellites fly through gets denser. The effect is large and well documented. In February 2022 a moderate storm raised drag enough to pull roughly 38 freshly launched Starlinks back into the atmosphere before they reached their operating altitude.

Satellite drag has been used as a thermospheric density proxy for decades (Emmert and colleagues; the HASDM assimilation system). What is new is not the physics but the sample. There are now more than 10,000 Starlinks in a narrow altitude shell, each reporting orbital elements many times a day. In principle that is a dense, fleet-scale instrument for watching the upper atmosphere expand and contract. This pilot asks the most basic question that instrument has to answer before any of the ambitious versions are worth attempting: can it see a single, ordinary geomagnetic storm?

This is the first entry in TerraPulse's open, standing-monitor category of papers. It is designed to be re-run as data accrues, and to grow into a solar-cycle study once enough history exists. It connects to prior TerraPulse work on geomagnetic drivers (geomagnetic-storm-cascades, solar-geomagnetic-lag, bz-dst-predictive-lead-time) and reuses the Starlink catalog first ingested for starlink-nuforc-attribution.

2. Data

Starlink orbital elements (CelesTrak). Each satellite reports a mean motion, the number of orbits it completes per day. As an orbit decays the satellite drops, speeds up, and its mean motion rises. The rate of that rise, d(mean_motion)/dt, is the drag signal and therefore a thermospheric density proxy. We extracted every Starlink record from the TerraPulse celestrak source: 10,591 distinct satellites, 610,214 object-days, spanning 2026-03-27 to 2026-06-09 (75 days).

Space-weather drivers (daily). Dst index (storm intensity, the most negative daily excursion), DSCOVR solar-wind speed, SILSO sunspot number (a proxy for the slow solar-EUV heating baseline), and GOES X-ray flux (flare context).

The coverage problem, stated plainly. Our pre-registration locked the test to the 2026-03-22 storm (Dst -101 nT), the strongest event of the period. The feasibility note that preceded the pre-registration asserted that the Starlink history reached back to 2026-03-17. That was an error. The 03-17 floor belonged to the non-Starlink portion of the catalog; the Starlinks themselves first appear on 2026-03-27, five days after the storm. There is no pre-storm baseline and no onset coverage for the pre-registered event. It is unobservable in this dataset, and no amount of analysis recovers it.

We therefore re-pointed the test, by a rule we can state without ambiguity, at the deepest storm that falls fully inside the actual coverage window: 2026-04-18, Dst -88 nT. Because this choice was made after discovering the coverage error, everything downstream is exploratory rather than confirmatory, and we label it as such throughout. We note for honesty that the 04-18 event is not a clean impulse but a multi-day disturbed sequence (Dst of -88, -48, -57, -43 on 04-18 through 04-21), and that solar-wind speed was already elevated during the pre-storm baseline days. Both work against a sharp before-and-after contrast.

3. Method

Per-object decay rate. For each satellite we took the daily-median mean motion (to suppress the jitter between the several orbital-element updates per day) and estimated a daily decay rate as the slope of a centered seven-day (plus or minus three day) linear fit. Robust slopes, not single-pair differences, because two-line-element sets are individually noisy.

Clean-drag filter. Operational Starlinks raise their own orbits with ion thrusters, which masks or inverts the drag signal. We kept only satellites whose mean motion never stepped sharply downward (no orbit raise above 0.002 rev/day day-to-day), whose net change over the window was positive (decaying, not climbing), and which were tracked at least 20 days. That yielded 4,960 clean-drag objects against 5,434 maneuvering ones.

A caveat we surface rather than hide: this filter is leaky toward inertness. About 52 percent of the 4,960 "clean" objects moved by less than 0.001 rev/day over the whole window, meaning they are essentially station-kept shell satellites sitting at the noise floor, not vigorous decayers. Their presence in the fleet median is the single biggest reason that median is noisy. We carry a stricter "genuine decayer" subsample (net change above 0.005 rev/day, 2,291 objects) as a sensitivity check.

Fleet aggregate. The daily median decay rate over the clean population, with a 10 percent trimmed mean as a backup. Median rather than mean to resist the residual maneuver outliers.

Tests, pre-registered.

H1, storm impulse: superposed-epoch of the fleet-median rate on days -10 to +15 around 2026-04-18, with a pre-storm baseline (days -10 to -2) compared to a post-storm window (days 0 to +5). Significance against a shuffle null of random quiet epoch days, plus an object-resampling bootstrap.
H2, continuous driver: daily fleet rate against storm intensity (-Dst), sunspots, and solar-wind speed, at lags 0 to 2, Pearson and Spearman, with a partial correlation removing the sunspot baseline.
H3, altitude gradient: the storm response stratified by altitude (mean-motion band), expecting a larger response lower down.
H0, null: the fleet rate around the storm and across the window is within background variability, with no driver correlation that survives honest significance testing.

4. Results

4.1 H1: a rise in the right direction that does not clear the background

Around the 04-18 storm the fleet-median decay rate rose from a pre-storm baseline of 3.0 x 10^-6 to a post-storm 5.6 x 10^-6 rev/day^2, a factor of about 1.9. Resampling the clean objects gives a 95 percent bootstrap interval of [2.1, 3.1] x 10^-6 on the difference, which excludes zero. A fixed cohort of the 833 satellites present on every day of the window shows the same shift (6.1 to 8.6 x 10^-6), so survivorship is not creating the rise.

That is where the encouraging news ends. The bootstrap answers only "is the post-minus-pre difference stable across which objects we picked," not "did the storm cause it." The tests that ask the second question all fail:

Quiet control. Running the identical superposed-epoch on a quiet date, 2026-05-25, yields a +1.4 x 10^-6 "response." The instrument produces a positive shift of this size with no storm present.
Maneuvering objects. Satellites we excluded as thrusting, which should not show a drag response, show +1.7 x 10^-6.
Net storm excess. Subtracting the background drift, the storm-specific excess is about 1.2 x 10^-6, roughly 0.3 standard deviations of the instrument's day-to-day scatter (about 4 x 10^-6).
Shuffle null. Across the 20 quiet epoch days available in the window, the 04-18 response sits at the 85th percentile (z = 0.84, one-sided p = 0.19). Elevated, not significant.
Wrong timing. The decay rate does not peak when drag physics says it should. The superposed-epoch curve spikes at offset -3 (before the storm) and at offsets +4 and +5, while the storm day and the three days after it are among the lowest in the window. A clean drag impulse would peak within 0 to 2 days of the Dst minimum. Part of the -3 spike is the centered seven-day window already reaching forward into the storm, but the overall pattern is not the shape a storm response should have.

The honest reading of H1 is a non-detection. The fleet rate rose in the predicted direction, but a rise of this size is something the instrument does on quiet days too.

4.2 H2: a weak linear hint the rank test does not believe

Daily fleet decay rate against storm intensity (-Dst) at lag 0 gives Pearson r = 0.40. Taken at face value (n = 57 days) that is p = 0.002, and it barely moves when the sunspot baseline is partialled out (partial r = 0.40). Two corrections deflate it:

Autocorrelation. The seven-day rolling slope makes neighboring days share most of their input, so the daily series is serially correlated (lag-1 autocorrelation 0.36). The effective sample is about 27 independent days, not 57. With that correction the correlation is marginal: p of about 0.035 to 0.04, and a circular-shift null that preserves each series' own autocorrelation gives p = 0.035.
The rank test disagrees. Spearman's rank correlation is r = 0.14, p = 0.29, no monotone relationship. When Pearson is significant and Spearman is not, the linear result is being carried by a few high-leverage points, here the handful of storm days where both -Dst and decay rate are large together.

Across the nine correlation cells we computed (three drivers by three lags), a Bonferroni threshold is about 0.006, which the -Dst result does not clear. H2 is a weak, leverage-driven, non-robust hint, not a detected continuous driver.

4.3 H3: the altitude gradient is an artifact of degenerate binning

The clean population is concentrated in a razor-thin mean-motion shell, so quartile altitude bands are degenerate: two of the four bands span almost no altitude range at all. The apparent gradient (responses of 1, 3, 3, and 20 x 10^-6 from high to low altitude) is entirely the lowest band, which is simply the genuinely-decaying end-of-life tail with a baseline decay rate eight times the others. "Fast decayers show large changes in decay rate" is close to a tautology and is not evidence of a thermospheric density gradient. We do not claim H3.

4.4 Sensitivity: genuine decayers

Restricting to the 2,291 satellites with real net decay (net change above 0.005 rev/day) sharpens nothing decisive. The storm response is 3.1 x 10^-6 and the quiet-control response is 1.5 x 10^-6, a storm-minus-quiet excess of 1.7 x 10^-6, still comparable to the background scatter. Even the satellites that are unambiguously falling do not show a storm impulse that separates from the instrument's own wander.

5. Why this is a non-detection, in one place

Four things, none of them fatal to the method, jointly defeat detection in this window:

The deep storm is unobservable. The one event large enough to produce an unmistakable signal, the 2026-03-22 Dst -101 storm, predates the Starlink catalog by five days. We are left testing a moderate Dst -88 storm that is also a smeared multi-day sequence.
The fleet median is noise-dominated. More than half the "clean" population barely moves, so the aggregate sits near the two-line-element noise floor, which scatters by about 4 x 10^-6 per day, larger than the storm effect we are chasing.
The effective sample is days, not satellites. Thousands of satellites collapse into one daily series with about 20 to 27 independent points. The statistical power lives in the number of independent days, and 75 days holding one moderate storm is thin.
The storm shape is wrong for the test. A superposed-epoch design wants a sharp onset. This storm did not provide one.

6. Limitations

Exploratory, not confirmatory. The pre-registration was invalidated by a coverage error, and the substitute storm was chosen after that error came to light. The "deepest storm in coverage" rule is stated and defensible, but it is a post-coverage choice and the results inherit exploratory status.
One storm, one window. A single moderate event over 75 days cannot establish or rule out a population-level drag response.
No direct EUV index. We proxy solar heating with sunspot number; F10.7 and Kp are not ingested. This weakens the driver side but does not affect the null.
Altitude resolution. Mean motion is a coarse altitude proxy and the clean shell is narrow, so H3 was never well posed in this window.
TLE provenance. All decay rates derive from public two-line elements, whose per-element accuracy and update cadence we do not control.

7. What would change the verdict

This is a standing monitor, and the conditions for a real detection are concrete and likely to arrive:

A strong storm fully inside coverage. A Dst below about -150 with a clean onset, with at least ten days of Starlink data on either side, is the single thing most likely to convert this null into a detection. The CelesTrak archive accrues every day, so this is a matter of waiting and re-running.
A decay-weighted aggregate. Weighting the fleet statistic by decay magnitude, or restricting hard to vigorous decayers, would lift the signal off the noise floor.
Autocorrelation-aware inference as standard, so the continuous-driver test is honest from the start.
Solar-cycle horizon. With years of history, the slow expansion and contraction of the thermosphere across the solar cycle becomes the target, and eventually the decadal carbon-dioxide cooling and contraction of the upper atmosphere, the genuine climate signal. Both are explicitly out of reach now and are held as dream-tier seeds.

8. Conclusion

We built a fleet-scale thermospheric-drag instrument from thousands of decaying Starlinks and pointed it at the strongest geomagnetic storm available to us. It returned a hint in the predicted direction and nothing that survives rigorous testing. That is a real result, not a failed one. The instrument works, the pipeline is reproducible, and the reasons for the non-detection are understood and addressable. The open, standing-monitor category gets its first entry as an honest null with a working sensor, waiting for a storm large enough to speak above the noise.

Data and reproducibility

scripts/extract.py: pulls the Starlink daily panel and daily drivers from TerraPulse PostgreSQL.
scripts/analyze.py: clean-drag filter, decay rates, superposed-epoch, correlations, sensitivity. Writes data/results.json.
scripts/analyze_null.py: autocorrelation-corrected significance, genuine-decayer subsample, storm-versus-quiet percentile. Writes data/results_null.json.
Data products: data/starlink_daily.parquet, data/starlink_decay.parquet, data/drivers_daily.parquet.

References

Emmert, J. T. (2015). Thermospheric mass density: A review.
Picone, J. M. et al. HASDM and assimilative thermospheric density specification.
Hapgood, M. et al. (2022); Dang, T. et al. Analyses of the February 2022 Starlink storm loss.
Roble, R. G. and Dickinson, R. E. (1989); Emmert, J. T. (2021). Carbon-dioxide cooling and secular contraction of the upper atmosphere.
TerraPulse internal: geomagnetic-storm-cascades, solar-geomagnetic-lag, bz-dst-predictive-lead-time, starlink-nuforc-attribution.