Paper Machine Agent — Domain Knowledge Context
This file is the "brain" of the Paper Machine Agent (PMA). It teaches an AI agent how to conduct rigorous cross-domain environmental research on the TerraPulse platform. Load this file into agent context before any research task.
Platform: TerraPulse — 29M observations, 113 metrics, 14 scientific domains
Database: PostgreSQL + PostGIS at localhost:5433
1. Data Access
Connection
Always use synchronous psycopg2 for data extraction scripts. The async SQLAlchemy engine is for the API server only — research scripts run in batch and psycopg2 is simpler and more reliable for one-off queries.
```python
import psycopg2
import numpy as np
import polars as pl
from pathlib import Path

PG_DSN = "postgresql://terrapulse:terrapulse@localhost:5433/terrapulse"
DATA_DIR = Path(__file__).resolve().parent.parent / "data"

conn = psycopg2.connect(PG_DSN)
cur = conn.cursor()
```
Core Query Patterns
List all available metrics with counts:
```python
cur.execute("""
    SELECT metric, COUNT(*) AS n,
           MIN(timestamp_utc) AS first_obs,
           MAX(timestamp_utc) AS last_obs
    FROM observations
    WHERE value IS NOT NULL
    GROUP BY metric
    ORDER BY n DESC
""")
metrics = cur.fetchall()
for m, n, first, last in metrics:
    print(f"  {m:40s} {n:>10,}  {first} → {last}")
```
Daily aggregation (the most common pattern):
```python
cur.execute("""
    SELECT timestamp_utc::date AS day,
           COUNT(*) AS n,
           AVG(CAST(value AS DOUBLE PRECISION)) AS avg_val,
           MAX(CAST(value AS DOUBLE PRECISION)) AS max_val,
           MIN(CAST(value AS DOUBLE PRECISION)) AS min_val
    FROM observations
    WHERE metric = %s AND value IS NOT NULL
    GROUP BY 1
    ORDER BY 1
""", (metric_name,))
rows = cur.fetchall()
df = pl.DataFrame({
    "day": [r[0] for r in rows],
    "n": [r[1] for r in rows],
    "avg": [float(r[2]) for r in rows],
    "max": [float(r[3]) for r in rows],
    "min": [float(r[4]) for r in rows],
})
```
Hourly aggregation:
```python
cur.execute("""
    SELECT date_trunc('hour', timestamp_utc) AS hour,
           COUNT(*) AS n,
           AVG(CAST(value AS DOUBLE PRECISION)) AS avg_val
    FROM observations
    WHERE metric = %s AND value IS NOT NULL
      AND timestamp_utc >= %s AND timestamp_utc < %s
    GROUP BY 1
    ORDER BY 1
""", (metric_name, start_utc, end_utc))
```
Earthquake deduplication (REQUIRED before any earthquake analysis):
Earthquake data contains duplicates from overlapping USGS feed windows. Always deduplicate by rounding to the nearest minute and grouping by lat/lon:
```python
cur.execute("""
    SELECT DISTINCT ON (
        date_trunc('minute', timestamp_utc),
        ROUND(CAST(latitude AS NUMERIC), 2),
        ROUND(CAST(longitude AS NUMERIC), 2)
    )
        timestamp_utc,
        CAST(value AS DOUBLE PRECISION) AS magnitude,
        latitude,
        longitude
    FROM observations
    WHERE metric = 'earthquake_magnitude' AND value IS NOT NULL
    ORDER BY date_trunc('minute', timestamp_utc),
             ROUND(CAST(latitude AS NUMERIC), 2),
             ROUND(CAST(longitude AS NUMERIC), 2),
             timestamp_utc
""")
eq_rows = cur.fetchall()
```
Data Guards
These guards prevent runaway queries, bad data, and out-of-memory errors:
- **LIMIT 1M rows** — Never fetch more than 1 million rows in a single query without explicit justification. For larger datasets, aggregate server-side.
- **Filter nulls** — Always include `WHERE value IS NOT NULL` unless you are specifically studying data completeness.
- **Filter > 0 for continuous metrics** — Many continuous metrics (streamflow, radiation, wave height) have physical meaning only when positive. Zero values often indicate a sensor that is off or no-data:

  ```sql
  WHERE metric = 'streamflow_00060'
    AND value IS NOT NULL
    AND CAST(value AS DOUBLE PRECISION) > 0
  ```

- **Subsample > 100K** — When fitting distributions or running bootstrap tests on datasets larger than 100,000 rows, subsample first:

  ```python
  if len(values) > 100_000:
      values = np.random.default_rng(42).choice(values, 100_000, replace=False)
      print("Subsampled to 100,000 for fitting (seed=42)")
  ```

- **Close connections** — Always close the cursor and connection:

  ```python
  cur.close()
  conn.close()
  ```
Key Metrics Reference Table
| Domain | Metric Name | Records | Notes |
|---|---|---|---|
| Solar / Space Weather | solar_xray_flux | 9.2M | GOES X-ray flux (5-min cadence) |
| | solar_wind_speed | 2.8M | DSCOVR solar wind (km/s) |
| | solar_wind_bz | 2.6M | IMF Bz component (nT, southward = negative) |
| | space_kp_index | 160K | Planetary K-index (0-9 scale) |
| | space_solar_flux_10cm | 84K | F10.7 solar radio flux |
| | sunspot_number | 76K | Daily sunspot number (1818-2026) |
| | donki_cme_speed | 18K | CME speed from NASA DONKI (km/s) |
| | donki_flare_class | 2.5K | Solar flare class (C/M/X encoded) |
| | donki_storm_kp | 788 | Geomagnetic storm peak Kp |
| | donki_high_speed_stream | 580 | High-speed solar wind streams |
| Seismology | earthquake_magnitude | 175K | USGS earthquakes (2021-2026, global) |
| Severe Weather (NWS) | nws_* (40+ types) | 471K | NWS alerts by type (3-day rolling window) |
| Severe Weather (SPC) | spc_tornado | 70K | Historical tornadoes (1950-2023) |
| | spc_hail | 29K | Hail reports |
| | spc_wind | 7.7K | Wind damage reports |
| Meteorology | temperature_2m | 56K | Open-Meteo temperature (10 cities) |
| | geosphere_temperature | 9.6K | GeoSphere Austria stations |
| | nasa_power_t2m | 11K | NASA POWER reanalysis temperature |
| Air Quality | us_aqi | 55K | US AQI from Open-Meteo |
| | geosphere_pm25 | 7.4K | PM2.5 from GeoSphere Austria |
| Hydrology / Oceanography | streamflow_00060 | 2.4M | USGS streamflow (cfs, 15-min) |
| | water_level | 247K | NOAA tide gauge water levels |
| | wave_height | 22K | Open-Meteo wave height (m) |
| | river_discharge | 1.5K | Open-Meteo river discharge |
| Radiation | radiation | 1.4M | Safecast citizen science (CPM) |
| Volcanology | volcanic_eruption | 5K | Smithsonian GVP eruptions |
| | volcano_alert | 5.9K | USGS volcano alerts |
| | holocene_volcano | 1.2K | Holocene volcano catalog |
| Near-Earth Objects | neo_close_approach | 25K | NASA NEO close approaches |
| | neo_fireball | 541 | CNEOS fireball database (1998-2026) |
| Gravitational Waves | gw_merger | 368 | LIGO/Virgo/KAGRA confirmed mergers |
| | gw_observing_run | 2.1K | Detector observing run metadata |
| Optical Transients | ztf_sn_candidate | 1.4K | ZTF supernova candidates |
| | ztf_kilonova_candidate | 395 | ZTF kilonova candidates |
| Climate Indices | climate_oni | 1.8K | Oceanic Nino Index (1949-2025) |
| | climate_mei_v2 | 1.1K | Multivariate ENSO Index (1979-2025) |
| | drought_area_pct | 13K | USDM drought area % by state |
| Atmospheric Composition | co2_daily | 16K | Daily CO2 concentration |
| | ch4_monthly | 509 | Monthly methane concentration |
| Satellite | sar_scene | 7.6K | Sentinel-1 SAR scene catalog |
| Socioeconomic | wb_SP.POP.TOTL | 3.3K | World Bank population |
| | wb_AG.LND.FRST.ZS | 2.8K | World Bank forest area % |
| | wb_EG.USE.PCAP.KG.OE | 2.8K | World Bank energy use per capita |
| Geomagnetic (NEW) | dst_index | 400+ | Dst ring current (1963-present, hourly, CDAWeb HAPI) |
| | emsc_magnitude | ~340/day | EMSC seismicity (M0.8+ Europe/Middle East) |
| | gfz_magnitude | ~5/hr | GFZ GEOFON (M2+ global, focal mechanisms) |
| | isc_magnitude | ~27/6hr | ISC aggregated (496 agencies worldwide) |
| | mag_field_* | varies | INTERMAGNET observatories (X/Y/Z/F nT, hourly) |
| | magnetic_pole_north | 435 | North magnetic pole lat/lon (1590-2025) |
| | magnetic_pole_south | 435 | South magnetic pole lat/lon (1590-2025) |
| Ionosphere (NEW) | wspr_snr_* (18 bands) | ~18/hr | WSPR propagation (mean SNR per band per hour) |
| | ham_band_* (4 bands) | ~4/3hr | HamQSL band conditions (Good/Fair/Poor/Closed) |
| | ham_kindex, ham_solarflux, etc. | ~7/3hr | HamQSL solar/geomagnetic indices |
| UAP | ufo_sighting | 80K | NUFORC sighting reports (1906-2014) |
Data Indices (NEW)
Two pre-built indices enable cross-domain queries without touching PostgreSQL:
- **Timeline Index** (`data/timeline.parquet`): 8,128 rows (one per day, 2004-2026), 23 columns. Query any date range to get earthquake counts, Kp, solar wind, X-ray class, WSPR SNR, NWS alerts, fireballs, radiation, sunspots, volcanoes — all in one row. Refreshed hourly.
- **GeoIndex** (`data/geoindex.parquet`): 9,584 cells at 1° resolution. Query any lat/lon bounding box to find what data exists there and which metrics are available. Useful for spatial context of events.
WSPR Denoised Data (NEW)
For any WSPR study, use the denoised datasets on /mnt/ursa:
- `corridors_denoised.parquet` — 3.3M rows, 6 geographic corridors, hourly, noise-corrected
- `daily_denoised.parquet` — 109K rows, 7 HF bands, 21 years, noise-corrected
- `noise_baseline.parquet` — median noise per band per hour-of-day

Key columns added by denoising: `adjusted_snr` (noise-corrected), `signal_level` (receiver-independent), `noise_anomaly` (deviation from baseline).
2. Statistical Methods
Every method below includes production-ready Python code tested on TerraPulse
data. Copy these directly into analyze.py scripts.
2.1 Correlation (Pearson + Spearman)
Always report BOTH Pearson (linear) and Spearman (monotonic rank) correlations. Disagreement between them indicates nonlinearity.
```python
from scipy import stats as sp_stats

def correlate(x: np.ndarray, y: np.ndarray, label: str = "") -> dict:
    """Compute Pearson and Spearman correlations with sample size.

    Always report both. Disagreement flags nonlinearity.
    """
    # Drop pairs where either is NaN
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean, y_clean = x[mask], y[mask]
    n = len(x_clean)
    if n < 10:
        print(f"  {label}: N={n} — TOO FEW for correlation")
        return {"n": n, "pearson_r": None, "spearman_rho": None}
    r_pearson, p_pearson = sp_stats.pearsonr(x_clean, y_clean)
    rho_spearman, p_spearman = sp_stats.spearmanr(x_clean, y_clean)
    print(f"  {label}: N={n}")
    print(f"    Pearson  r={r_pearson:.4f}, p={p_pearson:.6f}, R²={r_pearson**2:.4f}")
    print(f"    Spearman ρ={rho_spearman:.4f}, p={p_spearman:.6f}")
    return {
        "n": n,
        "pearson_r": round(float(r_pearson), 4),
        "pearson_p": round(float(p_pearson), 6),
        "pearson_r_squared": round(float(r_pearson**2), 4),
        "spearman_rho": round(float(rho_spearman), 4),
        "spearman_p": round(float(p_spearman), 6),
        "label": label,
    }
```
2.2 Schuster Test (Phase Analysis)
The classical test for periodic triggering (e.g., tidal forcing of earthquakes). Projects event phases onto a unit circle — if events cluster at a preferred phase, the resultant vector is long.
```python
def schuster_test(phases: np.ndarray, label: str = "") -> dict:
    """Schuster test for phase clustering.

    phases: array of phase values in [0, 1) where 0=start, 1=full cycle.
    Under the null (uniform distribution), D²/N ~ exponential(1).

    Returns:
        D²: squared resultant length
        p: exp(-D²/N) — probability of this alignment by chance
        R: mean resultant length (0=uniform, 1=perfect alignment)
        mean_phase: the preferred phase (if significant)
    """
    N = len(phases)
    cos_sum = np.sum(np.cos(2 * np.pi * phases))
    sin_sum = np.sum(np.sin(2 * np.pi * phases))
    D2 = cos_sum**2 + sin_sum**2
    p = np.exp(-D2 / N)
    R = np.sqrt(D2) / N
    mean_phase = np.arctan2(sin_sum, cos_sum) / (2 * np.pi) % 1.0
    print(f"  Schuster test ({label}): N={N:,}")
    print(f"    D² = {D2:.1f}")
    print(f"    p = exp(-D²/N) = {p:.6e}")
    print(f"    R = {R:.6f} (mean resultant length)")
    print(f"    Mean phase = {mean_phase:.4f}")
    print(f"    {'SIGNIFICANT' if p < 0.05 else 'Not significant'}")
    return {
        "n": N,
        "D_squared": round(float(D2), 2),
        "p_value": float(p),
        "R": round(float(R), 6),
        "mean_phase": round(float(mean_phase), 4),
        "significant": p < 0.05,
        "label": label,
    }
```
2.3 Granger Causality (MLE with F-test)
Tests whether lagged values of X improve prediction of Y beyond Y's own history. This is "predictive causality" — not true causation.
```python
def granger_causality_test(x: np.ndarray, y: np.ndarray, max_lag: int = 7) -> dict:
    """Granger causality F-test: does X help predict Y?

    Compares:
        Restricted:   Y_t = a0 + sum(a_i * Y_{t-i}) + e
        Unrestricted: Y_t = a0 + sum(a_i * Y_{t-i}) + sum(b_i * X_{t-i}) + e
    Tests lags 1 through max_lag, returns the best (lowest p-value).
    """
    best = {"lag": 1, "f_stat": 0.0, "p_value": 1.0}
    for lag in range(1, max_lag + 1):
        n = len(y) - lag
        if n < 2 * lag + 10:  # Need sufficient degrees of freedom
            continue
        # Dependent variable
        Y = y[lag:]
        # Restricted model: Y on its own lags + intercept
        Y_lags = np.column_stack([y[lag - i - 1:len(y) - i - 1] for i in range(lag)])
        Y_lags = np.column_stack([np.ones(n), Y_lags])
        try:
            beta_r, _, _, _ = np.linalg.lstsq(Y_lags, Y, rcond=None)
            rss_r = np.sum((Y - Y_lags @ beta_r) ** 2)
        except np.linalg.LinAlgError:
            continue
        # Unrestricted model: Y on Y-lags + X-lags
        X_lags = np.column_stack([x[lag - i - 1:len(x) - i - 1] for i in range(lag)])
        Z = np.column_stack([Y_lags, X_lags])
        try:
            beta_u, _, _, _ = np.linalg.lstsq(Z, Y, rcond=None)
            rss_u = np.sum((Y - Z @ beta_u) ** 2)
        except np.linalg.LinAlgError:
            continue
        # F-test
        df1 = lag              # additional parameters in unrestricted
        df2 = n - 2 * lag - 1  # residual DoF
        if df2 <= 0 or rss_u <= 0:
            continue
        f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
        p_value = 1 - sp_stats.f.cdf(f_stat, df1, df2)
        if p_value < best["p_value"]:
            best = {"lag": lag, "f_stat": float(f_stat), "p_value": float(p_value)}
    return best
```
2.4 Superposed Epoch Analysis
Stack multiple events on a common timeline to detect systematic pre- and post-event patterns. The gold standard for "does X respond to Y events?"
```python
def superposed_epoch(event_times: list, series_times: np.ndarray,
                     series_values: np.ndarray, window_days: int = 7,
                     label: str = "") -> dict:
    """Superposed epoch analysis with statistical tests.

    event_times: list of datetime objects (the "key dates")
    series_times: array of datetime objects for the response variable
    series_values: array of float values for the response variable
    window_days: half-window size (±window_days around each event)

    Stacks all events and compares each epoch day to the pre-event baseline.
    """
    # Convert to days relative to each event
    epoch_matrix = []
    for event_t in event_times:
        offsets = []
        vals = []
        for t, v in zip(series_times, series_values):
            delta = (t - event_t).total_seconds() / 86400.0
            if -window_days <= delta <= window_days:
                offsets.append(round(delta))
                vals.append(v)
        if len(offsets) > 0:
            day_dict = {}
            for d, v in zip(offsets, vals):
                day_dict.setdefault(d, []).append(v)
            # Average within each day for this event
            epoch = {d: np.mean(vs) for d, vs in day_dict.items()}
            epoch_matrix.append(epoch)
    if len(epoch_matrix) < 3:
        print(f"  SEA ({label}): Only {len(epoch_matrix)} events stacked — too few")
        return {"n_events": len(epoch_matrix), "significant": False}
    # Build aligned matrix: columns = epoch days, rows = events
    days = list(range(-window_days, window_days + 1))
    matrix = np.full((len(epoch_matrix), len(days)), np.nan)
    for i, epoch in enumerate(epoch_matrix):
        for j, d in enumerate(days):
            if d in epoch:
                matrix[i, j] = epoch[d]
    # Epoch averages
    epoch_means = np.nanmean(matrix, axis=0)
    epoch_stds = np.nanstd(matrix, axis=0)
    # Baseline: average of pre-event days [-window_days, -1]
    pre_cols = [j for j, d in enumerate(days) if -window_days <= d < 0]
    pre_values = matrix[:, pre_cols].flatten()
    pre_values = pre_values[~np.isnan(pre_values)]
    baseline_mean = np.mean(pre_values)
    baseline_std = np.std(pre_values)
    # Test each post-event day against baseline
    results_by_day = {}
    for j, d in enumerate(days):
        if d < 0:
            continue
        col = matrix[:, j]
        col_clean = col[~np.isnan(col)]
        if len(col_clean) < 3:
            continue
        t_stat, t_p = sp_stats.ttest_1samp(col_clean, baseline_mean)
        u_stat, u_p = sp_stats.mannwhitneyu(col_clean, pre_values,
                                            alternative='two-sided')
        results_by_day[d] = {
            "mean": round(float(np.mean(col_clean)), 4),
            "n": len(col_clean),
            "t_stat": round(float(t_stat), 3),
            "t_p": round(float(t_p), 6),
            "mann_whitney_p": round(float(u_p), 6),
        }
    print(f"  SEA ({label}): {len(epoch_matrix)} events stacked, ±{window_days} days")
    print(f"    Baseline (pre-event): mean={baseline_mean:.4f}, std={baseline_std:.4f}")
    for d, r in sorted(results_by_day.items()):
        sig = "***" if r["t_p"] < 0.001 else "**" if r["t_p"] < 0.01 else "*" if r["t_p"] < 0.05 else ""
        print(f"    Day {d:+d}: mean={r['mean']:.4f} t={r['t_stat']:.2f} p={r['t_p']:.4f} {sig}")
    return {
        "n_events": len(epoch_matrix),
        "baseline_mean": round(float(baseline_mean), 4),
        "baseline_std": round(float(baseline_std), 4),
        "epoch_means": {d: round(float(m), 4) for d, m in zip(days, epoch_means)},
        "post_event_tests": results_by_day,
        "label": label,
    }
```
2.5 Distribution Fitting (Clauset Power Law + Alternatives)
Following Clauset et al. (2009): estimate x_min via KS minimization, fit power law + exponential + lognormal + Weibull on the same tail, compare with Vuong likelihood ratio test, and validate with bootstrap GOF.
```python
def fit_power_law(values: np.ndarray, bootstrap_n: int = 500,
                  label: str = "") -> dict:
    """Clauset et al. (2009) power law fitting with alternatives.

    1. Estimate x_min by minimizing the KS statistic
    2. MLE fit for power law, exponential, lognormal, Weibull
    3. Vuong pairwise comparison
    4. Bootstrap goodness-of-fit (p < 0.1 = reject power law)

    Returns dict with all fit parameters and the best model.
    """
    from scipy.special import erfc
    values = values[values > 0]  # Power law requires positive values

    # --- Step 1: Estimate x_min ---
    unique_vals = np.sort(np.unique(values))
    if len(unique_vals) > 500:
        candidates = unique_vals[np.linspace(0, len(unique_vals) - 2, 200).astype(int)]
    else:
        candidates = unique_vals[:-1]
    best_xmin, best_alpha, best_ks = candidates[0], 2.0, 1.0
    for xm in candidates:
        tail = values[values >= xm]
        n = len(tail)
        if n < 10:
            continue
        alpha = 1 + n / np.sum(np.log(tail / xm))
        if alpha <= 1:
            continue
        # KS statistic
        tail_sorted = np.sort(tail)
        cdf_theory = 1 - (tail_sorted / xm) ** (1 - alpha)
        cdf_empirical = np.arange(1, n + 1) / n
        ks = np.max(np.abs(cdf_empirical - cdf_theory))
        if ks < best_ks:
            best_ks, best_xmin, best_alpha = ks, xm, alpha
    tail = values[values >= best_xmin]
    n_tail = len(tail)

    # --- Step 2: Fit alternatives on the same tail ---
    # Exponential MLE: lambda = 1 / mean(x - x_min)
    shifted = tail - best_xmin
    shifted = shifted[shifted > 0]
    exp_lambda = 1.0 / np.mean(shifted) if len(shifted) > 0 else 1.0
    exp_ks = sp_stats.kstest(shifted, 'expon', args=(0, 1 / exp_lambda)).statistic
    # Lognormal MLE
    log_tail = np.log(tail)
    ln_mu, ln_sigma = np.mean(log_tail), np.std(log_tail)
    ln_ks = sp_stats.kstest(tail, 'lognorm', args=(ln_sigma, 0, np.exp(ln_mu))).statistic
    # Weibull MLE
    try:
        wb_params = sp_stats.weibull_min.fit(tail, floc=0)
        wb_k = wb_params[0]
        wb_ks = sp_stats.kstest(tail, 'weibull_min', args=wb_params).statistic
    except Exception:
        wb_k, wb_ks = None, 1.0

    # --- Step 3: Vuong likelihood ratio tests ---
    def vuong(loglik_ratios):
        n = len(loglik_ratios)
        if n < 10:
            return 0.0, 1.0, "inconclusive"
        R = np.sum(loglik_ratios)
        sigma = np.std(loglik_ratios, ddof=1)
        if sigma == 0:
            return R, 1.0, "inconclusive"
        stat = R / (sigma * np.sqrt(n))
        p = float(erfc(abs(stat) / np.sqrt(2)))
        if p > 0.1:
            return float(R), p, "inconclusive"
        return float(R), p, ("power_law" if R > 0 else "alternative")

    # Power law log-likelihoods per point
    pl_ll = (np.log(best_alpha - 1) - np.log(best_xmin)
             - best_alpha * np.log(tail / best_xmin))
    exp_ll = np.log(exp_lambda) - exp_lambda * (tail - best_xmin)
    ln_ll = sp_stats.lognorm.logpdf(tail, ln_sigma, 0, np.exp(ln_mu))
    comparisons = {}
    R_exp, p_exp, fav_exp = vuong(pl_ll - exp_ll)
    comparisons["vs_exponential"] = {"R": round(R_exp, 2), "p": round(p_exp, 4), "favors": fav_exp}
    R_ln, p_ln, fav_ln = vuong(pl_ll - ln_ll)
    comparisons["vs_lognormal"] = {"R": round(R_ln, 2), "p": round(p_ln, 4), "favors": fav_ln}

    # --- Step 4: Bootstrap GOF ---
    observed_ks = best_ks
    frac_tail = n_tail / len(values)
    count_ge = 0
    rng = np.random.default_rng(42)
    for _ in range(bootstrap_n):
        n_synth_tail = rng.binomial(len(values), frac_tail)
        if n_synth_tail < 10:
            continue
        synth_tail = best_xmin * (1 - rng.random(n_synth_tail)) ** (-1 / (best_alpha - 1))
        synth_alpha = 1 + n_synth_tail / np.sum(np.log(synth_tail / best_xmin))
        synth_sorted = np.sort(synth_tail)
        cdf_t = 1 - (synth_sorted / best_xmin) ** (1 - synth_alpha)
        cdf_e = np.arange(1, n_synth_tail + 1) / n_synth_tail
        synth_ks = np.max(np.abs(cdf_e - cdf_t))
        if synth_ks >= observed_ks:
            count_ge += 1
    bootstrap_p = count_ge / bootstrap_n

    # --- Determine best fit ---
    fits = {"power_law": best_ks, "exponential": exp_ks, "lognormal": ln_ks}
    if wb_k is not None:
        fits["weibull"] = wb_ks
    best_fit = min(fits, key=fits.get)
    if bootstrap_p < 0.1:
        best_fit = min({k: v for k, v in fits.items() if k != "power_law"},
                       key=lambda k: fits[k], default=best_fit)
    print(f"  Distribution fit ({label}): N={len(values):,}, tail N={n_tail:,}")
    print(f"    x_min = {best_xmin}")
    print(f"    Power Law:   alpha={best_alpha:.3f}, KS={best_ks:.4f}, bootstrap_p={bootstrap_p:.3f}")
    print(f"    Exponential: lambda={exp_lambda:.4f}, KS={exp_ks:.4f}")
    print(f"    Lognormal:   mu={ln_mu:.3f}, sigma={ln_sigma:.3f}, KS={ln_ks:.4f}")
    if wb_k is not None:
        print(f"    Weibull:     k={wb_k:.3f}, KS={wb_ks:.4f}")
    print(f"    Vuong tests: {comparisons}")
    print(f"    >>> BEST FIT: {best_fit}")
    return {
        "n": len(values),
        "n_tail": n_tail,
        "x_min": float(best_xmin),
        "power_law": {"alpha": round(best_alpha, 3), "ks": round(best_ks, 4),
                      "bootstrap_p": round(bootstrap_p, 3)},
        "exponential": {"lambda": round(exp_lambda, 4), "ks": round(exp_ks, 4)},
        "lognormal": {"mu": round(ln_mu, 3), "sigma": round(ln_sigma, 3),
                      "ks": round(ln_ks, 4)},
        "weibull": {"k": round(wb_k, 3) if wb_k else None,
                    "ks": round(wb_ks, 4) if wb_k else None},
        "comparisons": comparisons,
        "best_fit": best_fit,
        "label": label,
    }
```
2.6 Multiple Comparisons (Bonferroni Correction)
When testing more than 3 pairs, apply Bonferroni correction to control the family-wise error rate. This is non-negotiable.
```python
def bonferroni_threshold(n_pairs: int, alpha: float = 0.05) -> float:
    """Compute the Bonferroni-corrected significance threshold.

    With 72 pairs, the threshold becomes 0.05/72 = 0.000694.
    A result at p=0.01 that "looks significant" is NOT after Bonferroni.
    """
    threshold = alpha / n_pairs
    print(f"  Bonferroni: {n_pairs} tests, alpha={alpha} → threshold={threshold:.6f}")
    return threshold

def apply_bonferroni(results: list[dict], alpha: float = 0.05) -> list[dict]:
    """Mark results as surviving or not surviving Bonferroni correction.

    Each result dict must have a 'p_value' key.
    """
    n = len(results)
    threshold = bonferroni_threshold(n, alpha)
    surviving = 0
    for r in results:
        r["bonferroni_threshold"] = threshold
        r["survives_bonferroni"] = r["p_value"] < threshold
        if r["survives_bonferroni"]:
            surviving += 1
    print(f"  {surviving}/{n} results survive Bonferroni correction")
    return results
```
2.7 Cross-Correlation with Lags
Compute correlation at multiple time lags to detect delayed coupling.
```python
def cross_correlation(x: np.ndarray, y: np.ndarray,
                      max_lag: int = 24, label: str = "") -> dict:
    """Cross-correlation between x and y at lags -max_lag to +max_lag.

    Positive lag means x leads y (x at time t correlates with y at time t+lag).
    Reports Pearson r and p-value at each lag.
    """
    results = []
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            x_shifted = x[:-lag]
            y_shifted = y[lag:]
        elif lag < 0:
            x_shifted = x[-lag:]
            y_shifted = y[:lag]
        else:
            x_shifted = x
            y_shifted = y
        # Drop NaNs
        mask = ~(np.isnan(x_shifted) | np.isnan(y_shifted))
        x_c, y_c = x_shifted[mask], y_shifted[mask]
        n = len(x_c)
        if n < 10:
            continue
        r, p = sp_stats.pearsonr(x_c, y_c)
        results.append({
            "lag": lag,
            "r": round(float(r), 4),
            "p": round(float(p), 6),
            "n": n,
        })
    # Find peak
    if results:
        best = max(results, key=lambda r: abs(r["r"]))
        print(f"  Cross-correlation ({label}):")
        print(f"    Peak: lag={best['lag']}, r={best['r']:.4f}, p={best['p']:.6f}, N={best['n']}")
    return {
        "lags": results,
        "peak_lag": best["lag"] if results else None,
        "peak_r": best["r"] if results else None,
        "peak_p": best["p"] if results else None,
        "label": label,
    }
```
3. Quality Standards
These standards are NON-NEGOTIABLE. They encode hard-won lessons from 30 research workspaces and 10 published papers. Every one of these rules exists because we violated it and had to correct the record.
3.0 Anti-Sycophancy Protocol
When the data says no, SAY NO. Do not soften, hedge, reframe, or find silver linings in null results. Do not agree with dramatic claims from the internet without cross-checking against our data first.
Known cases where enthusiasm outran the numbers:
- Fireball "158x anomaly" → duplicate ingestion bug. Actual March 2026: 3 fireballs (normal).
- AMS "mysterious surge" → CNEOS shows 4 in Q1, exactly the 4.3/quarter baseline.
- X1.5 CME "will produce Kp storm" → CME missed Earth entirely. Most CMEs miss.
- Solar wind r=0.54 → V2 with more data: r=0.09. The effect was 6x smaller than V1 suggested.
Rules:
- Check the data BEFORE agreeing with any claim.
- "That's noise" is a valid and useful response.
- Correct your own errors immediately — do not wait to be asked.
- A null result reported clearly is worth more than a positive result reported vaguely.
- If you catch yourself saying "interesting!" about something that might be an artifact, stop and verify.
3.1 Honest Nulls
A null result IS a finding. Report it prominently, not buried in an appendix. Some of our best papers are null results:
- Solar-terrestrial forcing: "The solar cycle does not modulate geophysical event rates" (205 years of data, zero significant correlations)
- NEO-gravitational: "Gravitational wave events do not correlate with near-Earth object trajectories" (clean null)
- Earthquake tidal forcing: Schuster test p=0.28 — earthquakes do NOT cluster by lunar phase (despite 171K events)
If your analysis finds nothing, SAY SO CLEARLY. Do not hedge with "further investigation needed" when the data is conclusive. A null with N=100,000 is more valuable than a positive with N=50.
3.2 V1 to V2 Discipline
Every surprising positive result must be verified with more data.
The canonical cautionary tale: Solar wind speed vs earthquakes showed r=0.54 (p<0.00001) with 74 hours of data (V1). With 1,800 hours of DSCOVR archive data (V2), the correlation dropped to r=0.09 — still statistically significant but explaining less than 1% of variance.
Rule of thumb: r > 0.5 with N < 500 is a flag for verification, not a finding. Do not publish V1 results without attempting V2 replication.
The V1-to-V2 pipeline:
- V1: Initial analysis on available data. Flag any strong correlations.
- Seek more data: Backfill archives, extend time windows, add sources.
- V2: Re-run the identical analysis on the expanded dataset.
- Report BOTH V1 and V2. The shrinkage itself is scientifically informative.
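A quick simulation shows why the small-N flag exists: under the null of two independent series, the sampling spread of Pearson r shrinks roughly as 1/√N, so an r estimated from 74 points is several times less stable than one from 1,800. Chance alone does not fully explain an r of 0.54, but short, autocorrelated windows shrink the effective N well below the nominal 74, widening the spread further. This sketch is illustrative only, not taken from the V1/V2 papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def r_spread(n: int, trials: int = 2000) -> float:
    """Std of Pearson r between two INDEPENDENT standard-normal series of length n."""
    rs = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = rng.standard_normal(n)
        rs.append(np.corrcoef(x, y)[0, 1])
    return float(np.std(rs))

sd_v1 = r_spread(74)    # V1-sized sample: null r's scatter widely
sd_v2 = r_spread(1800)  # V2-sized sample: null r's hug zero
print(f"std of r under the null: N=74 → {sd_v1:.3f}, N=1800 → {sd_v2:.3f}")
```

The theoretical spread is about 1/√(N-1), i.e. roughly 0.12 at N=74 versus 0.024 at N=1,800, which matches the simulation.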
3.3 Effect Sizes
Always report effect size alongside p-values. A p-value alone is meaningless — with enough data, any nonzero effect becomes "significant."
Required effect size metrics:
- r or rho for correlations (with R² = variance explained)
- Cohen's d for group comparisons: `d = (mean1 - mean2) / pooled_std`
- Ratio for rate comparisons: "high-Kp days average 1.76x more earthquakes"
```python
def cohens_d(group1: np.ndarray, group2: np.ndarray) -> float:
    """Cohen's d effect size for two independent samples."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_std == 0:
        return 0.0
    d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return round(float(d), 4)
```
Interpretation: |d| < 0.2 = negligible, 0.2-0.5 = small, 0.5-0.8 = medium, > 0.8 = large.
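A worked example of the formula and the interpretation bands, on invented data (the helper is repeated here so the snippet runs standalone):

```python
import numpy as np

def cohens_d(group1: np.ndarray, group2: np.ndarray) -> float:
    """Pooled-std Cohen's d, same formula as the helper above."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return round(float((np.mean(group1) - np.mean(group2)) / pooled), 4)

# Invented daily counts for high-Kp vs low-Kp days: means differ by 5
# with sd 10, i.e. a true d of 0.5, right at the small/medium boundary.
rng = np.random.default_rng(1)
high_kp = rng.normal(52.0, 10.0, 2000)
low_kp = rng.normal(47.0, 10.0, 2000)
d = cohens_d(high_kp, low_kp)
band = ("negligible" if abs(d) < 0.2 else "small" if abs(d) < 0.5
        else "medium" if abs(d) < 0.8 else "large")
print(f"d = {d} ({band})")
```

Reporting both the number and the band ("d=0.5, medium") is what the effect-size rule asks for.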
3.4 Sample Sizes
State N for every statistical test. No exceptions. Every correlation, every t-test, every Schuster test — the reader must see N alongside the result.
Pattern:
```
Solar wind vs earthquakes: r=0.09, p=0.0001, N=1,800 hours
Schuster test (all M≥4): p=0.28, N=171,852 earthquakes
Granger F-test: F=4.32, p=0.003, N=181 hours, lag=2h
```
3.5 Bootstrap Before Power Law
When fitting power laws, ALWAYS run the bootstrap goodness-of-fit test. If bootstrap p < 0.1, reject the power law hypothesis — the data is better described by an alternative distribution.
We learned this from the universal-scaling-laws workspace: earthquake magnitudes looked like a power law until bootstrap rejected it (p=0.03). The true distribution was lognormal.
3.6 Deduplicate Earthquakes
USGS earthquake feeds have overlapping windows that produce duplicate records. Before ANY earthquake analysis, run the deduplication query from Section 1. Failure to deduplicate inflates N and biases clustering tests.
3.7 At Least One Null Hypothesis Per Paper
Every paper must explicitly state and test at least one null hypothesis. "We find no evidence for X" is a valid and valuable conclusion. The null should be the default expectation, and the burden of proof is on the alternative.
3.8 Bonferroni for > 3 Pairs
When testing more than 3 pairwise comparisons, apply Bonferroni correction. In the Granger causality network (72 pairs), only 15 of 37 nominal significances survived Bonferroni — more than half were false positives.
Report both: "37 significant at p<0.05, 15 survive Bonferroni (p<0.00069)."
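That reporting line can be generated mechanically from the p-value list. The p-values below are invented for illustration, not the real 72-pair network:

```python
import numpy as np

# 72 hypothetical pairwise p-values: a few strong effects, a block of
# nominally significant ones, and the rest null.
rng = np.random.default_rng(7)
p_values = np.concatenate([
    rng.uniform(1e-6, 5e-4, 10),   # strong effects (below any threshold)
    rng.uniform(0.001, 0.05, 20),  # nominally significant only
    rng.uniform(0.05, 1.0, 42),    # null
])
n = len(p_values)
threshold = 0.05 / n               # Bonferroni-corrected threshold
nominal = int(np.sum(p_values < 0.05))
surviving = int(np.sum(p_values < threshold))
print(f"{nominal} significant at p<0.05, {surviving} survive Bonferroni (p<{threshold:.5f})")
```

With these groups, 30 tests pass p<0.05 but only the 10 strong ones survive the 0.05/72 ≈ 0.00069 cutoff, mirroring the shrinkage seen in the real network.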
3.9 Pre-Flight Checklist (Lessons from Mike's Reviews)
These items are caught by the editor on EVERY paper. Do them BEFORE submitting.
- **Sensitivity analysis.** For any threshold or window parameter (e.g., 3σ, 90-day), report results at ±1 level (2.5σ/3.5σ, 60-day/180-day). A one-row table showing "results are robust" prevents a revision round.
- **Cite prior TerraPulse workspaces.** Run `ls workspaces/` and check whether any prior workspace studied related metrics. If `solar-geomagnetic-lag` measured Kp delay and you're measuring Kp delay, cite it and compare numbers.
- **Report Mann-Whitney alongside the t-test.** For ANY group comparison (SEA post vs pre, high-Kp vs low-Kp), report both parametric (t-test) and non-parametric (Mann-Whitney). If they disagree, the non-parametric result is more reliable, and the disagreement is itself a finding (it indicates non-normality or outlier influence).
- **Table notation.** Never use `d=` for anything other than Cohen's d. Use `peak=` for peak values, `Δ=` for changes, `r=` for correlations. Mixed notation in a single column header guarantees an editorial flag.
- **Explain count discrepancies.** If the issue says 66 events but the paper analyzes 32, explain why in the first paragraph. "Of the 66 events, 32 were available in the exported catalog" prevents confusion.
- **Data availability statement.** Include it as a bold paragraph after the Conclusion, not buried in the text: `\textbf{Data availability.} All data and scripts at \texttt{workspaces/{slug}/}.`
- **Check the math.** Before submission, verify that every number in the paper matches `results.json`. The editor cross-checks and WILL catch mismatches.
4. Paper Template
revtex4-2 LaTeX Skeleton
All TerraPulse papers use this exact template. Do not modify the document class, author block, or affiliation.
```latex
\documentclass[twocolumn,prl,aps,showpacs]{revtex4-2}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{booktabs}

\begin{document}

\title{Your Title Here}
\author{M.~Isenbek}
\email{michael@terrapulse.info}
\author{E.~Isenbek}
\author{B.~Isenbek}
\affiliation{TerraPulse Climate Intelligence Platform, \url{https://terrapulse.info}}
\date{\today}

\begin{abstract}
One paragraph. State the question, the data, the method, and the key finding.
Include the most important numbers (N, r, p). End with the implication.
\end{abstract}

\maketitle

\section{Introduction}
Why does this question matter? What is the prior literature? What gap does
this paper fill? 2-3 paragraphs.

\section{Data}
Where does the data come from? What is the temporal and spatial coverage?
Include a table summarizing each data source with record counts and date ranges.
Always cite TerraPulse: \url{https://terrapulse.info}.

\begin{table}[h]
\caption{Data sources and coverage.}
\label{tab:data}
\begin{ruledtabular}
\begin{tabular}{lrr}
\textbf{Source} & \textbf{$N$} & \textbf{Span} \\
\colrule
Source 1 & 100,000 & 2021--2026 \\
Source 2 & 50,000 & 2025--2026 \\
\end{tabular}
\end{ruledtabular}
\end{table}

\section{Method}
What statistical tests are you running? Define them mathematically.
Reference the standard citations (e.g., Clauset et al. 2009 for power laws).

\section{Results}
Present the numbers. Tables and figures. State significance after Bonferroni
where applicable. Report effect sizes, not just p-values.

\begin{table}[h]
\caption{Results summary.}
\label{tab:results}
\begin{ruledtabular}
\begin{tabular}{lrrrl}
\textbf{Test} & \textbf{$r$} & \textbf{$p$} & \textbf{$N$} & \textbf{Sig?} \\
\colrule
Test 1 & +0.09 & 0.0001 & 1,800 & Yes \\
Test 2 & -0.02 & 0.81 & 120 & No \\
\end{tabular}
\end{ruledtabular}
\end{table}

\section{Discussion}
What do the results mean? How do they relate to prior work? What are the
limitations? If a null result, say so clearly and explain why it matters.
If V1 → V2 shrinkage occurred, discuss it honestly.

\section{Conclusion}
2-3 sentences. The main finding. The implication. What should be done next.

\begin{thebibliography}{99}
\bibitem{ref1} Author, ``Title,'' \textit{Journal} \textbf{Vol}, Page (Year).
\end{thebibliography}

\end{document}
```
Compile Command
cd workspaces/<workspace>/paper/
pdflatex paper.tex && pdflatex paper.tex
Run twice to resolve cross-references and table numbering.
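The two-pass compile can also be driven from a script. A sketch, assuming pdflatex is on PATH (helper names are illustrative):

```python
import subprocess
from pathlib import Path

def needs_rerun(log_text: str) -> bool:
    """True if the LaTeX log says another pass is still required."""
    return "Rerun to get cross-references right" in log_text

def compile_paper(paper_dir: Path) -> bool:
    """Run pdflatex twice in paper_dir; return True once references are stable."""
    for _ in range(2):  # second pass resolves \ref{} and table numbering
        subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "paper.tex"],
            cwd=paper_dir, check=True, capture_output=True,
        )
    return not needs_rerun((paper_dir / "paper.log").read_text(errors="replace"))
```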
Target Specifications
- Length: 3-4 pages (two-column revtex4-2 format)
- Tables: 1-2 (data summary + results)
- Figures: 1-2 (referenced from www/ via \includegraphics)
- References: 5-15 (key citations, not exhaustive)
5. Workspace Conventions
Directory Structure
Every research workspace follows this layout:
workspaces/<workspace-name>/
├── index.md # Research narrative (the "lab notebook")
├── workspace.json # Machine-readable metadata
├── scripts/
│ ├── extract.py # Data extraction from PostgreSQL
│ ├── analyze.py # Statistical analysis
│ └── visualize.py # (optional) Separate visualization script
├── data/
│ ├── *.parquet # Extracted datasets (Polars/Parquet)
│ └── *.json # Intermediate results
├── www/
│ ├── *.html # Interactive Plotly visualizations
│ └── *.png # Static images for papers
├── paper/
│ └── paper.tex # LaTeX paper (if publishing)
└── tmp/ # Scratch files, debugging output
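Scaffolding a new workspace in this layout takes only a few lines; a sketch (helper name is hypothetical):

```python
from pathlib import Path

SUBDIRS = ["scripts", "data", "www", "paper", "tmp"]

def scaffold(root: Path, name: str) -> Path:
    """Create the standard workspace layout under root/name."""
    ws = root / name
    for sub in SUBDIRS:
        (ws / sub).mkdir(parents=True, exist_ok=True)
    (ws / "index.md").touch()        # research narrative
    (ws / "workspace.json").touch()  # machine-readable metadata
    return ws
```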
Pipeline Flow
The canonical research pipeline is linear:
extract.py → data/*.parquet
→ analyze.py → data/results.json
→ visualize.py → www/*.html + www/*.png
→ index.md (narrative writeup)
→ paper/paper.tex (if publishing)
extract.py — Queries PostgreSQL, saves Parquet files to data/. Uses psycopg2 + polars. No analysis logic here.
analyze.py — Loads Parquet files, runs statistical tests, writes data/results.json. All numbers in the paper come from this JSON.
Visualizations — Generate both interactive HTML (for the web) and static PNG (for the paper). Usually done at the end of analyze.py or in a separate visualize.py.
index.md — The human-readable research narrative. Written after analysis is complete.
paper/paper.tex — The formal paper. Only created if the findings warrant publication.
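One practical gotcha in analyze.py: json.dump rejects NumPy scalars and arrays, which most statistical results contain. A sketch of a converter (helper names are hypothetical):

```python
import json
import numpy as np

def to_plain(obj):
    """Recursively convert NumPy scalars/arrays to JSON-safe Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj

def write_results(path, results: dict) -> None:
    """Write results.json with all NumPy types converted first."""
    with open(path, "w") as f:
        json.dump(to_plain(results), f, indent=2)
```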
index.md Format
# Workspace Title
> **Author:** Claude (TerraPulse Lab)
> **Status:** Draft | Complete | Published
> **Created:** YYYY-MM-DD
> **GitHub Issue:** #NN
## Hypothesis
What are we investigating? State it as a testable question.
## Data Sources
Table of metrics used, record counts, date ranges.
## Methodology
Statistical methods, preprocessing steps, parameters.
## Findings
Results with numbers. Tables. References to visualizations in www/.
## References
Related work, data source documentation.
Plotly Visualization Pattern
Always generate both interactive HTML and static PNG:
import plotly.graph_objects as go
from pathlib import Path

# Output directory, mirroring the DATA_DIR convention above.
WWW_DIR = Path(__file__).resolve().parent.parent / "www"
fig = go.Figure()
# ... build your figure ...
fig.update_layout(
title="Descriptive Title (N=1,234)",
xaxis_title="X Label (units)",
yaxis_title="Y Label (units)",
template="plotly_white",
)
# Interactive version for the web
fig.write_html(
str(WWW_DIR / "figure-name.html"),
include_plotlyjs="cdn",
)
# Static version for papers (high DPI; requires the kaleido package)
fig.write_image(
str(WWW_DIR / "figure-name.png"),
scale=2,
)
print(f" Wrote {WWW_DIR / 'figure-name.html'}")
print(f" Wrote {WWW_DIR / 'figure-name.png'}")
workspace.json Format
{
"name": "workspace-name",
"title": "Human-Readable Title",
"status": "complete",
"created": "2026-03-28",
"issue": 43,
"metrics": ["earthquake_magnitude", "space_kp_index"],
"methods": ["correlation", "granger_causality"],
"findings_summary": "One-sentence summary of the key result."
}
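A lightweight validator for this schema can catch problems before publication; a sketch (the required-field set and lowercase status values are assumptions based on the example above):

```python
REQUIRED = {"name", "title", "status", "created", "metrics", "methods"}
STATUSES = {"draft", "complete", "published"}

def validate_workspace(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - meta.keys())]
    if meta.get("status") not in STATUSES:
        problems.append(f"bad status: {meta.get('status')!r}")
    if not isinstance(meta.get("metrics"), list):
        problems.append("metrics must be a list of metric names")
    return problems
```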