Paper Machine Agent — Domain Knowledge Context

This file is the "brain" of the Paper Machine Agent (PMA). It teaches an AI agent how to conduct rigorous cross-domain environmental research on the TerraPulse platform. Load this file into agent context before any research task.

Platform: TerraPulse — 29M observations, 113 metrics, 14 scientific domains
Database: PostgreSQL + PostGIS at localhost:5433


1. Data Access

Connection

Always use synchronous psycopg2 for data extraction scripts. The async SQLAlchemy engine is for the API server only — research scripts run in batch and psycopg2 is simpler and more reliable for one-off queries.

import psycopg2
import numpy as np
import polars as pl
from pathlib import Path

PG_DSN = "postgresql://terrapulse:terrapulse@localhost:5433/terrapulse"
DATA_DIR = Path(__file__).resolve().parent.parent / "data"

conn = psycopg2.connect(PG_DSN)
cur = conn.cursor()

Core Query Patterns

List all available metrics with counts:

cur.execute("""
    SELECT metric, COUNT(*) AS n,
           MIN(timestamp_utc) AS first_obs,
           MAX(timestamp_utc) AS last_obs
    FROM observations
    WHERE value IS NOT NULL
    GROUP BY metric
    ORDER BY n DESC
""")
metrics = cur.fetchall()
for m, n, first, last in metrics:
    print(f"  {m:40s} {n:>10,}  {first} → {last}")

Daily aggregation (the most common pattern):

cur.execute("""
    SELECT timestamp_utc::date AS day,
           COUNT(*) AS n,
           AVG(CAST(value AS DOUBLE PRECISION)) AS avg_val,
           MAX(CAST(value AS DOUBLE PRECISION)) AS max_val,
           MIN(CAST(value AS DOUBLE PRECISION)) AS min_val
    FROM observations
    WHERE metric = %s AND value IS NOT NULL
    GROUP BY 1
    ORDER BY 1
""", (metric_name,))
rows = cur.fetchall()
df = pl.DataFrame({
    "day": [r[0] for r in rows],
    "n": [r[1] for r in rows],
    "avg": [float(r[2]) for r in rows],
    "max": [float(r[3]) for r in rows],
    "min": [float(r[4]) for r in rows],
})

Hourly aggregation:

cur.execute("""
    SELECT date_trunc('hour', timestamp_utc) AS hour,
           COUNT(*) AS n,
           AVG(CAST(value AS DOUBLE PRECISION)) AS avg_val
    FROM observations
    WHERE metric = %s AND value IS NOT NULL
      AND timestamp_utc >= %s AND timestamp_utc < %s
    GROUP BY 1
    ORDER BY 1
""", (metric_name, start_utc, end_utc))

Earthquake deduplication (REQUIRED before any earthquake analysis):

Earthquake data contains duplicates from overlapping USGS feed windows. Always deduplicate by truncating the timestamp to the minute and grouping by latitude/longitude rounded to two decimal places:

cur.execute("""
    SELECT DISTINCT ON (
        date_trunc('minute', timestamp_utc),
        ROUND(CAST(latitude AS NUMERIC), 2),
        ROUND(CAST(longitude AS NUMERIC), 2)
    )
        timestamp_utc,
        CAST(value AS DOUBLE PRECISION) AS magnitude,
        latitude,
        longitude
    FROM observations
    WHERE metric = 'earthquake_magnitude' AND value IS NOT NULL
    ORDER BY date_trunc('minute', timestamp_utc),
             ROUND(CAST(latitude AS NUMERIC), 2),
             ROUND(CAST(longitude AS NUMERIC), 2),
             timestamp_utc
""")
eq_rows = cur.fetchall()

Data Guards

These guards prevent runaway queries, bad data, and out-of-memory errors:

  1. LIMIT 1M rows — Never fetch more than 1 million rows in a single query without explicit justification. For larger datasets, aggregate server-side.

  2. Filter nulls — Always include WHERE value IS NOT NULL unless you are specifically studying data completeness.

  3. Filter > 0 for continuous metrics — Many continuous metrics (streamflow, radiation, wave height) have physical meaning only when positive. Zero values often indicate sensor off or no-data:

    WHERE metric = 'streamflow_00060' AND value IS NOT NULL
      AND CAST(value AS DOUBLE PRECISION) > 0
    
  4. Subsample > 100K — When fitting distributions or running bootstrap tests on datasets larger than 100,000 rows, subsample first:

    if len(values) > 100_000:
        values = np.random.default_rng(42).choice(values, 100_000, replace=False)
        print("Subsampled to 100,000 for fitting (seed=42)")
    
  5. Close connections — Always close cursor and connection:

    cur.close()
    conn.close()
    

Key Metrics Reference Table

Domain                    Metric                           Records   Notes
Solar / Space Weather     solar_xray_flux                  9.2M      GOES X-ray flux (5-min cadence)
                          solar_wind_speed                 2.8M      DSCOVR solar wind (km/s)
                          solar_wind_bz                    2.6M      IMF Bz component (nT, southward=negative)
                          space_kp_index                   160K      Planetary K-index (0-9 scale)
                          space_solar_flux_10cm            84K       F10.7 solar radio flux
                          sunspot_number                   76K       Daily sunspot number (1818-2026)
                          donki_cme_speed                  18K       CME speed from NASA DONKI (km/s)
                          donki_flare_class                2.5K      Solar flare class (C/M/X encoded)
                          donki_storm_kp                   788       Geomagnetic storm peak Kp
                          donki_high_speed_stream          580       High-speed solar wind streams
Seismology                earthquake_magnitude             175K      USGS earthquakes (2021-2026, global)
Severe Weather (NWS)      nws_* (40+ types)                471K      NWS alerts by type (3-day rolling window)
Severe Weather (SPC)      spc_tornado                      70K       Historical tornadoes (1950-2023)
                          spc_hail                         29K       Hail reports
                          spc_wind                         7.7K      Wind damage reports
Meteorology               temperature_2m                   56K       Open-Meteo temperature (10 cities)
                          geosphere_temperature            9.6K      GeoSphere Austria stations
                          nasa_power_t2m                   11K       NASA POWER reanalysis temperature
Air Quality               us_aqi                           55K       US AQI from Open-Meteo
                          geosphere_pm25                   7.4K      PM2.5 from GeoSphere Austria
Hydrology / Oceanography  streamflow_00060                 2.4M      USGS streamflow (cfs, 15-min)
                          water_level                      247K      NOAA tide gauge water levels
                          wave_height                      22K       Open-Meteo wave height (m)
                          river_discharge                  1.5K      Open-Meteo river discharge
Radiation                 radiation                        1.4M      Safecast citizen science (CPM)
Volcanology               volcanic_eruption                5K        Smithsonian GVP eruptions
                          volcano_alert                    5.9K      USGS volcano alerts
                          holocene_volcano                 1.2K      Holocene volcano catalog
Near-Earth Objects        neo_close_approach               25K       NASA NEO close approaches
                          neo_fireball                     541       CNEOS fireball database (1998-2026)
Gravitational Waves       gw_merger                        368       LIGO/Virgo/KAGRA confirmed mergers
                          gw_observing_run                 2.1K      Detector observing run metadata
Optical Transients        ztf_sn_candidate                 1.4K      ZTF supernova candidates
                          ztf_kilonova_candidate           395       ZTF kilonova candidates
Climate Indices           climate_oni                      1.8K      Oceanic Nino Index (1949-2025)
                          climate_mei_v2                   1.1K      Multivariate ENSO Index (1979-2025)
                          drought_area_pct                 13K       USDM drought area % by state
Atmospheric Composition   co2_daily                        16K       Daily CO2 concentration
                          ch4_monthly                      509       Monthly methane concentration
Satellite                 sar_scene                        7.6K      Sentinel-1 SAR scene catalog
Socioeconomic             wb_SP.POP.TOTL                   3.3K      World Bank population
                          wb_AG.LND.FRST.ZS                2.8K      World Bank forest area %
                          wb_EG.USE.PCAP.KG.OE             2.8K      World Bank energy use per capita
Geomagnetic (NEW)         dst_index                        400+      Dst ring current (1963-present, hourly, CDAWeb HAPI)
                          emsc_magnitude                   ~340/day  EMSC seismicity (M0.8+ Europe/Middle East)
                          gfz_magnitude                    ~5/hr     GFZ GEOFON (M2+ global, focal mechanisms)
                          isc_magnitude                    ~27/6hr   ISC aggregated (496 agencies worldwide)
                          mag_field_*                      varies    INTERMAGNET observatories (X/Y/Z/F nT, hourly)
                          magnetic_pole_north              435       North magnetic pole lat/lon (1590-2025)
                          magnetic_pole_south              435       South magnetic pole lat/lon (1590-2025)
Ionosphere (NEW)          wspr_snr_* (18 bands)            ~18/hr    WSPR propagation (mean SNR per band per hour)
                          ham_band_* (4 bands)             ~4/3hr    HamQSL band conditions (Good/Fair/Poor/Closed)
                          ham_kindex, ham_solarflux, etc.  ~7/3hr    HamQSL solar/geomagnetic indices
UAP                       ufo_sighting                     80K       NUFORC sighting reports (1906-2014)

Data Indices (NEW)

Two pre-built indices enable cross-domain queries without touching PostgreSQL:

Timeline Index (data/timeline.parquet): 8,128 rows (one per day, 2004-2026), 23 columns. Query any date range to get earthquake counts, Kp, solar wind, X-ray class, WSPR SNR, NWS alerts, fireballs, radiation, sunspots, volcanoes — all in one row. Refreshed hourly.

GeoIndex (data/geoindex.parquet): 9,584 cells at 1° resolution. Query any lat/lon bounding box to find what data exists there and which metrics are available. Useful for spatial context of events.

WSPR Denoised Data (NEW)

For any WSPR study, use the denoised datasets on /mnt/ursa:

  • corridors_denoised.parquet — 3.3M rows, 6 geographic corridors, hourly, noise-corrected
  • daily_denoised.parquet — 109K rows, 7 HF bands, 21 years, noise-corrected
  • noise_baseline.parquet — median noise per band per hour-of-day

Key columns added by denoising: adjusted_snr (noise-corrected), signal_level (receiver-independent), noise_anomaly (deviation from baseline).


2. Statistical Methods

Every method below includes production-ready Python code tested on TerraPulse data. Copy these directly into analyze.py scripts.

2.1 Correlation (Pearson + Spearman)

Always report BOTH Pearson (linear) and Spearman (monotonic rank) correlations. Disagreement between them indicates nonlinearity.

from scipy import stats as sp_stats

def correlate(x: np.ndarray, y: np.ndarray, label: str = "") -> dict:
    """Compute Pearson and Spearman correlations with sample size.

    Always report both. Disagreement flags nonlinearity.
    """
    # Drop pairs where either is NaN
    mask = ~(np.isnan(x) | np.isnan(y))
    x_clean, y_clean = x[mask], y[mask]
    n = len(x_clean)

    if n < 10:
        print(f"  {label}: N={n} — TOO FEW for correlation")
        return {"n": n, "pearson_r": None, "spearman_rho": None}

    r_pearson, p_pearson = sp_stats.pearsonr(x_clean, y_clean)
    rho_spearman, p_spearman = sp_stats.spearmanr(x_clean, y_clean)

    print(f"  {label}: N={n}")
    print(f"    Pearson  r={r_pearson:.4f}, p={p_pearson:.6f}, R²={r_pearson**2:.4f}")
    print(f"    Spearman ρ={rho_spearman:.4f}, p={p_spearman:.6f}")

    return {
        "n": n,
        "pearson_r": round(float(r_pearson), 4),
        "pearson_p": round(float(p_pearson), 6),
        "pearson_r_squared": round(float(r_pearson**2), 4),
        "spearman_rho": round(float(rho_spearman), 4),
        "spearman_p": round(float(p_spearman), 6),
        "label": label,
    }

2.2 Schuster Test (Phase Analysis)

The classical test for periodic triggering (e.g., tidal forcing of earthquakes). Projects event phases onto a unit circle — if events cluster at a preferred phase, the resultant vector is long.

def schuster_test(phases: np.ndarray, label: str = "") -> dict:
    """Schuster test for phase clustering.

    phases: array of phase values in [0, 1) where 0=start, 1=full cycle.
    Under the null (uniform distribution), D²/N ~ exponential(1).

    Returns:
        D²: squared resultant length
        p: exp(-D²/N) — probability of this alignment by chance
        R: mean resultant length (0=uniform, 1=perfect alignment)
        mean_phase: the preferred phase (if significant)
    """
    N = len(phases)
    cos_sum = np.sum(np.cos(2 * np.pi * phases))
    sin_sum = np.sum(np.sin(2 * np.pi * phases))
    D2 = cos_sum**2 + sin_sum**2
    p = np.exp(-D2 / N)
    R = np.sqrt(D2) / N
    mean_phase = np.arctan2(sin_sum, cos_sum) / (2 * np.pi) % 1.0

    print(f"  Schuster test ({label}): N={N:,}")
    print(f"    D² = {D2:.1f}")
    print(f"    p = exp(-D²/N) = {p:.6e}")
    print(f"    R = {R:.6f} (mean resultant length)")
    print(f"    Mean phase = {mean_phase:.4f}")
    print(f"    {'SIGNIFICANT' if p < 0.05 else 'Not significant'}")

    return {
        "n": N,
        "D_squared": round(float(D2), 2),
        "p_value": float(p),
        "R": round(float(R), 6),
        "mean_phase": round(float(mean_phase), 4),
        "significant": p < 0.05,
        "label": label,
    }
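
The function expects phases already folded into [0, 1). A small helper for converting event times to phases of a fixed-period cycle (the epoch and period are whichever cycle you are testing, e.g. the M2 tidal constituent at roughly 12.42 h):

```python
import numpy as np

def to_phase(times_sec: np.ndarray, period_sec: float,
             epoch_sec: float = 0.0) -> np.ndarray:
    """Fold event times (seconds past any fixed reference epoch) into
    phases in [0, 1) of a cycle with the given period."""
    return ((np.asarray(times_sec, dtype=float) - epoch_sec) / period_sec) % 1.0
```

Typical use: `phases = to_phase(event_epoch_seconds, 12.42 * 3600)` followed by `schuster_test(phases)`.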

2.3 Granger Causality (OLS with F-test)

Tests whether lagged values of X improve prediction of Y beyond Y's own history. This is "predictive causality" — not true causation.

def granger_causality_test(x: np.ndarray, y: np.ndarray, max_lag: int = 7) -> dict:
    """Granger causality F-test: does X help predict Y?

    Compares:
      Restricted:   Y_t = a0 + sum(a_i * Y_{t-i}) + e
      Unrestricted: Y_t = a0 + sum(a_i * Y_{t-i}) + sum(b_i * X_{t-i}) + e

    Tests lags 1 through max_lag, returns the best (lowest p-value).
    """
    best = {"lag": 1, "f_stat": 0.0, "p_value": 1.0}

    for lag in range(1, max_lag + 1):
        n = len(y) - lag
        if n < 2 * lag + 10:  # Need sufficient degrees of freedom
            continue

        # Dependent variable
        Y = y[lag:]

        # Restricted model: Y on its own lags + intercept
        Y_lags = np.column_stack([y[lag - i - 1:len(y) - i - 1] for i in range(lag)])
        Y_lags = np.column_stack([np.ones(n), Y_lags])

        try:
            beta_r, _, _, _ = np.linalg.lstsq(Y_lags, Y, rcond=None)
            rss_r = np.sum((Y - Y_lags @ beta_r) ** 2)
        except np.linalg.LinAlgError:
            continue

        # Unrestricted model: Y on Y-lags + X-lags
        X_lags = np.column_stack([x[lag - i - 1:len(x) - i - 1] for i in range(lag)])
        Z = np.column_stack([Y_lags, X_lags])

        try:
            beta_u, _, _, _ = np.linalg.lstsq(Z, Y, rcond=None)
            rss_u = np.sum((Y - Z @ beta_u) ** 2)
        except np.linalg.LinAlgError:
            continue

        # F-test
        df1 = lag  # additional parameters in unrestricted
        df2 = n - 2 * lag - 1  # residual DoF
        if df2 <= 0 or rss_u <= 0:
            continue

        f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
        p_value = 1 - sp_stats.f.cdf(f_stat, df1, df2)

        if p_value < best["p_value"]:
            best = {"lag": lag, "f_stat": float(f_stat), "p_value": float(p_value)}

    return best

2.4 Superposed Epoch Analysis

Stack multiple events on a common timeline to detect systematic pre- and post-event patterns. The gold standard for "does X respond to Y events?"

def superposed_epoch(event_times: list, series_times: np.ndarray,
                     series_values: np.ndarray, window_days: int = 7,
                     label: str = "") -> dict:
    """Superposed epoch analysis with statistical tests.

    event_times: list of datetime objects (the "key dates")
    series_times: array of datetime objects for the response variable
    series_values: array of float values for the response variable
    window_days: half-window size (±window_days around each event)

    Stacks all events and compares each epoch day to the pre-event baseline.
    """
    from datetime import timedelta

    # Convert to days relative to each event
    epoch_matrix = []

    for event_t in event_times:
        offsets = []
        vals = []
        for t, v in zip(series_times, series_values):
            delta = (t - event_t).total_seconds() / 86400.0
            if -window_days <= delta <= window_days:
                offsets.append(round(delta))
                vals.append(v)

        if len(offsets) > 0:
            day_dict = {}
            for d, v in zip(offsets, vals):
                day_dict.setdefault(d, []).append(v)
            # Average within each day for this event
            epoch = {d: np.mean(vs) for d, vs in day_dict.items()}
            epoch_matrix.append(epoch)

    if len(epoch_matrix) < 3:
        print(f"  SEA ({label}): Only {len(epoch_matrix)} events stacked — too few")
        return {"n_events": len(epoch_matrix), "significant": False}

    # Build aligned matrix: columns = epoch days, rows = events
    days = list(range(-window_days, window_days + 1))
    matrix = np.full((len(epoch_matrix), len(days)), np.nan)
    for i, epoch in enumerate(epoch_matrix):
        for j, d in enumerate(days):
            if d in epoch:
                matrix[i, j] = epoch[d]

    # Epoch averages
    epoch_means = np.nanmean(matrix, axis=0)
    epoch_stds = np.nanstd(matrix, axis=0)

    # Baseline: average of pre-event days [-window_days, -1]
    pre_cols = [j for j, d in enumerate(days) if -window_days <= d < 0]
    pre_values = matrix[:, pre_cols].flatten()
    pre_values = pre_values[~np.isnan(pre_values)]
    baseline_mean = np.mean(pre_values)
    baseline_std = np.std(pre_values)

    # Test each post-event day against baseline
    results_by_day = {}
    for j, d in enumerate(days):
        if d < 0:
            continue
        col = matrix[:, j]
        col_clean = col[~np.isnan(col)]
        if len(col_clean) < 3:
            continue
        t_stat, t_p = sp_stats.ttest_1samp(col_clean, baseline_mean)
        u_stat, u_p = sp_stats.mannwhitneyu(col_clean, pre_values,
                                            alternative='two-sided')
        results_by_day[d] = {
            "mean": round(float(np.mean(col_clean)), 4),
            "n": len(col_clean),
            "t_stat": round(float(t_stat), 3),
            "t_p": round(float(t_p), 6),
            "mann_whitney_p": round(float(u_p), 6),
        }

    print(f"  SEA ({label}): {len(epoch_matrix)} events stacked, ±{window_days} days")
    print(f"    Baseline (pre-event): mean={baseline_mean:.4f}, std={baseline_std:.4f}")
    for d, r in sorted(results_by_day.items()):
        sig = "***" if r["t_p"] < 0.001 else "**" if r["t_p"] < 0.01 else "*" if r["t_p"] < 0.05 else ""
        print(f"    Day {d:+d}: mean={r['mean']:.4f}  t={r['t_stat']:.2f}  p={r['t_p']:.4f} {sig}")

    return {
        "n_events": len(epoch_matrix),
        "baseline_mean": round(float(baseline_mean), 4),
        "baseline_std": round(float(baseline_std), 4),
        "epoch_means": {d: round(float(m), 4) for d, m in zip(days, epoch_means)},
        "post_event_tests": results_by_day,
        "label": label,
    }

2.5 Distribution Fitting (Clauset Power Law + Alternatives)

Following Clauset et al. (2009): estimate x_min via KS minimization, fit power law + exponential + lognormal + Weibull on the same tail, compare with Vuong likelihood ratio test, and validate with bootstrap GOF.

def fit_power_law(values: np.ndarray, bootstrap_n: int = 500,
                  label: str = "") -> dict:
    """Clauset et al. (2009) power law fitting with alternatives.

    1. Estimate x_min by minimizing KS statistic
    2. MLE fit for power law, exponential, lognormal, Weibull
    3. Vuong pairwise comparison
    4. Bootstrap goodness-of-fit (p < 0.1 = reject power law)

    Returns dict with all fit parameters and the best model.
    """
    from scipy.special import erfc

    values = values[values > 0]  # Power law requires positive values

    # --- Step 1: Estimate x_min ---
    unique_vals = np.sort(np.unique(values))
    if len(unique_vals) > 500:
        candidates = unique_vals[np.linspace(0, len(unique_vals) - 2, 200).astype(int)]
    else:
        candidates = unique_vals[:-1]

    best_xmin, best_alpha, best_ks = candidates[0], 2.0, 1.0
    for xm in candidates:
        tail = values[values >= xm]
        n = len(tail)
        if n < 10:
            continue
        alpha = 1 + n / np.sum(np.log(tail / xm))
        if alpha <= 1:
            continue
        # KS statistic
        tail_sorted = np.sort(tail)
        cdf_theory = 1 - (tail_sorted / xm) ** (1 - alpha)
        cdf_empirical = np.arange(1, n + 1) / n
        ks = np.max(np.abs(cdf_empirical - cdf_theory))
        if ks < best_ks:
            best_ks, best_xmin, best_alpha = ks, xm, alpha

    tail = values[values >= best_xmin]
    n_tail = len(tail)

    # --- Step 2: Fit alternatives on the same tail ---
    # Exponential MLE: lambda = 1 / mean(x - x_min)
    shifted = tail - best_xmin
    shifted = shifted[shifted > 0]
    exp_lambda = 1.0 / np.mean(shifted) if len(shifted) > 0 else 1.0
    exp_ks = sp_stats.kstest(shifted, 'expon', args=(0, 1/exp_lambda)).statistic

    # Lognormal MLE
    log_tail = np.log(tail)
    ln_mu, ln_sigma = np.mean(log_tail), np.std(log_tail)
    ln_ks = sp_stats.kstest(tail, 'lognorm', args=(ln_sigma, 0, np.exp(ln_mu))).statistic

    # Weibull MLE
    try:
        wb_params = sp_stats.weibull_min.fit(tail, floc=0)
        wb_k = wb_params[0]
        wb_ks = sp_stats.kstest(tail, 'weibull_min', args=wb_params).statistic
    except Exception:
        wb_k, wb_ks = None, 1.0

    # --- Step 3: Vuong likelihood ratio tests ---
    def vuong(loglik_ratios):
        n = len(loglik_ratios)
        if n < 10:
            return 0.0, 1.0, "inconclusive"
        R = np.sum(loglik_ratios)
        sigma = np.std(loglik_ratios, ddof=1)
        if sigma == 0:
            return R, 1.0, "inconclusive"
        stat = R / (sigma * np.sqrt(n))
        p = float(erfc(abs(stat) / np.sqrt(2)))
        if p > 0.1:
            return float(R), p, "inconclusive"
        return float(R), p, ("power_law" if R > 0 else "alternative")

    # Power law log-likelihoods per point
    pl_ll = (np.log(best_alpha - 1) - np.log(best_xmin)
             - best_alpha * np.log(tail / best_xmin))
    exp_ll = np.log(exp_lambda) - exp_lambda * (tail - best_xmin)
    ln_ll = sp_stats.lognorm.logpdf(tail, ln_sigma, 0, np.exp(ln_mu))

    comparisons = {}
    R_exp, p_exp, fav_exp = vuong(pl_ll - exp_ll)
    comparisons["vs_exponential"] = {"R": round(R_exp, 2), "p": round(p_exp, 4), "favors": fav_exp}
    R_ln, p_ln, fav_ln = vuong(pl_ll - ln_ll)
    comparisons["vs_lognormal"] = {"R": round(R_ln, 2), "p": round(p_ln, 4), "favors": fav_ln}

    # --- Step 4: Bootstrap GOF ---
    observed_ks = best_ks
    frac_tail = n_tail / len(values)
    count_ge = 0
    rng = np.random.default_rng(42)
    for _ in range(bootstrap_n):
        n_synth_tail = rng.binomial(len(values), frac_tail)
        if n_synth_tail < 10:
            continue
        synth_tail = best_xmin * (1 - rng.random(n_synth_tail)) ** (-1 / (best_alpha - 1))
        synth_alpha = 1 + n_synth_tail / np.sum(np.log(synth_tail / best_xmin))
        synth_sorted = np.sort(synth_tail)
        cdf_t = 1 - (synth_sorted / best_xmin) ** (1 - synth_alpha)
        cdf_e = np.arange(1, n_synth_tail + 1) / n_synth_tail
        synth_ks = np.max(np.abs(cdf_e - cdf_t))
        if synth_ks >= observed_ks:
            count_ge += 1
    bootstrap_p = count_ge / bootstrap_n

    # --- Determine best fit ---
    fits = {"power_law": best_ks, "exponential": exp_ks, "lognormal": ln_ks}
    if wb_k is not None:
        fits["weibull"] = wb_ks
    best_fit = min(fits, key=fits.get)
    if bootstrap_p < 0.1:
        best_fit = min({k: v for k, v in fits.items() if k != "power_law"},
                       key=lambda k: fits[k], default=best_fit)

    print(f"  Distribution fit ({label}): N={len(values):,}, tail N={n_tail:,}")
    print(f"    x_min = {best_xmin}")
    print(f"    Power Law: alpha={best_alpha:.3f}, KS={best_ks:.4f}, bootstrap_p={bootstrap_p:.3f}")
    print(f"    Exponential: lambda={exp_lambda:.4f}, KS={exp_ks:.4f}")
    print(f"    Lognormal: mu={ln_mu:.3f}, sigma={ln_sigma:.3f}, KS={ln_ks:.4f}")
    if wb_k is not None:
        print(f"    Weibull: k={wb_k:.3f}, KS={wb_ks:.4f}")
    print(f"    Vuong tests: {comparisons}")
    print(f"    >>> BEST FIT: {best_fit}")

    return {
        "n": len(values),
        "n_tail": n_tail,
        "x_min": float(best_xmin),
        "power_law": {"alpha": round(best_alpha, 3), "ks": round(best_ks, 4),
                      "bootstrap_p": round(bootstrap_p, 3)},
        "exponential": {"lambda": round(exp_lambda, 4), "ks": round(exp_ks, 4)},
        "lognormal": {"mu": round(ln_mu, 3), "sigma": round(ln_sigma, 3),
                      "ks": round(ln_ks, 4)},
        "weibull": {"k": round(wb_k, 3) if wb_k else None,
                    "ks": round(wb_ks, 4) if wb_k else None},
        "comparisons": comparisons,
        "best_fit": best_fit,
        "label": label,
    }

2.6 Multiple Comparisons (Bonferroni Correction)

When testing more than 3 pairs, apply Bonferroni correction to control the family-wise error rate. This is non-negotiable.

def bonferroni_threshold(n_pairs: int, alpha: float = 0.05) -> float:
    """Compute Bonferroni-corrected significance threshold.

    With 72 pairs, the threshold becomes 0.05/72 = 0.000694.
    A result at p=0.01 that "looks significant" is NOT after Bonferroni.
    """
    threshold = alpha / n_pairs
    print(f"  Bonferroni: {n_pairs} tests, alpha={alpha} → threshold={threshold:.6f}")
    return threshold


def apply_bonferroni(results: list[dict], alpha: float = 0.05) -> list[dict]:
    """Mark results as surviving or not surviving Bonferroni correction.

    Each result dict must have a 'p_value' key.
    """
    n = len(results)
    threshold = bonferroni_threshold(n, alpha)

    surviving = 0
    for r in results:
        r["bonferroni_threshold"] = threshold
        r["survives_bonferroni"] = r["p_value"] < threshold
        if r["survives_bonferroni"]:
            surviving += 1

    print(f"  {surviving}/{n} results survive Bonferroni correction")
    return results

2.7 Cross-Correlation with Lags

Compute correlation at multiple time lags to detect delayed coupling.

def cross_correlation(x: np.ndarray, y: np.ndarray,
                      max_lag: int = 24, label: str = "") -> dict:
    """Cross-correlation between x and y at lags -max_lag to +max_lag.

    Positive lag means x leads y (x at time t correlates with y at time t+lag).
    Reports Pearson r and p-value at each lag.
    """
    results = []

    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            x_shifted = x[:-lag]
            y_shifted = y[lag:]
        elif lag < 0:
            x_shifted = x[-lag:]
            y_shifted = y[:lag]
        else:
            x_shifted = x
            y_shifted = y

        # Drop NaNs
        mask = ~(np.isnan(x_shifted) | np.isnan(y_shifted))
        x_c, y_c = x_shifted[mask], y_shifted[mask]
        n = len(x_c)

        if n < 10:
            continue

        r, p = sp_stats.pearsonr(x_c, y_c)
        results.append({
            "lag": lag,
            "r": round(float(r), 4),
            "p": round(float(p), 6),
            "n": n,
        })

    # Find peak
    if results:
        best = max(results, key=lambda r: abs(r["r"]))
        print(f"  Cross-correlation ({label}):")
        print(f"    Peak: lag={best['lag']}, r={best['r']:.4f}, p={best['p']:.6f}, N={best['n']}")

    return {
        "lags": results,
        "peak_lag": best["lag"] if results else None,
        "peak_r": best["r"] if results else None,
        "peak_p": best["p"] if results else None,
        "label": label,
    }

3. Quality Standards

These standards are NON-NEGOTIABLE. They encode hard-won lessons from 30 research workspaces and 10 published papers. Every one of these rules exists because we violated it and had to correct the record.

3.0 Anti-Sycophancy Protocol

When the data says no, SAY NO. Do not soften, hedge, reframe, or find silver linings in null results. Do not agree with dramatic claims from the internet without cross-checking against our data first.

Known cases where enthusiasm outran the numbers:

  • Fireball "158x anomaly" → duplicate ingestion bug. Actual March 2026: 3 fireballs (normal).
  • AMS "mysterious surge" → CNEOS shows 4 in Q1, exactly the 4.3/quarter baseline.
  • X1.5 CME "will produce Kp storm" → CME missed Earth entirely. Most CMEs miss.
  • Solar wind r=0.54 → V2 with more data: r=0.09. The effect was 6x smaller than V1 suggested.

Rules:

  1. Check the data BEFORE agreeing with any claim.
  2. "That's noise" is a valid and useful response.
  3. Correct your own errors immediately — do not wait to be asked.
  4. A null result reported clearly is worth more than a positive result reported vaguely.
  5. If you catch yourself saying "interesting!" about something that might be an artifact, stop and verify.

3.1 Honest Nulls

A null result IS a finding. Report it prominently, not buried in an appendix. Some of our best papers are null results:

  • Solar-terrestrial forcing: "The solar cycle does not modulate geophysical event rates" (205 years of data, zero significant correlations)
  • NEO-gravitational: "Gravitational wave events do not correlate with near-Earth object trajectories" (clean null)
  • Earthquake tidal forcing: Schuster test p=0.28 — earthquakes do NOT cluster by lunar phase (despite 171K events)

If your analysis finds nothing, SAY SO CLEARLY. Do not hedge with "further investigation needed" when the data is conclusive. A null with N=100,000 is more valuable than a positive with N=50.

3.2 V1 to V2 Discipline

Every surprising positive result must be verified with more data.

The canonical cautionary tale: Solar wind speed vs earthquakes showed r=0.54 (p<0.00001) with 74 hours of data (V1). With 1,800 hours of DSCOVR archive data (V2), the correlation dropped to r=0.09 — still statistically significant but explaining less than 1% of variance.

Rule of thumb: r > 0.5 with N < 500 is a flag for verification, not a finding. Do not publish V1 results without attempting V2 replication.

The V1-to-V2 pipeline:

  1. V1: Initial analysis on available data. Flag any strong correlations.
  2. Seek more data: Backfill archives, extend time windows, add sources.
  3. V2: Re-run the identical analysis on the expanded dataset.
  4. Report BOTH V1 and V2. The shrinkage itself is scientifically informative.
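
The rule of thumb above can be written down as a guard; the r > 0.5 and N < 500 cutoffs are the ones stated in this section, not universal constants:

```python
def needs_v2(r: float, n: int, r_flag: float = 0.5, n_flag: int = 500) -> bool:
    """Flag a correlation as 'replicate before publishing'.

    A strong effect estimated from few samples is the classic V1 false
    alarm; cutoffs mirror the rule of thumb in this section.
    """
    return abs(r) > r_flag and n < n_flag
```

Applied to the cautionary tale: `needs_v2(0.54, 74)` flags the V1 solar wind result, while `needs_v2(0.09, 1800)` does not flag the V2.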

3.3 Effect Sizes

Always report effect size alongside p-values. A p-value alone is meaningless — with enough data, any nonzero effect becomes "significant."

Required effect size metrics:

  • r or rho for correlations (with R² = variance explained)
  • Cohen's d for group comparisons: d = (mean1 - mean2) / pooled_std
  • Ratio for rate comparisons: "high-Kp days average 1.76x more earthquakes"

def cohens_d(group1: np.ndarray, group2: np.ndarray) -> float:
    """Cohen's d effect size for two independent samples."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_std == 0:
        return 0.0
    d = (np.mean(group1) - np.mean(group2)) / pooled_std
    return round(float(d), 4)

Interpretation: |d| < 0.2 = negligible, 0.2-0.5 = small, 0.5-0.8 = medium, > 0.8 = large.
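
Those bands map directly to a lookup helper:

```python
def interpret_cohens_d(d: float) -> str:
    """Label |d| using Cohen's conventional effect-size bands."""
    a = abs(d)
    if a < 0.2:
        return "negligible"
    if a < 0.5:
        return "small"
    if a < 0.8:
        return "medium"
    return "large"
```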

3.4 Sample Sizes

State N for every statistical test. No exceptions. Every correlation, every t-test, every Schuster test — the reader must see N alongside the result.

Pattern:

Solar wind vs earthquakes: r=0.09, p=0.0001, N=1,800 hours
Schuster test (all M≥4): p=0.28, N=171,852 earthquakes
Granger F-test: F=4.32, p=0.003, N=181 hours, lag=2h

3.5 Bootstrap Before Power Law

When fitting power laws, ALWAYS run the bootstrap goodness-of-fit test. If bootstrap p < 0.1, reject the power law hypothesis — the data is better described by an alternative distribution.

We learned this from the universal-scaling-laws workspace: earthquake magnitudes looked like a power law until bootstrap rejected it (p=0.03). The true distribution was lognormal.

3.6 Deduplicate Earthquakes

USGS earthquake feeds have overlapping windows that produce duplicate records. Before ANY earthquake analysis, run the deduplication query from Section 1. Failure to deduplicate inflates N and biases clustering tests.

3.7 At Least One Null Hypothesis Per Paper

Every paper must explicitly state and test at least one null hypothesis. "We find no evidence for X" is a valid and valuable conclusion. The null should be the default expectation, and the burden of proof is on the alternative.

3.8 Bonferroni for > 3 Pairs

When testing more than 3 pairwise comparisons, apply Bonferroni correction. In the Granger causality network (72 pairs), only 15 of 37 nominal significances survived Bonferroni — more than half were false positives.

Report both: "37 significant at p<0.05, 15 survive Bonferroni (p<0.00069)."

3.9 Pre-Flight Checklist (Lessons from Mike's Reviews)

These items are caught by the editor on EVERY paper. Do them BEFORE submitting.

  1. Sensitivity analysis. For any threshold or window parameter (e.g., 3σ, 90-day), report results at ±1 level (2.5σ/3.5σ, 60-day/180-day). A one-row table showing "results are robust" prevents a revision round.

  2. Cite prior TerraPulse workspaces. Run ls workspaces/ and check if any prior workspace studied related metrics. If solar-geomagnetic-lag measured Kp delay and you're measuring Kp delay, cite it and compare numbers.

  3. Report Mann-Whitney alongside t-test. For ANY group comparison (SEA post vs pre, high-Kp vs low-Kp), report both parametric (t-test) and non-parametric (Mann-Whitney). If they disagree, the non-parametric result is more reliable and that disagreement is itself a finding (indicates non-normality or outlier influence).

  4. Table notation. Never use d= for anything other than Cohen's d. Use peak= for peak values, Δ= for changes, r= for correlations. Mixed notation in a single column header guarantees an editorial flag.

  5. Explain count discrepancies. If the issue says 66 events but the paper analyzes 32, explain why in the first paragraph. "Of the 66 events, 32 were available in the exported catalog" prevents confusion.

  6. Data availability statement. Include as a bold paragraph after Conclusion, not buried in text: \textbf{Data availability.} All data and scripts at \texttt{workspaces/{slug}/}.

  7. Check the math. Before submission, verify that every number in the paper matches results.json. The editor cross-checks and WILL catch mismatches.
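Item 3 above can be sketched with SciPy; the sample sizes and the 0.8-unit group shift are illustrative stand-ins for a real SEA post-vs-pre comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(10.0, 2.0, size=200)    # illustrative pre-event sample
post = rng.normal(10.8, 2.0, size=200)   # illustrative post-event sample

# Report both parametric and non-parametric tests for every group comparison.
t_stat, t_p = stats.ttest_ind(post, pre)
u_stat, u_p = stats.mannwhitneyu(post, pre, alternative="two-sided")

print(f"t-test: t={t_stat:.2f}, p={t_p:.4g}, N={pre.size + post.size}")
print(f"Mann-Whitney: U={u_stat:.0f}, p={u_p:.4g}, N={pre.size + post.size}")
```

If the two disagree, report the non-parametric result as primary and note the disagreement.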


4. Paper Template

revtex4-2 LaTeX Skeleton

All TerraPulse papers use this exact template. Do not modify the document class, author block, or affiliation.

\documentclass[twocolumn,prl,aps,showpacs]{revtex4-2}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{booktabs}

\begin{document}

\title{Your Title Here}

\author{M.~Isenbek}
\email{michael@terrapulse.info}
\author{E.~Isenbek}
\author{B.~Isenbek}
\affiliation{TerraPulse Climate Intelligence Platform, \url{https://terrapulse.info}}

\date{\today}

\begin{abstract}
One paragraph. State the question, the data, the method, and the key finding.
Include the most important numbers (N, r, p). End with the implication.
\end{abstract}

\maketitle

\section{Introduction}

Why does this question matter? What is the prior literature? What gap does
this paper fill? 2-3 paragraphs.

\section{Data}

Where does the data come from? What is the temporal and spatial coverage?
Include a table summarizing each data source with record counts and date ranges.
Always cite TerraPulse: \url{https://terrapulse.info}.

\begin{table}[h]
\caption{Data sources and coverage.}
\label{tab:data}
\begin{ruledtabular}
\begin{tabular}{lrr}
\textbf{Source} & \textbf{$N$} & \textbf{Span} \\
\colrule
Source 1 & 100,000 & 2021--2026 \\
Source 2 & 50,000 & 2025--2026 \\
\end{tabular}
\end{ruledtabular}
\end{table}

\section{Method}

What statistical tests are you running? Define them mathematically.
Reference the standard citations (e.g., Clauset et al. 2009 for power laws).

\section{Results}

Present the numbers. Tables and figures. State significance after Bonferroni
where applicable. Report effect sizes, not just p-values.

\begin{table}[h]
\caption{Results summary.}
\label{tab:results}
\begin{ruledtabular}
\begin{tabular}{lrrrl}
\textbf{Test} & \textbf{$r$} & \textbf{$p$} & \textbf{$N$} & \textbf{Sig?} \\
\colrule
Test 1 & +0.09 & 0.0001 & 1,800 & Yes \\
Test 2 & -0.02 & 0.81 & 120 & No \\
\end{tabular}
\end{ruledtabular}
\end{table}

\section{Discussion}

What do the results mean? How do they relate to prior work? What are the
limitations? If a null result, say so clearly and explain why it matters.
If V1 → V2 shrinkage occurred, discuss it honestly.

\section{Conclusion}

2-3 sentences. The main finding. The implication. What should be done next.

\begin{thebibliography}{99}
\bibitem{ref1} Author, ``Title,'' \textit{Journal} \textbf{Vol}, Page (Year).
\end{thebibliography}

\end{document}

Compile Command

cd workspaces/<workspace>/paper/
pdflatex paper.tex && pdflatex paper.tex

Run twice to resolve cross-references and table numbering.

Target Specifications

  • Length: 3-4 pages (two-column revtex4-2 format)
  • Tables: 1-2 (data summary + results)
  • Figures: 1-2 (referenced from www/ via \includegraphics)
  • References: 5-15 (key citations, not exhaustive)

5. Workspace Conventions

Directory Structure

Every research workspace follows this layout:

workspaces/<workspace-name>/
├── index.md              # Research narrative (the "lab notebook")
├── workspace.json        # Machine-readable metadata
├── scripts/
│   ├── extract.py        # Data extraction from PostgreSQL
│   ├── analyze.py        # Statistical analysis
│   └── visualize.py      # (optional) Separate visualization script
├── data/
│   ├── *.parquet         # Extracted datasets (Polars/Parquet)
│   └── *.json            # Intermediate results
├── www/
│   ├── *.html            # Interactive Plotly visualizations
│   └── *.png             # Static images for papers
├── paper/
│   └── paper.tex         # LaTeX paper (if publishing)
└── tmp/                  # Scratch files, debugging output

Pipeline Flow

The canonical research pipeline is linear:

extract.py → data/*.parquet
    → analyze.py → data/results.json
        → visualize.py → www/*.html + www/*.png
            → index.md (narrative writeup)
                → paper/paper.tex (if publishing)
  1. extract.py — Queries PostgreSQL, saves Parquet files to data/. Uses psycopg2 + polars. No analysis logic here.

  2. analyze.py — Loads Parquet files, runs statistical tests, writes data/results.json. All numbers in the paper come from this JSON.

  3. Visualizations — Generate both interactive HTML (for the web) and static PNG (for the paper). Usually done at the end of analyze.py or in a separate visualize.py.

  4. index.md — The human-readable research narrative. Written after analysis is complete.

  5. paper/paper.tex — The formal paper. Only created if the findings warrant publication.
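Since every number in the paper must trace to data/results.json, analyze.py can end with a single JSON dump. The keys below are illustrative; the point is that each statistic is stored alongside its N:

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # workspace-relative, per the layout above
DATA_DIR.mkdir(exist_ok=True)

results = {
    # illustrative keys — every statistic carries its N
    "solar_wind_vs_quakes": {"r": 0.09, "p": 0.0001, "n": 1800},
    "bonferroni_threshold": 0.05 / 72,
}

with open(DATA_DIR / "results.json", "w") as f:
    json.dump(results, f, indent=2)
```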

index.md Format

# Workspace Title

> **Author:** Claude (TerraPulse Lab)
> **Status:** Draft | Complete | Published
> **Created:** YYYY-MM-DD
> **GitHub Issue:** #NN

## Hypothesis

What are we investigating? State it as a testable question.

## Data Sources

Table of metrics used, record counts, date ranges.

## Methodology

Statistical methods, preprocessing steps, parameters.

## Findings

Results with numbers. Tables. References to visualizations in www/.

## References

Related work, data source documentation.

Plotly Visualization Pattern

Always generate both interactive HTML and static PNG:

from pathlib import Path

import plotly.graph_objects as go

WWW_DIR = Path("www")  # workspace-relative output directory
WWW_DIR.mkdir(exist_ok=True)

fig = go.Figure()

# ... build your figure ...

fig.update_layout(
    title="Descriptive Title (N=1,234)",
    xaxis_title="X Label (units)",
    yaxis_title="Y Label (units)",
    template="plotly_white",
)

# Interactive version for the web (loads plotly.js from the CDN)
fig.write_html(
    str(WWW_DIR / "figure-name.html"),
    include_plotlyjs="cdn",
)

# Static version for papers (high DPI; requires the kaleido package)
fig.write_image(
    str(WWW_DIR / "figure-name.png"),
    scale=2,
)

print(f"  Wrote {WWW_DIR / 'figure-name.html'}")
print(f"  Wrote {WWW_DIR / 'figure-name.png'}")

workspace.json Format

{
  "name": "workspace-name",
  "title": "Human-Readable Title",
  "status": "complete",
  "created": "2026-03-28",
  "issue": 43,
  "metrics": ["earthquake_magnitude", "space_kp_index"],
  "methods": ["correlation", "granger_causality"],
  "findings_summary": "One-sentence summary of the key result."
}
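A small validator can catch a malformed workspace.json before it ships; REQUIRED mirrors the keys in the example above, and the function itself is an illustrative sketch, not a platform tool:

```python
import json

# Required keys, mirroring the workspace.json example above.
REQUIRED = {"name", "title", "status", "created", "issue",
            "metrics", "methods", "findings_summary"}

def missing_workspace_keys(text: str) -> list:
    """Return required keys absent from a workspace.json payload."""
    return sorted(REQUIRED - json.loads(text).keys())
```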