Research arcs and idle projects

A companion to weekly-cadence.md. The cadence covers the must-dos. This doc covers what to dip into during the rest of the day: slow-burn, multi-day projects that advance the long-term TerraPulse ambition without draining tokens on any single session.


Part 1 — How paper topics actually get chosen

The short version: Brad files the issues, I execute against them. But that's not the whole picture, because which issues Brad files isn't random. The V-series papers follow a structure worth making explicit, so that you and Brad can debate the next move with a shared vocabulary.

The V-series arc

Every line of research at TerraPulse follows the same falsification ladder:

  1. V1 — Observe. A signal shows up in the data. The paper documents it: what we saw, when, how strong. The verdict is often INCONCLUSIVE in some narrow sense, because a V1 is allowed to over-claim relative to its evidence; tightening the claim is what V2+ is for.

  2. V2 — Replicate. Re-run on independent data (different time window, different geography, different stations). If the effect size shrinks dramatically, the V1 was probably overfit. The TerraPulse rule is: any V1 with |d|>0.5 at N<200 gets a mandatory V2 with bigger N.

  3. V3 — Control. Introduce a known confounder and check if the signal survives after controlling for it. The Lifted Index control (#181 → #183) was a V3 against the WSPR precursor — H1 (survives control) vs H0 (LI was the actual cause).

  4. V4 — Mechanism. What physically produces the signal? The WSPR V4 paper tested D-layer disturbance vs cell-local QRN as competing mechanisms.

  5. V5 — Generalize. Does it hold for adjacent classes? The V5 paper checked if the WSPR precursor was tornado-specific or just any-severe-weather. Answer: any-severe.

Each step is a test of the most threatening alternative explanation to whatever the previous step claimed. Brad's instinct picks the threat to test next. The skill — and frankly the moat — is in knowing which threat is most credible at each step.
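
The mandatory-V2 rule in step 2 is mechanical enough to sketch in code. This is a minimal illustration, not the actual TerraPulse implementation: the function name `needs_mandatory_v2` and its parameter names are hypothetical, and only the two thresholds (|d| > 0.5, N < 200) come from the rule above.

```python
def needs_mandatory_v2(effect_size_d: float, n: int,
                       d_threshold: float = 0.5,
                       n_threshold: int = 200) -> bool:
    """Return True when a V1 result is a large effect on a small sample.

    Thresholds default to the rule stated in the ladder: |d| > 0.5 at N < 200.
    Both comparisons are strict, so d == 0.5 or N == 200 does not trigger.
    """
    return abs(effect_size_d) > d_threshold and n < n_threshold
```

The sign of d is ignored on purpose: a strongly negative effect on a small sample is just as likely to shrink on replication as a strongly positive one.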

What I bring vs what Brad brings

  • Brad brings: the science instinct, the choice of which V-step is most worth running, the framing of hypothesis vs null, the pre-registered claim threshold, the historical context for what's already been shown in related fields.
  • I bring: mechanical extraction, statistical execution to spec, honest verdict against the threshold, paper-writing to revtex template, never upgrading INCONCLUSIVE to H1 just because a result "looks interesting."
  • Mike brings: editorial review, public-voice polish, audit trail.

Candidate-threats first pass (middle ground)

To save cycles without removing the human judgment step, Claude produces a candidate-threats list within ~24 hours of any V-paper shipping. Either Brad or Mike reads it, picks the threat (or rejects all of them in favor of their own), and a new issue gets filed for the next V-step.

The list is a first draft, not a recommendation. Claude's role is to enumerate plausible alternative explanations and assess operational feasibility. The judgment of which threat is most credible to test stays with a human — that's the part that doesn't automate and shouldn't.

Who picks (as of 2026-05-13). Either Brad or Mike. Brad's instinct is more developed in the domain (atmospheric physics, ionospheric coupling); Mike is building calibration. Both are valid threat-selectors; the question is how much guardrail support each needs from Claude during the transition.

Format. Posted as a comment on the just-shipped paper's issue. Sections:

  1. What V1 claimed. One sentence restating the finding so the threats are anchored to it.
  2. Credible threats (~3-5). Each gets: name, one-paragraph mechanism for how it could fully or partially explain the V1 effect, what data would test it, whether we have that data today, rough effort estimate. During the transition period each credible threat also gets a "Brad's likely take" annotation — Claude's read of how Brad would assess the threat and why, so Mike can pattern-match against the heuristic explicitly. After ~10 cycles where Mike's picks have stabilized, the annotation drops off.
  3. Less-credible threats (~3-5). Brief — one line each. Listed for completeness so the picker can confirm we're not blind to them.
  4. Weak threats (~3-5). Even briefer. Listed so the picker can see what Claude already discounted and challenge if they disagree.
  5. Claude's gut pick. One sentence: which credible threat Claude would test first if forced. The picker ignores this freely; it just makes Claude's reasoning legible.
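
The five sections above map naturally onto a small data structure. The sketch below is one possible shape, assuming the comment is rendered as plain text; the class names, field names, and the `render` method are all hypothetical and not part of any existing TerraPulse tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CredibleThreat:
    name: str
    mechanism: str               # one paragraph: how it could explain the V1 effect
    test_data: str               # what data would test it
    data_available: bool         # do we have that data today?
    effort: str                  # rough effort estimate
    brads_likely_take: str = ""  # transition-period annotation; empty once dropped


@dataclass
class CandidateThreats:
    v1_claim: str                # one-sentence restatement of the finding
    credible: List[CredibleThreat]
    less_credible: List[str]     # one line each
    weak: List[str]              # even briefer
    gut_pick: str                # name of one of the credible threats

    def render(self) -> str:
        """Render the five sections as a plain-text issue comment."""
        if self.gut_pick not in [t.name for t in self.credible]:
            raise ValueError("gut pick must be one of the credible threats")
        lines = ["1. What V1 claimed", self.v1_claim, "", "2. Credible threats"]
        for t in self.credible:
            have = "yes" if t.data_available else "no"
            lines.append(f"- {t.name}: {t.mechanism}")
            lines.append(f"  Test data: {t.test_data} "
                         f"(available today: {have}); effort: {t.effort}")
            if t.brads_likely_take:
                lines.append(f"  Brad's likely take: {t.brads_likely_take}")
        lines += ["", "3. Less-credible threats"]
        lines += [f"- {x}" for x in self.less_credible]
        lines += ["", "4. Weak threats"] + [f"- {x}" for x in self.weak]
        lines += ["", "5. Claude's gut pick", self.gut_pick]
        return "\n".join(lines)
```

Leaving `brads_likely_take` empty is how the annotation "drops off" in this sketch: the line is simply omitted from the rendered comment.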

What the picker does with it. Rank, edit, dismiss, or replace. The output is one filed issue: the next V-paper, with hypothesis + method + threshold per the existing PMA issue template.

Escape hatches (Mike-specific, during transition).

  • "Let me check with Brad" is always an available move on a specific threat selection. Costs ≤ 1 day on the arc. Use it whenever a pick feels genuinely 50/50.
  • High-stakes arcs default to Brad. Anything with clear monetization implications, anything Brad has flagged as a personal-priority arc, or anything where the V-result would land in a public-facing paper with media-shareable headline gets Brad's pick by default. Mike picks for routine V-cycles (V2 replication of an existing arc, V5 generalization tests, etc.).
  • Dramatic results trigger an explicit hold. If a paper that comes out of Mike's threat selection produces a large effect size with small N (the V1-shrinkage flag regime), or anything else that "looks too good," Brad reads the paper before it ships to the issue tracker. This is in addition to the regular Dana copyedit step, not a replacement for it.
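
The second and third escape hatches reduce to two small predicates. The sketch below is illustrative only: the function and parameter names are hypothetical, and reusing the |d| > 0.5, N < 200 thresholds from the mandatory-V2 rule as the definition of the "V1-shrinkage flag regime" is an assumption, since this doc does not define that regime numerically.

```python
def default_picker(monetization_implications: bool,
                   brad_priority_arc: bool,
                   public_media_headline: bool) -> str:
    """High-stakes arcs default to Brad; routine V-cycles go to Mike."""
    high_stakes = (monetization_implications
                   or brad_priority_arc
                   or public_media_headline)
    return "Brad" if high_stakes else "Mike"


def needs_brad_hold(picked_by: str, effect_size_d: float, n: int,
                    looks_too_good: bool = False) -> bool:
    """True when Brad should read the paper before it ships.

    ASSUMPTION: the shrinkage regime uses the same thresholds as the
    mandatory-V2 rule (|d| > 0.5 at N < 200).
    """
    shrinkage_regime = abs(effect_size_d) > 0.5 and n < 200
    return picked_by == "Mike" and (shrinkage_regime or looks_too_good)
```

The `looks_too_good` flag stays a human input on purpose; it stands in for the judgment call the bullet describes rather than trying to automate it.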

Six-week review. Every six weeks, Brad and Mike skim the previous threat picks together (10–15 papers' worth) and gut-check: are the chosen threats the ones Brad would have picked, or has the arc drifted away from what he would have prioritized? Drift is the slow failure mode worth catching; the review is the catch.

What this is NOT. It is not a paper proposal. It is not a finding. It is not Claude picking the next research direction. It is a structured first draft to make the picker's threat-selection step faster.

How new V-arcs get started

A new arc starts when one of three things happens:

  1. A new datasource comes online and unlocks a test that couldn't be run before. IGRA radiosondes unlocked CAPE × WSPR (#182) and LI control (#183). When NASA FIRMS gets onboarded, a wildfire-WSPR coupling arc becomes possible.
  2. A live event is captured that's worth a standalone analysis. Enid EF-4 in 2026-04 was the template — paper #33 written same-day from the live GLM capture.
  3. A prior V-series result gets challenged externally. This hasn't happened yet, but it would force a V-next.

How long-term ambitions shape the queue

The site exists to monetize climate intelligence on X. That biases the paper queue toward findings that are:

  • Visual — a figure that reads in a single glance is worth 10 papers of pure tables.
  • Surprising but defensible — "WSPR precursor independent of LI in the highly-unstable regime" is shareable; "we ran a t-test and p=0.04" is not.
  • Repeatable — a finding we can re-run as new data arrives is a renewable content asset.
  • Linked to active weather — papers shipped within hours of a real-world event get far more reach than retrospectives.

This is why the V-series gets prioritized over one-off curiosity papers: each entry in a V-arc compounds the credibility of the prior entries.


Part 2 — Multi-day idle projects

These are projects to dip into when the day's cadence block is done and there's spare capacity. Rules:

  • Touch one project per idle slot. Don't context-switch.
  • Each session produces a visible artifact: a commit, a filed issue, a checked-off subtask.
  • Time-box to 30 min per slot. If it bleeds beyond that, leave a note and pick up next slot.
  • All work happens in main; no long-lived branches.
  • If a project completes, the entry here gets a strike-through and a link to the closing commit/issue.

Roster

1. Knowledge graph completeness

Scope: ~0.09% of last-7d rows are still untagged. The bulk is per-station metrics (mag_field_<STATION>, future per-station sounding metrics, etc.) that aren't matched by current auto_tag_rules. Walk the tag tree, identify gaps, propose new rules.

Per-slot deliverable: add or refine one regex rule, verify next audit shows coverage tick up.

Done when: tag coverage is ≥ 99.99% for four consecutive Monday audits.

Dependencies: none.
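
As a rough illustration of the per-slot deliverable and the done-condition, the sketch below shows one regex rule for the `mag_field_<STATION>` family plus a coverage check. Everything here is hypothetical: the real auto_tag_rules format is not shown in this doc, the station suffix is assumed to be uppercase letters and digits, and each list item is treated as one row.

```python
import re
from typing import Iterable, List, Pattern

# Hypothetical rule; assumes station codes are uppercase letters/digits.
MAG_FIELD_RULE = re.compile(r"^mag_field_[A-Z0-9]+$")


def coverage_pct(row_metrics: Iterable[str], rules: List[Pattern[str]]) -> float:
    """Percentage of rows whose metric name matches at least one rule."""
    names = list(row_metrics)
    if not names:
        return 100.0  # nothing to tag counts as fully covered
    tagged = sum(1 for name in names if any(r.match(name) for r in rules))
    return 100.0 * tagged / len(names)


def coverage_goal_met(monday_audits: List[float],
                      target: float = 99.99, streak: int = 4) -> bool:
    """True when the most recent `streak` Monday audits all reach `target`."""
    recent = monday_audits[-streak:]
    return len(recent) == streak and all(c >= target for c in recent)
```

Anchoring the pattern with `^` and `$` keeps the rule from accidentally tagging unrelated metrics that merely contain the prefix.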

2. Datasource documentation pass

Scope: Many datasources.notes are null or auto-generated stubs. Each curated source deserves one human-readable paragraph: what it is, who runs it, what the data answers, known quirks, refresh cadence, license/attribution.

Per-slot deliverable: populate notes for 3-5 sources, commit.

Done when: every curated source (non-AutoSense) has a real notes field.

Dependencies: none.

3. Backfill gap matrix

Scope: Each curated source has a theoretical history (USGS earthquake → 1900; SPC tornado → 1950; IGRA → 1950s) and an actual coverage in TerraPulse. Build a markdown matrix: source × historical-start × current-start × gap-years × backfill-feasibility.

Per-slot deliverable: add one row to the matrix per session.

Done when: all curated sources are documented; Brad can pick the next backfill target from a single table.

Dependencies: none.
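
One row of the matrix can be sketched as a small record that derives gap-years from the two start years. The class and field names below are hypothetical, and the markdown layout is just one way to match the column list in the scope.

```python
from dataclasses import dataclass

HEADER = ("| source | historical-start | current-start | gap-years "
          "| backfill-feasibility |\n| --- | --- | --- | --- | --- |")


@dataclass
class BackfillRow:
    source: str
    historical_start: int   # first year the upstream archive covers
    current_start: int      # first year actually loaded in TerraPulse
    feasibility: str        # free-text judgment, e.g. "easy", "blocked"

    @property
    def gap_years(self) -> int:
        """Years of upstream history not yet loaded (never negative)."""
        return max(0, self.current_start - self.historical_start)

    def to_markdown(self) -> str:
        return (f"| {self.source} | {self.historical_start} | "
                f"{self.current_start} | {self.gap_years} | {self.feasibility} |")


# Usage. current_start=2020 and "unknown" are PLACEHOLDERS for illustration,
# not TerraPulse's real coverage; only the 1900 start comes from the scope.
row = BackfillRow("USGS earthquake", 1900, 2020, "unknown")
print(HEADER)
print(row.to_markdown())
```

Deriving gap-years instead of typing it by hand keeps the column from drifting when a backfill moves `current_start` earlier.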

4. Cross-source correlation explorer

Scope: Proactively compute pairwise correlations across metric pairs to surface unexpected couplings as "research candidates" Brad can turn into proper paper issues. NOT papers themselves — just signal scouting.

Per-slot deliverable: one correlation matrix slice (e.g., all seismic × all space-weather), one one-paragraph writeup of the strongest non-trivial pair, committed to docs/research-candidates/.

Done when: open-ended; this project produces a renewable stream of candidate ideas.

Dependencies: the new (source_id, timestamp_utc) index — already in place.

Risk: must not present candidates as findings. Frame as "this pair correlates at r=X over period Y; needs a real paper to determine if it's spurious."
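
A minimal, dependency-free sketch of the scouting step follows. It assumes the series are already aligned on the same timestamps and uses plain Pearson correlation; the function names are hypothetical, and deciding whether the strongest pair is "non-trivial" remains a human call that this code does not attempt.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (NaN if one is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def strongest_pair(series: Dict[str, List[float]]) -> Tuple[str, str, float]:
    """Return the metric pair with the largest |r|, skipping undefined pairs."""
    best = None
    for a, b in combinations(sorted(series), 2):
        r = pearson_r(series[a], series[b])
        if math.isnan(r):
            continue
        if best is None or abs(r) > abs(best[2]):
            best = (a, b, r)
    if best is None:
        raise ValueError("no pair with a defined correlation")
    return best


def candidate_note(a: str, b: str, r: float, period: str) -> str:
    """Word the result as a candidate, never as a finding."""
    return (f"{a} x {b} correlates at r={r:.2f} over {period}; "
            "needs a real paper to determine if it's spurious.")
```

Baking the hedged wording into `candidate_note` is one way to enforce the risk rule above at the point where the writeup is generated.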

5. API surface polish

Scope: Every /api/v1/ endpoint should have a docstring, an example response, and a responses schema for OpenAPI. Critical for eventual monetization — the API is the product.

Per-slot deliverable: polish 2-3 endpoints.

Done when: the /docs page renders cleanly for an external developer with no internal context.

Dependencies: none.

6. Loader resilience survey

Scope: Audit each curated fetcher for retry/timeout/circuit-breaker patterns. Some inherit from BaseFetcher and get the standard treatment; some are older one-offs. Build a compliance matrix; migrate non-compliant fetchers to the standard.

Per-slot deliverable: audit 5 fetchers, migrate 1.

Done when: every curated fetcher uses the BaseFetcher retry/timeout pattern AND is wired through the LoaderRunner framework.

Dependencies: none.
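
For reference while auditing, the sketch below shows a generic retry-with-backoff wrapper combined with a very simple circuit breaker. It is not the BaseFetcher implementation, which this doc does not show; the class name, defaults, and the manual `reset` are all assumptions, and the timeout is expected to be enforced inside the `fetch` callable itself (for example by the HTTP client).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker has tripped and fetches are being skipped."""


class ResilientCaller:
    def __init__(self, max_attempts: int = 3, base_delay_s: float = 1.0,
                 failure_limit: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_attempts = max_attempts
        self.base_delay_s = base_delay_s
        self.failure_limit = failure_limit
        self.consecutive_failures = 0
        self._sleep = sleep  # injectable so tests do not actually wait

    def reset(self) -> None:
        """Manually close the breaker (this sketch has no half-open state)."""
        self.consecutive_failures = 0

    def call(self, fetch: Callable[[], T]) -> T:
        if self.consecutive_failures >= self.failure_limit:
            raise CircuitOpenError("too many consecutive failures")
        for attempt in range(self.max_attempts):
            try:
                result = fetch()
            except Exception:
                self.consecutive_failures += 1
                out_of_attempts = attempt == self.max_attempts - 1
                tripped = self.consecutive_failures >= self.failure_limit
                if out_of_attempts or tripped:
                    raise
                # Exponential backoff: base, 2*base, 4*base, ...
                self._sleep(self.base_delay_s * 2 ** attempt)
            else:
                self.consecutive_failures = 0
                return result
        raise ValueError("max_attempts must be at least 1")
```

A fetcher counts as compliant in the matrix only if it has all three behaviors; a one-off that retries but has no timeout or no failure cutoff is still a migration candidate.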

7. Untapped public datasets survey

Scope: What major public climate datasets are NOT yet in TerraPulse? Examples: NOAA HRRR (sub-hourly CAPE — would unblock the #182 mechanism question), ERA5 reanalysis, MODIS LST, GHCN-Daily, NLDAS-2, CMIP6 outputs, GLDAS, etc. Build the inventory + onboarding effort estimate.

Per-slot deliverable: one datasource profiled (URL, format, auth, volume, refresh, relevance to active V-arcs).

Done when: 30 candidate sources are profiled; Brad has a prioritized roadmap.

Dependencies: none.

8. Live-event readiness drills

Scope: The Enid EF-4 paper got written the day of the storm because the GLM listener was live and the analysis pipeline already existed. We should be able to do the same for: hurricane landfall, M6+ earthquake near a population center, X-class solar flare, geomagnetic storm Kp≥8, major volcanic eruption. For each, draft a ready-to-run analysis template so that the live event's data just needs to be dropped in.

Per-slot deliverable: one event-type template scaffolded under workspaces/templates/<event-type>/.

Done when: the five event types above each have a runnable template.

Dependencies: the relevant fetchers must be live and tested.

9. Memory hygiene

Scope: The .claude/projects/.../memory/ directory accumulates entries. Some go stale, some duplicate. Weekly pass to consolidate and remove.

Per-slot deliverable: review 5 memory entries, prune or merge.

Done when: ongoing, never "done." Aim for the index staying under 20 entries.

Dependencies: none.

10. Public-site SEO + brand fundamentals

Scope: terrapulse.info needs the basics for organic discovery and credibility. Open Graph tags, structured-data markup for papers (DataCatalog / ScholarlyArticle schema.org), sitemap.xml, robots.txt, canonical URLs, social preview images.

Per-slot deliverable: one fundamental added or improved.

Done when: site passes Lighthouse audit at ≥90 SEO and a manual eyeball check on Twitter Card preview / LinkedIn / Reddit.

Dependencies: none.


Part 3 — How to pick the next idle slice

When daily cadence is done and there's capacity, pick the next slice using this rule, in order:

  1. Is any project blocking a follow-up issue? If yes, that one. (E.g., if I file an issue under project #1 and a follow-up needs that project to advance, prioritize it.)
  2. Any project where a one-slot push would close it? Closing a project is more valuable than opening a new one.
  3. Any project visibly stale (no commit in 14 days)? If yes, that one. Stale projects feel abandoned.
  4. Otherwise round-robin through the roster by the order above.
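
The four-step rule above can be sketched as a single function. The `Project` fields and the function name are hypothetical, and two details are assumptions the rule does not spell out: ties within a step are broken by roster order, and "no commit in 14 days" is read as 14 or more days since the last commit.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Project:
    roster_number: int
    name: str
    last_commit: date
    blocks_followup_issue: bool = False
    one_slot_from_done: bool = False


def pick_next_slice(roster: List[Project], today: date,
                    last_picked: Optional[int] = None,
                    stale_after_days: int = 14) -> Project:
    """Apply the four rules in order; ties go to the lower roster number."""
    ordered = sorted(roster, key=lambda p: p.roster_number)
    if not ordered:
        raise ValueError("roster is empty")
    for p in ordered:                      # 1. unblocks a follow-up issue
        if p.blocks_followup_issue:
            return p
    for p in ordered:                      # 2. one slot away from closing
        if p.one_slot_from_done:
            return p
    for p in ordered:                      # 3. visibly stale
        if (today - p.last_commit).days >= stale_after_days:
            return p
    if last_picked is not None:            # 4. round-robin from last pick
        for p in ordered:
            if p.roster_number > last_picked:
                return p
    return ordered[0]                      # wrap around to the top
```

Passing `last_picked` in explicitly keeps the function stateless; where that number is persisted between slots is left open here.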

No project should run > 30 minutes per slot. If something needs more, file it as its own dedicated issue with a "needs deep-work session" label and resume during a planned block.


Part 4 — Token & cadence budget

The weekly cadence (Mon-Fri blocks) totals ~2-3 hours of focused work. Multi-day project slices add another ~30 min/day where capacity allows. So a healthy week looks like:

Day | Cadence block                   | Idle project slice (optional)
--- | ------------------------------- | -----------------------------
Mon | Data integrity audit (~30 min)  | 1 slice (~30 min)
Tue | Editorial review (~20 min)      | 1 slice (~30 min)
Wed | Public surface QA (~20 min)     | 1 slice (~30 min)
Thu | Infrastructure & cost (~30 min) | 1 slice (~30 min)
Fri | Brand readout (~30 min)         | none (focus on readout)

That's ~4-5 hours/week. Anything beyond that is paper-work (i.e., actually running a PMA paper from an issue), which is a separate budget item triggered by Brad filing an issue.
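
The weekly total can be sanity-checked by summing the nominal durations from the schedule. This is plain arithmetic on the approximate figures listed there; the variable names are illustrative, and real weeks will vary.

```python
# Nominal minutes per day, taken from the weekly schedule.
CADENCE_MIN = {"Mon": 30, "Tue": 20, "Wed": 20, "Thu": 30, "Fri": 30}
IDLE_MIN = {"Mon": 30, "Tue": 30, "Wed": 30, "Thu": 30, "Fri": 0}

cadence_hours = sum(CADENCE_MIN.values()) / 60   # 130 min, a bit over 2 hours
idle_hours = sum(IDLE_MIN.values()) / 60         # 120 min, exactly 2 hours
total_hours = cadence_hours + idle_hours         # 250 min, a bit over 4 hours

print(f"cadence {cadence_hours:.1f} h + idle {idle_hours:.1f} h "
      f"= {total_hours:.1f} h/week")
```

With every optional slice taken, the nominal total lands at the low end of the ~4-5 hour range, which leaves headroom for blocks that run long.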

If a week's events drain tokens unusually hard (a live event, a paper, a backfill), idle slices get skipped first. Cadence blocks are the floor; idle slices are the ceiling.
