Research arcs and idle projects

A companion to weekly-cadence.md. The cadence covers the must-dos. This doc covers what to dip into during the rest of the day — slow-burn, multi-day projects that pay off the long-term TerraPulse ambition without draining tokens on any single session.

Part 1 — How paper topics actually get chosen

The short version: Brad files the issues, I execute against them. But that's not the whole picture, because what Brad chooses to file isn't random. The V-series papers follow a structure that's worth making explicit so you and Brad can debate the next move with a shared vocabulary.

The V-series arc

Every line of research at TerraPulse follows the same falsification ladder:

V1 — Observe. A signal shows up in the data. The paper documents it: what we saw, when, how strong. Often INCONCLUSIVE in some narrow sense because the V1 is allowed to over-claim — that's what V2+ is for.
V2 — Replicate. Re-run on independent data (different time window, different geography, different stations). If the effect size shrinks dramatically, the V1 was probably overfit. The TerraPulse rule is: any V1 with |d|>0.5 at N<200 gets a mandatory V2 with bigger N.
V3 — Control. Introduce a known confounder and check if the signal survives after controlling for it. The Lifted Index control (#181 → #183) was a V3 against the WSPR precursor — H1 (survives control) vs H0 (LI was the actual cause).
V4 — Mechanism. What physically produces the signal? The WSPR V4 paper tested D-layer disturbance vs cell-local QRN as competing mechanisms.
V5 — Generalize. Does it hold for adjacent classes? The V5 paper checked if the WSPR precursor was tornado-specific or just any-severe-weather. Answer: any-severe.
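The V2 trigger in the ladder above is mechanical enough to sketch in code. This is a minimal illustration, not the project's actual implementation: the function names are hypothetical, N is assumed to mean the total sample size, and Cohen's d is assumed to use the pooled standard deviation of two independent samples.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def needs_mandatory_v2(d, n, d_threshold=0.5, n_threshold=200):
    """Rule from the ladder: |d| > 0.5 at N < 200 forces a V2 with bigger N.

    Assumption: n is the total number of observations behind the V1 effect size.
    """
    return abs(d) > d_threshold and n < n_threshold
```

For instance, `needs_mandatory_v2(0.8, 120)` is true, while the same effect size at N=500 does not trigger the rule.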

Each step is a test of the most threatening alternative explanation to whatever the previous step claimed. Brad's instinct picks the threat to test next. The skill — and frankly the moat — is in knowing which threat is most credible at each step.

What I bring vs what Brad brings

Brad brings: the science instinct, the choice of which V-step is most worth running, the framing of hypothesis vs null, the pre-registered claim threshold, the historical context for what's already been shown in related fields.
I bring: mechanical extraction, statistical execution to spec, honest verdict against the threshold, paper-writing to revtex template, never upgrading INCONCLUSIVE to H1 just because a result "looks interesting."
Mike brings: editorial review, public-voice polish, audit trail.

Candidate-threats first pass (middle ground)

To save Brad cycles without removing his role, Claude produces a candidate-threats list within ~24 hours of any V-paper shipping. Brad reads it, picks the threat (or rejects all of them in favor of his own), Mike weighs in on what would actually publish well, then a new issue gets filed for the next V-step.

The list is a first draft, not a recommendation. Claude's role is to enumerate plausible alternative explanations and assess operational feasibility. The judgment of which threat is most credible to test stays with Brad — that's the part that doesn't automate and shouldn't.

Format. Posted as a comment on the just-shipped paper's issue. Sections:

What V1 claimed. One sentence restating the finding so the threats are anchored to it.
Credible threats (~3-5). Each gets: name, one-paragraph mechanism for how it could fully or partially explain the V1 effect, what data would test it, whether we have that data today, rough effort estimate.
Less-credible threats (~3-5). Brief — one line each. Listed for completeness so Brad can confirm we're not blind to them.
Weak threats (~3-5). Even briefer. Listed so Brad can see what Claude already discounted and challenge if he disagrees.
Claude's gut pick. One sentence: which credible threat Claude would test first if forced. Brad ignores this freely; it just makes Claude's reasoning legible.
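The sections above could be held in a small data structure and rendered into the issue comment. This is only a sketch of the format: the class and field names are hypothetical, and the rendered layout is one plausible reading of the section list rather than an agreed template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CredibleThreat:
    name: str
    mechanism: str         # one paragraph: how it could explain the V1 effect
    test_data: str         # what data would test it
    have_data_today: bool  # whether that data is already available
    effort: str            # rough effort estimate, free text


@dataclass
class CandidateThreats:
    v1_claim: str
    credible: List[CredibleThreat]
    less_credible: List[str]  # one line each
    weak: List[str]           # even briefer
    gut_pick: str

    def render(self) -> str:
        """Render the list as plain text in the section order given above."""
        out = [f"What V1 claimed. {self.v1_claim}", "", "Credible threats"]
        for t in self.credible:
            have = "yes" if t.have_data_today else "no"
            out.append(f"{t.name}. {t.mechanism}")
            out.append(f"  Test data: {t.test_data} (have it today: {have}). Effort: {t.effort}.")
        out += ["", "Less-credible threats", *self.less_credible]
        out += ["", "Weak threats", *self.weak]
        out += ["", f"Claude's gut pick. {self.gut_pick}"]
        return "\n".join(out)
```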

What Brad does with it. Rank, edit, dismiss, or replace. The output of Brad's pass is one filed issue: the next V-paper, with hypothesis + method + threshold per the existing PMA issue template.

What this is NOT. It is not a paper proposal. It is not a finding. It is not Claude picking the next research direction. It is a structured first draft to make Brad's threat-selection step faster.

How new V-arcs get started

A new arc starts when one of three things happens:

A new datasource comes online and unlocks a test that couldn't be run before. IGRA radiosondes unlocked CAPE × WSPR (#182) and LI control (#183). When NASA FIRMS gets onboarded, a wildfire-WSPR coupling arc becomes possible.
A live event is captured that's worth a standalone analysis. Enid EF-4 in 2026-04 was the template — paper #33 written same-day from the live GLM capture.
A prior V-series result gets challenged externally. This hasn't happened yet, but it would force a V-next.

How long-term ambitions shape the queue

The site exists to monetize climate intelligence on X. That biases the paper queue toward findings that are:

Visual — a figure that reads in a single glance is worth 10 papers of pure tables.
Surprising but defensible — "WSPR precursor independent of LI in the highly-unstable regime" is shareable; "we ran a t-test and p=0.04" is not.
Repeatable — a finding we can re-run as new data arrives is a renewable content asset.
Linked to active weather: papers shipped within hours of a real-world event get far more reach than retrospectives.

This is why the V-series gets prioritized over one-off curiosity papers: each entry in a V-arc compounds the credibility of the prior entries.

Part 2 — Multi-day idle projects

These are projects to dip into when the day's cadence block is done and there's spare capacity. Rules:

Touch one project per idle slot. Don't context-switch.
Each session produces a visible artifact: a commit, a filed issue, a checked-off subtask.
Time-box to 30 min per slot. If it bleeds beyond that, leave a note and pick up next slot.
All work happens in main; no long-lived branches.
If a project completes, the entry here gets a strike-through and a link to the closing commit/issue.

Roster

1. Knowledge graph completeness

Scope: ~0.09% of last-7d rows are still untagged. The bulk is per-station metrics (mag_field_<STATION>, future per-station sounding metrics, etc.) that aren't matched by current auto_tag_rules. Walk the tag tree, identify gaps, propose new rules.

Per-slot deliverable: add or refine one regex rule, verify next audit shows coverage tick up.

Done when: tag coverage is ≥ 99.99% for four consecutive Monday audits.

Dependencies: none.
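A per-slot rule for this project might look like the following sketch. The real auto_tag_rules format is not shown in this doc, so the rule shape, the station-code pattern (letters and digits), and the tag path are all assumptions. Coverage is computed here over metric names for brevity, whereas the audit counts rows.

```python
import re

# Hypothetical rule for per-station magnetometer metrics such as mag_field_<STATION>.
# Assumption: station codes consist of letters and digits only.
MAG_FIELD_RULE = re.compile(r"^mag_field_(?P<station>[A-Za-z0-9]+)$")


def tag_for(metric):
    """Return a tag path for a metric name, or None if no rule matches."""
    m = MAG_FIELD_RULE.match(metric)
    if m:
        # The tag path below is illustrative, not the real tag tree.
        return f"geomagnetic/station/{m.group('station')}"
    return None


def coverage(metrics):
    """Fraction of metric names matched by at least one rule."""
    if not metrics:
        return 1.0
    return sum(tag_for(m) is not None for m in metrics) / len(metrics)


def project_done(audit_coverages, target=0.9999, streak=4):
    """Done-when check: coverage at or above target for the last `streak` audits."""
    recent = audit_coverages[-streak:]
    return len(recent) == streak and all(c >= target for c in recent)
```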

2. Datasource documentation pass

Scope: Many datasources.notes values are null or auto-generated stubs. Each curated source deserves one human-readable paragraph: what it is, who runs it, what the data answers, known quirks, refresh cadence, license/attribution.

Per-slot deliverable: populate notes for 3-5 sources, commit.

Done when: every curated source (non-AutoSense) has a real notes field.

Dependencies: none.

3. Backfill gap matrix

Scope: Each curated source has a theoretical history (USGS earthquake → 1900; SPC tornado → 1950; IGRA → 1950s) and an actual coverage window in TerraPulse. Build a markdown matrix: source × historical-start × current-start × gap-years × backfill-feasibility.

Per-slot deliverable: add one row to the matrix per session.

Done when: all curated sources are documented; Brad can pick the next backfill target from a single table.

Dependencies: none.
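One way to keep the matrix consistent is to derive gap-years rather than type it by hand. The sketch below assumes year-level granularity and defines the gap as current-start minus historical-start, floored at zero. The class name and the feasibility labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceCoverage:
    source: str
    historical_start: int  # first year the upstream archive covers
    current_start: int     # first year actually loaded in TerraPulse
    feasibility: str       # free-text backfill-feasibility label

    @property
    def gap_years(self) -> int:
        # Assumption: the gap is the missing span at the start of the record.
        return max(0, self.current_start - self.historical_start)


HEADER = "| source | historical-start | current-start | gap-years | backfill-feasibility |"
RULE = "|---|---|---|---|---|"


def render_matrix(rows):
    """Render the rows as a markdown table, one row per source."""
    lines = [HEADER, RULE]
    for r in rows:
        lines.append(
            f"| {r.source} | {r.historical_start} | {r.current_start} "
            f"| {r.gap_years} | {r.feasibility} |"
        )
    return "\n".join(lines)
```

A row is then built as `SourceCoverage("USGS earthquake", 1900, current_start, feasibility)`, where only the 1900 figure comes from this doc and the other two values must be looked up for each source.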

4. Cross-source correlation explorer

Scope: Proactively compute pairwise correlations across metric pairs to surface unexpected couplings as "research candidates" Brad can turn into proper paper issues. NOT papers themselves — just signal scouting.

Per-slot deliverable: one correlation matrix slice (e.g., all seismic × all space-weather) plus a one-paragraph writeup of the strongest non-trivial pair, both committed to docs/research-candidates/.

Done when: open-ended; this project produces a renewable stream of candidate ideas.

Dependencies: the new (source_id, timestamp_utc) index — already in place.

Risk: must not present candidates as findings. Frame as "this pair correlates at r=X over period Y; needs a real paper to determine if it's spurious."
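A single matrix slice can be sketched with the standard library alone. This assumes the series are already aligned on the same timestamps and uses plain Pearson r, which is only one possible choice of correlation measure. All function names are hypothetical, and what counts as "non-trivial" is left to the person writing it up.

```python
import math
from itertools import product


def pearson_r(x, y):
    """Pearson correlation of two equal-length, pre-aligned series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two aligned series with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def matrix_slice(group_a, group_b):
    """Correlate every metric in group_a against every metric in group_b."""
    return {
        (name_a, name_b): pearson_r(xa, xb)
        for (name_a, xa), (name_b, xb) in product(group_a.items(), group_b.items())
    }


def strongest_pair(slice_):
    """Return ((metric_a, metric_b), r) with the largest |r|, ignoring NaNs."""
    valid = {k: r for k, r in slice_.items() if not math.isnan(r)}
    return max(valid.items(), key=lambda kv: abs(kv[1]))


def candidate_line(pair, r, period):
    """Phrase the result as a candidate, never as a finding."""
    return (
        f"{pair[0]} x {pair[1]} correlates at r={r:.2f} over {period}; "
        "needs a real paper to determine if it's spurious."
    )
```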

5. API surface polish

Scope: Every /api/v1/ endpoint should have a docstring, an example response, and a responses schema for OpenAPI. Critical for eventual monetization — the API is the product.

Per-slot deliverable: polish 2-3 endpoints.

Done when: the /docs page renders cleanly for an external developer with no internal context.

Dependencies: none.

6. Loader resilience survey

Scope: Audit each curated fetcher for retry/timeout/circuit-breaker patterns. Some inherit from BaseFetcher and get the standard treatment; some are older one-offs. Build a compliance matrix; migrate non-compliant fetchers to the standard.

Per-slot deliverable: audit 5 fetchers, migrate 1.

Done when: every curated fetcher uses the BaseFetcher retry/timeout pattern AND is wired through the LoaderRunner framework.

Dependencies: none.
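For the older one-off fetchers, the pattern being audited for is roughly the following. This is a generic retry-with-backoff sketch, not the actual BaseFetcher API, which this doc does not show. The function name, the parameters, and the choice of retryable exceptions are assumptions.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, timeout=10.0, sleep=time.sleep):
    """Call fetch(timeout=...) up to `attempts` times with exponential backoff.

    Only transient errors (timeouts, connection failures) are retried; anything
    else propagates immediately. `sleep` is injectable so tests need not wait.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(timeout=timeout)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise last_error
```

A fetcher would count as compliant in the matrix if its network call goes through a wrapper of this kind, or the BaseFetcher equivalent, instead of calling the upstream API bare.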

7. Untapped public datasets survey

Scope: What major public climate datasets are NOT yet in TerraPulse? Examples: NOAA HRRR (sub-hourly CAPE — would unblock the #182 mechanism question), ERA5 reanalysis, MODIS LST, GHCN-Daily, NLDAS-2, CMIP6 outputs, GLDAS, etc. Build the inventory + onboarding effort estimate.

Per-slot deliverable: one datasource profiled (URL, format, auth, volume, refresh, relevance to active V-arcs).

Done when: 30 candidate sources are profiled; Brad has a prioritized roadmap.

Dependencies: none.

8. Live-event readiness drills

Scope: The Enid EF-4 paper got written the day of the storm because the GLM listener was live and the analysis pipeline already existed. We should be able to do the same for: hurricane landfall, M6+ earthquake near population, X-class solar flare, geomagnetic storm Kp≥8, major volcanic eruption. For each, draft a ready-to-go analysis template so that a live event only needs its data dropped in.

Per-slot deliverable: one event-type template scaffolded under workspaces/templates/<event-type>/.

Done when: the five event types above each have a runnable template.

Dependencies: the relevant fetchers must be live and tested.

9. Memory hygiene

Scope: The .claude/projects/.../memory/ directory accumulates entries. Some go stale, some duplicate. Weekly pass to consolidate and remove.

Per-slot deliverable: review 5 memory entries, prune or merge.

Done when: never; this is ongoing. Aim to keep the index under 20 entries.

Dependencies: none.

10. Public-site SEO + brand fundamentals

Scope: terrapulse.info needs the basics for organic discovery and credibility. Open Graph tags, structured-data markup for papers (DataCatalog / ScholarlyArticle schema.org), sitemap.xml, robots.txt, canonical URLs, social preview images.

Per-slot deliverable: one fundamental added or improved.

Done when: the site scores ≥90 on the Lighthouse SEO audit and passes a manual check of the Twitter Card, LinkedIn, and Reddit link previews.

Dependencies: none.

Part 3 — How to pick the next idle slice

When daily cadence is done and there's capacity, pick the next slice using these rules, in order:

Is any project blocking a follow-up issue? If yes, that one. (E.g., if I file an issue under #1 and a follow-up needs the project to advance, prioritize it.)
Any project where a one-slot push would close it? Closing a project is more valuable than opening a new one.
Any project visibly stale (no commit in 14 days)? If yes, that one. Stale projects feel abandoned.
Otherwise, round-robin through the roster in the order listed above.
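The selection order above can be written as a short function. This is a sketch under a few assumptions: the Project fields are hypothetical, ties within a rule are broken by roster number, and "stale" is read as a last commit at least 14 days ago.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Project:
    number: int                      # position in the roster (1-10)
    name: str
    last_commit: date
    blocks_followup: bool = False    # a follow-up issue is waiting on this project
    one_slot_from_done: bool = False


def pick_next(projects, today, last_picked=None, stale_after=timedelta(days=14)):
    """Apply the four rules in order and return the project for the next idle slot."""
    roster = sorted(projects, key=lambda p: p.number)
    for p in roster:  # rule 1: unblock a waiting follow-up issue
        if p.blocks_followup:
            return p
    for p in roster:  # rule 2: close a project that is one slot from done
        if p.one_slot_from_done:
            return p
    for p in roster:  # rule 3: revive a stale project
        if today - p.last_commit >= stale_after:
            return p
    # rule 4: round-robin, continuing after the project picked last time
    later = [p for p in roster if last_picked is not None and p.number > last_picked]
    return later[0] if later else roster[0]
```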

No project should run > 30 minutes per slot. If something needs more, file it as its own dedicated issue with a "needs deep-work session" label and resume during a planned block.

Part 4 — Token & cadence budget

The weekly cadence (Mon-Fri blocks) totals ~2-3 hours of focused work. Multi-day project slices add another ~30 min/day where capacity allows. So a healthy week looks like:

Day	Cadence block	Idle project slice (optional)
Mon	Data integrity audit (~30 min)	1 slice (~30 min)
Tue	Editorial review (~20 min)	1 slice (~30 min)
Wed	Public surface QA (~20 min)	1 slice (~30 min)
Thu	Infrastructure & cost (~30 min)	1 slice (~30 min)
Fri	Brand readout (~30 min)	— (focus on readout)

That's ~4-5 hours/week. Anything beyond that is paper-work (i.e., actually running a PMA paper from an issue), which is a separate budget item triggered by Brad filing an issue.
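As a quick check on those totals, the table's per-day estimates can be summed directly. The minutes below are copied from the table; the variable names are just for illustration.

```python
# Minutes per day, taken from the table above.
CADENCE_MIN = {"Mon": 30, "Tue": 20, "Wed": 20, "Thu": 30, "Fri": 30}
IDLE_MIN = {"Mon": 30, "Tue": 30, "Wed": 30, "Thu": 30, "Fri": 0}  # Friday has no slice

cadence_total = sum(CADENCE_MIN.values())  # 130 minutes
idle_total = sum(IDLE_MIN.values())        # 120 minutes
week_total = cadence_total + idle_total    # 250 minutes

print(f"cadence: {cadence_total / 60:.1f} h")
print(f"idle:    {idle_total / 60:.1f} h")
print(f"total:   {week_total / 60:.1f} h")
```

With every optional slice taken, the week comes to 250 minutes, a little over 4 hours, which sits at the low end of the ~4-5 hour estimate.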

If a week's events drain tokens unusually hard (a live event, a paper, a backfill), idle slices get skipped first. Cadence blocks are the floor; idle slices are the ceiling.