
Anti-Sycophancy Protocol for AI-Assisted Research

How we prevent "delusional spiraling" in AI-assisted scientific research at TerraPulse.

The Problem

MIT researchers mathematically proved that AI sycophancy — the tendency to agree with users — creates "delusional spiraling." You ask the AI something, it agrees. You ask again, it agrees harder. You end up believing things that are false and can't tell it's happening.

This isn't hypothetical. We caught it happening in our own project.

Our Cases

Case 1: The 158x Fireball Panic

What happened: Our NEO fetcher ingested the CNEOS fireball catalog. The count showed 281 fireballs in March 2026. The monthly baseline was 1.8. We calculated a 158x anomaly and started planning a research paper on the "unprecedented fireball surge."

What was actually true: A bug in the orchestrator's dedup logic (a 48-hour lookback window against a 30-day fetch window) caused each of the 3 real fireballs to be ingested roughly 94 times, turning 3 events into 281 rows. March 2026 had 3 fireballs. That's +0.7σ, dead average.

How we caught it: We checked COUNT(DISTINCT timestamp, value, latitude, longitude) instead of COUNT(*). The 281 collapsed to 3. One SQL query killed the entire narrative.

Lesson: Always verify counts at the source level before building theories on top of them.
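The distinct-count check that killed this narrative takes a few lines to reproduce. The sketch below builds a miniature, hypothetical version of the fireball table (column names and values are illustrative, not our real data) and shows the raw count collapsing under a DISTINCT subquery:

```python
import sqlite3

# Hypothetical miniature of the fireball table: 3 real events,
# each ingested many times by the buggy dedup window.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fireballs (timestamp TEXT, value REAL, latitude REAL, longitude REAL)"
)
events = [
    ("2026-03-04T11:02Z", 0.21, 34.1, -101.9),
    ("2026-03-15T03:47Z", 0.38, -12.5, 142.0),
    ("2026-03-22T19:30Z", 0.11, 55.7, 37.6),
]
for _ in range(94):  # ~94 copies of each event, as in the bug
    conn.executemany("INSERT INTO fireballs VALUES (?, ?, ?, ?)", events)

raw = conn.execute("SELECT COUNT(*) FROM fireballs").fetchone()[0]
distinct = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT timestamp, value, latitude, longitude FROM fireballs)"
).fetchone()[0]
print(raw, distinct)  # raw inflated ~94x; distinct count is the truth
```

One query over the distinct rows, and the anomaly evaporates before any theory gets built on it.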

Case 2: The AMS "Mysterious Surge"

What happened: A news article claimed "more fireball reports in Q1 2026 than any Q1 since 2011" from the American Meteor Society. It was framed as evidence of increased asteroid activity.

What we found: AMS tracks citizen sighting reports (subjective). CNEOS tracks government sensor detections (objective). CNEOS Q1 2026: 4 fireballs, against a 4.3/quarter baseline. That's -0.1σ. The "surge" was in human attention (media feedback loop after the Ohio event), not in rocks from space.

Lesson: Distinguish between measurement systems. More reports ≠ more events.

Case 3: The CME That Never Arrived

What happened: An X1.5 solar flare — the largest in our 12.6M-reading GOES dataset — launched a 1,845 km/s CME. We predicted arrival at T+47h using the Gopalswamy (2001) empirical model. We monitored 7 data streams over 60 hours.

What actually happened: The CME never arrived. Solar wind stayed at 446 km/s. Bz stayed northward. Kp stayed at 2. The CME either wasn't Earth-directed or deflected.

Lesson: Most CMEs miss Earth. Our DONKI cascade paper found that CME speed alone has r = -0.18 with peak Kp. Speed doesn't predict geoeffectiveness. We reported the non-arrival with the same rigor as if it had arrived.
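For scale: even before any empirical model, a constant-speed estimate shows how wide the uncertainty band is. The back-of-envelope sketch below is a naive ballistic floor, not the Gopalswamy model; fast CMEs decelerate against the solar wind, which is why the predicted arrival sat near T+47h rather than near this lower bound:

```python
AU_KM = 149_597_870.7   # 1 astronomical unit in km
v_kms = 1845            # initial CME speed from the flare event

# Naive ballistic travel time: distance / constant speed
ballistic_hours = AU_KM / v_kms / 3600
print(f"ballistic floor: {ballistic_hours:.1f} h")
```

A ~22.5 h floor versus a ~47 h model prediction is already a factor-of-two spread, and neither number tells you whether the CME is Earth-directed at all.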

Case 4: The Solar Wind Shrinkage

What happened: V1 analysis showed solar wind speed vs earthquake magnitude: r = 0.54, p < 0.00001, with 74 hours of data. It looked like a strong, significant coupling between space weather and seismicity.

What V2 showed: With 1,800 hours of DSCOVR archive data, the correlation dropped to r = 0.09. Still statistically significant (large N), but explaining less than 1% of variance. The effect was 6x smaller than V1 suggested.

Lesson: r > 0.5 with N < 500 is a flag, not a finding. Always seek more data before publishing.
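One way to see why r = 0.54 at N = 74 deserved a flag is the width of its confidence interval. The sketch below uses the standard Fisher z-transform (not project code) to compare the V1 and V2 intervals:

```python
import math

def r_confint(r, n, z_crit=1.96):
    """95% confidence interval for Pearson r via the Fisher z-transform."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = r_confint(0.54, 74)      # V1: small sample, wide interval
print(f"V1: r=0.54, N=74   -> 95% CI ({lo:.2f}, {hi:.2f})")

lo2, hi2 = r_confint(0.09, 1800)  # V2: large sample, tight interval
print(f"V2: r=0.09, N=1800 -> 95% CI ({lo2:.2f}, {hi2:.2f}); r^2 = {0.09**2:.4f}")
```

The V1 interval spans roughly 0.36 to 0.68, so the "strong coupling" was consistent with a much weaker effect all along; the V2 estimate pins it near r = 0.09, under 1% of variance explained.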

The Protocol

Rules (Non-Negotiable)

  1. Check the data BEFORE agreeing with any claim. External article says "surge"? Run SELECT COUNT(*) before responding.

  2. "That's noise" is a valid and complete response. Don't dress it up. Don't say "while the data doesn't fully support this, there are interesting aspects." Say "the data says no."

  3. Correct your own errors immediately. Don't wait to be asked. The moment you realize a number is wrong, say so. The fireball correction was initiated by us, not by a reviewer.

  4. A null result reported clearly is worth more than a positive result reported vaguely. Eight of our papers contain prominent null results. "We find no evidence for X" is a conclusion.

  5. Never say "interesting!" about something that might be an artifact. Verify first, react second.

  6. Flag strong results on small samples. Any r > 0.5 with N < 500 gets a V2 verification flag. Do not present it as a confirmed finding.

  7. Apply Bonferroni when testing > 3 pairs. In our Granger causality network (72 pairs), more than half of nominal significances were false positives. Report both raw and corrected.

  8. Report effect sizes, not just p-values. With 37M observations, any nonzero effect becomes "significant." The question is always: how much variance does it explain?

  9. Distinguish between data systems. AMS reports ≠ CNEOS detections. NUFORC sightings ≠ radar tracks. Citizen observations ≠ calibrated instruments.

  10. When the AI and the data disagree, the data wins. Every time. No exceptions.
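Rule 7 is mechanical enough to code. With 72 Granger pairs at α = 0.05, the Bonferroni threshold per test drops to about 0.0007; anything between that and 0.05 is nominal-only. A minimal sketch, with placeholder p-values:

```python
def bonferroni_split(p_values, alpha=0.05):
    """Partition p-values into survives-correction vs nominal-only."""
    threshold = alpha / len(p_values)
    survives = [p for p in p_values if p < threshold]
    nominal = [p for p in p_values if threshold <= p < alpha]
    return threshold, survives, nominal

# 72 hypothetical pairwise tests: a few small p-values, the rest null
pvals = [0.0001, 0.0004, 0.003, 0.01, 0.04] + [0.5] * 67
thr, strong, weak = bonferroni_split(pvals)
print(f"threshold={thr:.5f}, survive={len(strong)}, nominal-only={len(weak)}")
```

Reporting both lists, per rule 7, makes it obvious which "significant" results only exist because 72 tests were run.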

Implementation for Claude Code Projects

In CLAUDE.md or PMA Context

Add this to your project's AI instruction file:

### Anti-Sycophancy

When analysis results contradict a hypothesis or external claim:
- Report the contradiction immediately and clearly
- Do not soften null results with hedging language
- Do not suggest "further investigation" when the data is conclusive
- Present null results with the same formatting and emphasis as positive results
- If you realize a previous statement was wrong, correct it in the next response

When the user shares an exciting claim:
- Cross-check against available data BEFORE responding
- If the data doesn't support it, say so directly
- "The data says no" is a complete sentence

In Your Analysis Pipeline

Build verification into the code:

import polars as pl

# ALWAYS deduplicate before counting
unique_count = df.unique(subset=["timestamp", "value", "latitude"]).height
raw_count = len(df)
if raw_count > unique_count * 1.1:
    print(f"WARNING: {raw_count - unique_count} duplicates detected ({raw_count}→{unique_count})")

# ALWAYS compare against baseline
baseline = historical_monthly_mean
sigma = (current_count - baseline) / historical_std
if abs(sigma) < 2:
    print(f"WITHIN NORMAL RANGE ({sigma:+.1f}σ)")
    # DO NOT proceed to build a theory about why the count is high/low

In Your Review Process

We run a two-person review loop:

  • Elise (Paper Machine Agent) writes the analysis
  • Mike (Science Editor Agent) reviews against these standards
  • Mike checks every number against results.json
  • Mike flags any claim not supported by the data
  • The paper doesn't ship until Mike says ACCEPT

This catches errors like the Granger F-statistic mismatch (comment said F=5.87, data said F=1.92) before they become part of the record.
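The number-checking step of that loop can be partly automated. The sketch below assumes a hypothetical results.json schema (nested keys like "granger.f_stat"); adapt the key paths to your own pipeline:

```python
import json
import math

def check_claims(claims, results_path, rel_tol=1e-3):
    """Compare numbers quoted in a draft against a results.json file.

    `claims` maps a dotted key path in results.json (e.g. "granger.f_stat",
    a hypothetical schema) to the value quoted in the paper. Returns a list
    of (path, quoted, actual) tuples for every mismatch."""
    with open(results_path) as f:
        results = json.load(f)
    mismatches = []
    for path, quoted in claims.items():
        node = results
        for key in path.split("."):
            node = node[key]  # walk the dotted path into the JSON
        if not math.isclose(node, quoted, rel_tol=rel_tol):
            mismatches.append((path, quoted, node))
    return mismatches

# e.g. the Granger case: draft said F=5.87, results.json said F=1.92
# mismatches = check_claims({"granger.f_stat": 5.87}, "results.json")
```

A nonempty return is a hard block on ACCEPT; the human review then decides whether the draft or the pipeline is wrong.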

The Meta-Lesson

The sycophancy trap is most dangerous when the AI is doing exactly what you want it to do — finding patterns, building narratives, getting excited about results. The antidote isn't skepticism (which paralyzes). It's verification (which empowers).

Every claim gets a COUNT(*). Every correlation gets a Bonferroni check. Every V1 gets a V2. Every "surge" gets a baseline comparison. The data doesn't care about our hypothesis. That's its greatest virtue.


From the TerraPulse project — a climate intelligence platform processing 38M+ observations across 158 metrics from 363 sources. Built with Claude Code.

Read more: terrapulse.info | GitHub
