
Loader resilience compliance matrix

Project #6 from docs/research-arcs-and-idle-projects.md, first slice. Audits every curated fetcher in src/terrapulse/ingestion/fetchers/ for compliance with the BaseFetcher resilience standard.

What "compliant" means

A fetcher is compliant when it:

  1. Inherits from BaseFetcher — gets source_name, timeout, max_retries, and _get_auth() for free
  2. Uses self._get(...) for HTTP, which provides:
       • retry with exponential backoff (up to max_retries retries, waiting 2^attempt seconds before each)
       • default timeout from settings
       • automatic auth-config injection
       • smart 4xx handling (permanent client errors are not retried; 429 is the exception and is retried)
       • structured logging with source/url/status/elapsed
  3. Implements only async def fetch(self) -> list[dict] — keeps responsibilities tight
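
As a rough illustration of the three criteria, here is a minimal sketch of what a compliant fetcher could look like. It uses a stand-in BaseFetcher so it runs on its own; the real class lives in the fetchers package, and the return type of its _get() is an assumption here (parsed JSON). ExampleQuakeFetcher, its URL, and the payload shape are hypothetical.

```python
import asyncio
from typing import Any


class BaseFetcher:
    """Stand-in for the real BaseFetcher so this sketch runs standalone.

    The real _get() adds retry, timeout, auth and logging; this one just
    returns a canned payload (assumed shape, for illustration only).
    """

    source_name = "base"

    async def _get(self, url: str, **kwargs: Any) -> dict:
        return {"features": [{"id": "ev1", "mag": 4.2}, {"id": "ev2", "mag": 5.0}]}


class ExampleQuakeFetcher(BaseFetcher):
    """Hypothetical curated fetcher that meets all three criteria."""

    source_name = "example_quakes"
    url = "https://example.invalid/quakes.json"  # placeholder URL

    async def fetch(self) -> list[dict]:
        # Criterion 2: go through self._get(), never a raw HTTP client.
        payload = await self._get(self.url)
        return [
            {"source": self.source_name, "event_id": f["id"], "magnitude": f["mag"]}
            for f in payload["features"]
        ]


rows = asyncio.run(ExampleQuakeFetcher().fetch())
```

The fetcher only maps a payload to rows; everything about retries and timeouts stays in the base class.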

The default standard (from src/terrapulse/config.py via settings):

  • Timeout: fetch_timeout_seconds (default 30 s)
  • Retries: fetch_max_retries (default 3)
  • Backoff: exponential (2, 4, 8 s between retries)
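
The defaults above can be sketched as two small helpers. This is not the BaseFetcher code itself: backoff_delays and should_retry are hypothetical names, the 1-based attempt index is inferred from the 2, 4, 8 s schedule, and retrying 5xx responses is an assumption (the standard only states that 4xx other than 429 is not retried).

```python
def backoff_delays(max_retries: int = 3, base: float = 2.0) -> list[float]:
    """Seconds to wait before retry 1..max_retries, i.e. base ** attempt."""
    return [base ** attempt for attempt in range(1, max_retries + 1)]


def should_retry(status_code: int) -> bool:
    """Retry 429 and (by assumption) 5xx; treat other 4xx as permanent."""
    if status_code == 429:
        return True
    if 400 <= status_code < 500:
        return False
    return status_code >= 500


delays = backoff_delays()   # default fetch_max_retries = 3
total_wait = sum(delays)    # total time spent sleeping if every retry is used
```

With the default settings the sleep time adds up to 14 s, on top of whatever each attempt spends waiting on the 30 s timeout.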

Compliant fetchers also get wired through the LoaderRunner framework automatically because the scheduler (src/terrapulse/ingestion/scheduler.py) calls run_fetch(fetcher) which delegates to LoaderRunner.run(fetcher). The runner produces loader_runs rows, updates loader_status heartbeats, applies the schema-drift detector, and feeds the Monday liveness audit.

Audit results (2026-05-13)

49 curated fetcher files audited (excluding base.py and autosense.py). After today's migration of the two stragglers:

Compliant: 49 of 49. Every curated fetcher inherits from BaseFetcher and uses self._get(). Full list below; nothing flagged.

Listeners — intentionally NOT BaseFetcher

The streaming listeners under scripts/ use long-lived WebSocket connections rather than periodic HTTP polls. They have their own resilience model (auto-reconnect with bounded backoff, per-strike DB inserts with separate broadcast):

  • scripts/glm_listener.py — GOES GLM lightning
  • scripts/blitzortung_listener.py — Blitzortung lightning
  • scripts/pulse_streamer.py — multi-source pulse-event emitter

These aren't in the BaseFetcher framework and shouldn't be — the patterns don't fit. Their own resilience is internal to each service and should be audited separately if a "streaming listener resilience standard" is ever defined.

What changed today (2026-05-13)

Two fetchers had been bypassing self._get():

  • fink_transients.py — was using raw httpx.AsyncClient(verify=False) to handle Fink Broker's TLS cert issue. Fix: added a verify: bool = True kwarg to BaseFetcher._get() so callers can pass verify=False for known-bad-cert upstreams while still getting standard retry/timeout/logging. fink_transients now uses self._get(..., verify=False).
  • usdm_drought.py — was iterating 15 states with custom per-state httpx calls. Fix: switched the inner call to self._get(). Kept the 0.5 s per-state sleep for politeness; the _get()-internal retry now handles per-state transient failures.

Both migrations preserve original behavior while gaining BaseFetcher's retry/timeout/logging.
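
The shape of the two migrations can be sketched roughly as follows. The BaseFetcher here is a stand-in that records calls instead of doing HTTP; whether the real verify kwarg is keyword-only is an assumption, and BadCertFetcher, PerStateFetcher, the URLs, and the state list are hypothetical.

```python
import asyncio
from typing import Any


class BaseFetcher:
    """Stand-in base class; records calls instead of performing HTTP."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, bool]] = []

    async def _get(self, url: str, *, verify: bool = True, **kwargs: Any) -> dict:
        # The real _get() would apply retry, timeout and logging here.
        self.calls.append((url, verify))
        return {"url": url}


class BadCertFetcher(BaseFetcher):
    """Hypothetical fetcher for an upstream with a known-bad TLS certificate."""

    async def fetch(self) -> list[dict]:
        return [await self._get("https://example.invalid/alerts", verify=False)]


class PerStateFetcher(BaseFetcher):
    """Hypothetical fetcher that polls one URL per state with a polite pause."""

    def __init__(self, states: list[str], pause_seconds: float = 0.5) -> None:
        super().__init__()
        self.states = states
        self.pause_seconds = pause_seconds

    async def fetch(self) -> list[dict]:
        rows = []
        for state in self.states:
            url = f"https://example.invalid/drought?state={state}"
            rows.append(await self._get(url))
            await asyncio.sleep(self.pause_seconds)  # politeness delay
        return rows


bad_cert = BadCertFetcher()
asyncio.run(bad_cert.fetch())
per_state = PerStateFetcher(["AZ", "CA", "NM"], pause_seconds=0.0)  # 0 s for the demo
state_rows = asyncio.run(per_state.fetch())
```

Keeping verify defaulted to True means only fetchers that opt out explicitly skip certificate checks.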

Compliance by category (just for navigation)

  • Seismic / geophysics: usgs_earthquake, usgs_volcanoes, emsc, gfz_geofon, isc, magnetic_poles, smithsonian_gvp, intermagnet, gmn_meteors — all compliant
  • Space weather: goes_xray, dscovr_solar_wind, nasa_donki, noaa_space_weather, dst_index, silso_sunspots, auger_uhecr, hamqsl_propagation, nmdb_cosmic_rays, gwosc_status, gwosc_events — all compliant
  • Severe weather: nws_alerts, spc_reports, spc_outlook — all compliant
  • Hydrology: usgs_water, noaa_tides, usdm_drought (migrated today), open_meteo_flood — all compliant
  • Atmospheric: igra_soundings, open_meteo, open_meteo_aqi, open_meteo_marine, nasa_power, noaa_climate_indices, noaa_co2, geosphere — all compliant
  • Astronomy / orbital: celestrak, jpl_sbdb, neo, launch_library, ams_fireballs, fink_transients (migrated today), asf_sentinel — all compliant
  • Radio: wspr, lwa_spectra — all compliant
  • Misc: world_bank, gfw_fires, safecast, lunar_tidal, ibtracs — all compliant

What's NOT covered by this matrix

  • The AutoSense fetcher (autosense.py) is a meta-fetcher that handles the long tail of catalog sources. It has its own resilience semantics (per-format probing) and isn't appropriate to evaluate against the curated-fetcher standard.
  • Bulk backfill scripts (scripts/backfill_igra_history.py, etc.) use direct psycopg2 + httpx paths because they intentionally bypass the framework for one-off runs. Their resilience model is "run once, fail loud."
  • LoaderRunner internals — the runner itself handles failure-tracking, schema-drift detection, and heartbeat updates. Those weren't audited here; they're part of the framework being audited against.

Recommendations / follow-ups

None for the curated-fetcher layer — fully compliant after today's pass.

Potential future work:

  • Define a streaming-listener resilience standard (similar to BaseFetcher) for the three streaming services under scripts/ (glm_listener.py, blitzortung_listener.py, pulse_streamer.py) so they're auditable as a class.
  • Add a CI test that runs the audit script (import re; from pathlib import Path; ...) and fails any PR that introduces a non-compliant fetcher, so the matrix stays green.
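
One possible shape for such a CI check is sketched below. It is an independent illustration, not the audit script used for this matrix: the regexes, the audit_fetchers name, and the exclusion of __init__.py are assumptions, and a text-based scan like this can be fooled by unusual formatting.

```python
import re
from pathlib import Path

# base.py and autosense.py are excluded per the audit scope; __init__.py is assumed.
EXCLUDED = {"__init__.py", "base.py", "autosense.py"}
INHERITS = re.compile(r"^class\s+\w+\(.*\bBaseFetcher\b.*\):", re.MULTILINE)
USES_GET = re.compile(r"\bself\._get\(")
RAW_HTTP = re.compile(r"\bhttpx\.(AsyncClient|Client|get|post)\b|\brequests\.(get|post)\b")


def audit_fetchers(fetcher_dir: Path) -> dict[str, list[str]]:
    """Return {filename: [problems]} for every non-compliant fetcher file."""
    failures: dict[str, list[str]] = {}
    for path in sorted(fetcher_dir.glob("*.py")):
        if path.name in EXCLUDED:
            continue
        text = path.read_text(encoding="utf-8")
        problems = []
        if not INHERITS.search(text):
            problems.append("does not inherit from BaseFetcher")
        if not USES_GET.search(text):
            problems.append("never calls self._get()")
        if RAW_HTTP.search(text):
            problems.append("uses a raw HTTP client")
        if problems:
            failures[path.name] = problems
    return failures


def test_all_curated_fetchers_compliant() -> None:
    """pytest entry point; the path is relative to the repository root."""
    failures = audit_fetchers(Path("src/terrapulse/ingestion/fetchers"))
    assert failures == {}, f"non-compliant fetchers: {failures}"
```

Returning the list of problems per file, rather than a bare pass/fail, lets the CI failure message say which rule a new fetcher broke.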

Project #6 status: First slice done. Matrix is clean; second slice should pivot to defining the streaming-listener standard if Brad/Mike want that work prioritized, otherwise the project closes here.
