Loader resilience compliance matrix
Project #6 from docs/research-arcs-and-idle-projects.md, first slice. Audits every curated fetcher in src/terrapulse/ingestion/fetchers/ for compliance with the BaseFetcher resilience standard.
What "compliant" means
A fetcher is compliant when it:
- Inherits from BaseFetcher: gets source_name, timeout, max_retries, and _get_auth() for free
- Uses self._get(...) for HTTP: gets retry-with-backoff (up to max_retries retries, waiting 2^attempt seconds before each), default timeout from settings, automatic auth-config injection, smart 4xx handling (don't retry permanent client errors except 429), structured logging with source/url/status/elapsed
- Implements only async def fetch(self) -> list[dict]: keeps responsibilities tight
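A minimal sketch of what a fetcher meeting these three criteria might look like. The BaseFetcher here is a stub, not the real class: it only mirrors the attribute names listed above, and it assumes _get() returns an object with a .json() method, which this document does not confirm. ExampleQuakeFetcher, its URL, and the payload shape are hypothetical.

```python
import asyncio
from typing import Any


class FakeResponse:
    """Stand-in for whatever the real _get() returns (assumed to expose .json())."""

    def __init__(self, payload: Any) -> None:
        self._payload = payload

    def json(self) -> Any:
        return self._payload


class BaseFetcher:
    """Stub base class; the real one adds retry, timeout, auth and logging."""

    source_name = "base"
    timeout = 30
    max_retries = 3

    async def _get(self, url: str, **kwargs: Any) -> FakeResponse:
        # Canned payload so the sketch runs without network access.
        return FakeResponse({"features": [{"id": "a1", "mag": 4.2}]})

    async def fetch(self) -> list[dict]:
        raise NotImplementedError


class ExampleQuakeFetcher(BaseFetcher):
    """Hypothetical fetcher: inherits, uses self._get(), implements only fetch()."""

    source_name = "example_quakes"
    URL = "https://example.invalid/quakes.json"  # placeholder, not a real endpoint

    async def fetch(self) -> list[dict]:
        response = await self._get(self.URL)
        return [
            {"source": self.source_name, "event_id": f["id"], "magnitude": f["mag"]}
            for f in response.json()["features"]
        ]
```

The subclass contains no retry loop, timeout handling, or logging of its own; under the standard, all of that belongs to the base class.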
The default standard (from src/terrapulse/config.py via settings):
- Timeout: fetch_timeout_seconds (default 30 s)
- Retries: fetch_max_retries (default 3)
- Backoff: exponential (2, 4, 8 s between retries)
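The retry policy described above can be sketched roughly as follows. This is not the BaseFetcher implementation: the function names are hypothetical, the attempt counter is assumed to start at 1 (which reproduces the 2, 4, 8 s schedule), and retrying 5xx responses is an assumption, since the document only states that permanent 4xx errors other than 429 are not retried.

```python
import asyncio
from typing import Awaitable, Callable

FETCH_MAX_RETRIES = 3  # mirrors the documented default


def backoff_delays(max_retries: int = FETCH_MAX_RETRIES) -> list[int]:
    """Seconds to wait before each retry: 2 ** attempt for attempt = 1..max_retries."""
    return [2 ** attempt for attempt in range(1, max_retries + 1)]


def is_retryable(status: int) -> bool:
    """429 is retried; other 4xx are treated as permanent. 5xx retry is an assumption."""
    if status == 429:
        return True
    return status >= 500


async def get_with_retry(
    send: Callable[[], Awaitable[int]],
    max_retries: int = FETCH_MAX_RETRIES,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Call send() until it yields a non-retryable status or retries are exhausted."""
    status = await send()
    for delay in backoff_delays(max_retries):
        if not is_retryable(status):
            break
        await sleep(delay)  # exponential backoff before the next attempt
        status = await send()
    return status
```

Passing sleep as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at the asyncio.sleep default.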
Compliant fetchers also get wired through the LoaderRunner framework automatically because the scheduler (src/terrapulse/ingestion/scheduler.py) calls run_fetch(fetcher) which delegates to LoaderRunner.run(fetcher). The runner produces loader_runs rows, updates loader_status heartbeats, applies the schema-drift detector, and feeds the Monday liveness audit.
Audit results (2026-05-13)
49 curated fetcher files audited (excluding base.py and autosense.py). After today's migration of the two stragglers:
Compliant: 49 of 49. Every curated fetcher inherits from BaseFetcher and uses self._get(). Full list below; nothing flagged.
Listeners — intentionally NOT BaseFetcher
The streaming listeners under scripts/ use long-lived WebSocket connections rather than periodic HTTP polls. They have their own resilience model (auto-reconnect with bounded backoff, per-strike DB inserts with separate broadcast):
- scripts/glm_listener.py: GOES GLM lightning
- scripts/blitzortung_listener.py: Blitzortung lightning
- scripts/pulse_streamer.py: multi-source pulse-event emitter
These aren't in the BaseFetcher framework and shouldn't be — the patterns don't fit. Their own resilience is internal to each service and should be audited separately if a "streaming listener resilience standard" is ever defined.
What changed today (2026-05-13)
Two fetchers had been bypassing self._get():
- fink_transients.py: was using raw httpx.AsyncClient(verify=False) to handle Fink Broker's TLS cert issue. Fix: added a verify: bool = True kwarg to BaseFetcher._get() so callers can pass verify=False for known-bad-cert upstreams while still getting standard retry/timeout/logging. fink_transients now uses self._get(..., verify=False).
- usdm_drought.py: was iterating 15 states with custom per-state httpx calls. Fix: switched the inner call to self._get(). Kept the 0.5 s per-state sleep for politeness; the _get()-internal retry now handles per-state transient failures.
Both migrations preserve original behavior while gaining BaseFetcher's retry/timeout/logging.
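The per-state loop in usdm_drought.py might look something like the sketch below. It is an illustration only: fetch_all_states and get_state are hypothetical names, get_state stands in for a call that goes through self._get() (and therefore already retries), and skipping the pause before the first request is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable

STATE_DELAY_SECONDS = 0.5  # politeness pause kept from the original fetcher

Rows = list[dict[str, Any]]


async def fetch_all_states(
    get_state: Callable[[str], Awaitable[Rows]],
    states: list[str],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Rows:
    """Fetch each state in turn, pausing between consecutive requests."""
    rows: Rows = []
    for index, state in enumerate(states):
        if index > 0:
            await sleep(STATE_DELAY_SECONDS)  # no pause before the first request
        # Transient failures are expected to be retried inside get_state.
        rows.extend(await get_state(state))
    return rows
```

Because retries live inside the per-state call, the loop itself stays free of error-handling logic, which is the point of the migration.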
Compliance by category (just for navigation)
- Seismic / geophysics: usgs_earthquake, usgs_volcanoes, emsc, gfz_geofon, isc, magnetic_poles, smithsonian_gvp, intermagnet, gmn_meteors (all compliant)
- Space weather: goes_xray, dscovr_solar_wind, nasa_donki, noaa_space_weather, dst_index, silso_sunspots, auger_uhecr, hamqsl_propagation, nmdb_cosmic_rays, gwosc_status, gwosc_events (all compliant)
- Severe weather: nws_alerts, spc_reports, spc_outlook (all compliant)
- Hydrology: usgs_water, noaa_tides, usdm_drought (migrated today), open_meteo_flood (all compliant)
- Atmospheric: igra_soundings, open_meteo, open_meteo_aqi, open_meteo_marine, nasa_power, noaa_climate_indices, noaa_co2, geosphere (all compliant)
- Astronomy / orbital: celestrak, jpl_sbdb, neo, launch_library, ams_fireballs, fink_transients (migrated today), asf_sentinel (all compliant)
- Radio: wspr, lwa_spectra (all compliant)
- Misc: world_bank, gfw_fires, safecast, lunar_tidal, ibtracs (all compliant)
What's NOT covered by this matrix
- The AutoSense fetcher (autosense.py) is a meta-fetcher that handles the long tail of catalog sources. It has its own resilience semantics (per-format probing) and isn't appropriate to evaluate against the curated-fetcher standard.
- Bulk backfill scripts (scripts/backfill_igra_history.py, etc.) use direct psycopg2 + httpx paths because they intentionally bypass the framework for one-off runs. Their resilience model is "run once, fail loud."
- LoaderRunner internals: the runner itself handles failure-tracking, schema-drift detection, and heartbeat updates. Those weren't audited here; they're part of the framework being audited against.
Recommendations / follow-ups
None for the curated-fetcher layer — fully compliant after today's pass.
Potential future work:
- Define a streaming-listener resilience standard (similar to BaseFetcher) for the three streaming services under scripts/ (the two *_listener.py scripts and pulse_streamer.py) so they're auditable as a class.
- Add a test that runs import re; from pathlib import Path; ... (the audit script above) in CI to keep the matrix green by failing any PR that introduces a non-compliant fetcher.
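The audit script itself is not reproduced in this document, so the following is only a hypothetical sketch of what such a CI check could look like. The three regex heuristics, the function names, and the exclusion of __init__.py are assumptions; base.py and autosense.py are excluded because the audit above excludes them.

```python
import re
from pathlib import Path

# Files the audit above excludes, plus __init__.py (an assumption).
EXCLUDED = {"__init__.py", "base.py", "autosense.py"}

INHERITS = re.compile(r"^class\s+\w+\(.*\bBaseFetcher\b.*\):", re.MULTILINE)
USES_GET = re.compile(r"\bself\._get\(")
RAW_HTTPX = re.compile(r"\bhttpx\.(AsyncClient|Client|get|post)\b")


def audit_source(source: str) -> list[str]:
    """Return the compliance problems found in one fetcher's source text."""
    problems = []
    if not INHERITS.search(source):
        problems.append("no class inherits from BaseFetcher")
    if not USES_GET.search(source):
        problems.append("never calls self._get()")
    if RAW_HTTPX.search(source):
        problems.append("makes raw httpx calls")
    return problems


def audit_directory(fetchers_dir: Path) -> dict[str, list[str]]:
    """Map each non-compliant fetcher file name to its list of problems."""
    report = {}
    for path in sorted(fetchers_dir.glob("*.py")):
        if path.name in EXCLUDED:
            continue
        problems = audit_source(path.read_text(encoding="utf-8"))
        if problems:
            report[path.name] = problems
    return report
```

A CI test could then assert that audit_directory(Path("src/terrapulse/ingestion/fetchers")) returns an empty dict. Text matching of this kind is a heuristic; a check based on the ast module would be more robust against unusual formatting.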
Project #6 status: First slice done. Matrix is clean; second slice should pivot to defining the streaming-listener standard if Brad/Mike want that work prioritized, otherwise the project closes here.