
TerraPulse Data Source Handbook

Living document. Last updated: 2026-03-17.

Every source listed here follows the same onboarding pattern: Fetcher → DuckDB stage → Normalizer → PostgreSQL → API/Admin


Quick Reference

Source                    Auth        Format    Interval    Status
USGS Earthquake           None        GeoJSON   60s         Active
USGS Water Services       None        JSON      5min        Active
Open-Meteo Weather        None        JSON      10min       Active
Safecast Radiation        None        JSON      5min        Active
World Bank Climate        None        JSON      6hr         Active
GFW Fire Alerts           API key?    JSON      --          Disabled (403)
Open-Meteo Air Quality    None        JSON      10min       Active
Open-Meteo Flood/River    None        JSON      15min       Active
Open-Meteo Marine         None        JSON      15min       Active
USDM Drought Monitor      None        JSON/CSV  Weekly      Active
NOAA Tides & Currents     None        JSON      6min        Active
NASA POWER                None        JSON      Daily       Active
NOAA Climate Indices      None        Text      Monthly     Active
NOAA Space Weather        None        JSON      5min        Active
ASF Sentinel-1 SAR        None        GeoJSON   1hr         Active
NWS Alerts                User-Agent  GeoJSON   60s         Phase 2
NASA FIRMS                MAP_KEY     CSV/JSON  60s         Phase 2
EPA AirNow                API key     JSON      Hourly      Phase 2
OpenAQ                    API key     JSON      Hourly      Phase 2
NOAA CDO                  Token       JSON      Daily       Phase 2
PurpleAir                 API key     JSON      2min        Phase 2
NOAA CO₂ / Keeling Curve  None        CSV       Daily       Stretch
NSIDC Arctic Ice          None        CSV       Daily       Stretch
NIFC Fire Perimeters      None        GeoJSON   Real-time   Stretch
NOAA Hurricane Center     None        GeoJSON   Seasonal    Stretch
EPA UV Index              None        JSON      Daily       Stretch
Water Quality Portal      None        CSV/JSON  Historical  Stretch
ERDDAP/CoastWatch         None        Many      Varies      Stretch
CHIRPS Precipitation      None        GeoTIFF   Daily       Stretch

Active Sources

USGS Earthquake

  • Slug: usgs_earthquake
  • Base URL: https://earthquake.usgs.gov/fdsnws/event/1/query
  • Auth: None
  • Format: GeoJSON
  • Rate limit: Max 20,000 events per request
  • Schedule: Every 60s

What we fetch: Last hour of earthquake events globally, up to 500 per fetch.

Key params:

  • format=geojson
  • starttime — ISO datetime, 1 hour back
  • orderby=time
  • limit=500
  • Spatial: minlatitude/maxlatitude/minlongitude/maxlongitude (bbox) or latitude/longitude/maxradiuskm (circle)
  • Magnitude: minmagnitude/maxmagnitude
  • Depth: mindepth/maxdepth

Normalized fields: earthquake_magnitude metric, magnitude value, lat/lon/depth, event metadata in extra_json.

Gotchas:

  • Coordinates array is [longitude, latitude, depth] — lon first
  • For high-volume use, USGS recommends the summary feeds (e.g. all_hour.geojson) over the query endpoint
  • Event time is epoch milliseconds
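The following Python sketch shows the query construction and per-feature parsing described above. The helper names are hypothetical and are not the TerraPulse fetcher's API. The sketch assumes that `now` is timezone-aware and that USGS interprets an offset-less `starttime` as UTC.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

QUERY_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(now: datetime, limit: int = 500) -> str:
    """Hypothetical helper: last hour of global events, newest first."""
    start = now.astimezone(timezone.utc) - timedelta(hours=1)
    params = {
        "format": "geojson",
        "starttime": start.strftime("%Y-%m-%dT%H:%M:%S"),  # UTC, no offset (assumption)
        "orderby": "time",
        "limit": limit,
    }
    return f"{QUERY_URL}?{urlencode(params)}"


def parse_feature(feature: dict) -> dict:
    """Flatten one GeoJSON feature into a record (hypothetical output shape)."""
    # Coordinates are [longitude, latitude, depth_km]; longitude comes first.
    lon, lat, depth_km = feature["geometry"]["coordinates"][:3]
    props = feature["properties"]
    # "time" is epoch milliseconds, so divide by 1000 before converting.
    event_time = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return {
        "event_id": feature.get("id"),
        "magnitude": props.get("mag"),
        "lat": lat,
        "lon": lon,
        "depth_km": depth_km,
        "time": event_time,
    }
```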

Endpoints:

  • /query — main data query
  • /count — count matching events
  • /catalogs, /contributors — enumeration

USGS Water Services

  • Slug: usgs_water
  • Base URL: https://waterservices.usgs.gov/nwis/iv/
  • Auth: None
  • Format: JSON (WaterML 1.1 structure)
  • Rate limit: IP-based blocking for excessive use
  • Schedule: Every 5min

What we fetch: Instantaneous streamflow readings (param 00060) for California, last hour.

Key params:

  • format=json
  • parameterCd=00060 (discharge, cubic ft/sec) — also: 00065 (gage height), 00010 (water temp)
  • period=PT1H (ISO 8601 duration)
  • siteStatus=active
  • Site filter (pick one): stateCd, sites (max 100), huc, bBox (max 25-degree product), countyCd (max 20)

Normalized fields: streamflow_00060 metric, value in ft³/s, site metadata in extra_json.

Gotchas:

  • Currently scoped to stateCd=CA — parameterize for broader coverage
  • Response is deeply nested: value.timeSeries[].sourceInfo.geoLocation.geogLocation
  • Bbox lat×lon product cannot exceed 25 degrees
  • Data is provisional: "subject to revision"
  • Readings arrive ~every 15 minutes, transmitted hourly
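The following sketch flattens the nested JSON into one record per reading. It rests on several assumptions:

  • Each series carries its no-data sentinel in `variable.noDataValue`.
  • Values arrive as strings.
  • A `"P"` entry in `qualifiers` marks provisional data.

The function name and output keys are hypothetical, and `value_cfs` assumes parameterCd=00060.

```python
def parse_iv_response(payload: dict) -> list[dict]:
    """Flatten value.timeSeries[] into one record per reading (hypothetical shape)."""
    records = []
    for series in payload.get("value", {}).get("timeSeries", []):
        info = series["sourceInfo"]
        geo = info["geoLocation"]["geogLocation"]
        site_no = info["siteCode"][0]["value"]
        # Assumption: each series declares its sentinel in variable.noDataValue.
        no_data = series.get("variable", {}).get("noDataValue")
        for block in series.get("values", []):
            for point in block.get("value", []):
                try:
                    value = float(point["value"])  # values arrive as strings
                except (KeyError, TypeError, ValueError):
                    continue
                if no_data is not None and value == float(no_data):
                    continue  # skip the no-data sentinel
                records.append({
                    "site_no": site_no,
                    "site_name": info.get("siteName"),
                    "lat": geo["latitude"],
                    "lon": geo["longitude"],
                    "time": point.get("dateTime"),
                    "value_cfs": value,  # assumes parameterCd=00060
                    "provisional": "P" in point.get("qualifiers", []),
                })
    return records
```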

Open-Meteo Weather

  • Slug: open_meteo
  • Base URL: https://api.open-meteo.com/v1/forecast
  • Auth: None (non-commercial)
  • Format: JSON
  • Rate limit: Reasonable use expected
  • Schedule: Every 10min

What we fetch: Current weather for 10 global cities (temperature, humidity, wind, precipitation, weather code).

Key params:

  • latitude/longitude — comma-separated for multiple locations
  • current=temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code
  • timezone=UTC
  • Also available: hourly, daily, forecast_days, past_days, models

Normalized fields: temperature_2m metric in °C, city + humidity/wind/precipitation in extra_json.

Sub-APIs on separate domains (same auth/format, ready for expansion):

Sub-API      Endpoint                                       Data
Air Quality  air-quality-api.open-meteo.com/v1/air-quality  PM2.5, PM10, O3, NO2, SO2, CO, AQI, pollen
Marine       marine-api.open-meteo.com/v1/marine            Wave height/period, swell, SST, currents
Flood        flood-api.open-meteo.com/v1/flood              River discharge (GloFAS), global 5km, 1984–present + 7mo forecast
Historical   archive-api.open-meteo.com/v1/archive          Past weather observations
Ensemble     ensemble-api.open-meteo.com/v1/ensemble        Probabilistic forecasts

Gotchas:

  • Multiple coordinates → response is an array, not a single object
  • Air Quality API: the European domain is 11km/hourly; the global domain is 45km/3-hourly
  • Flood API: "Due to 5km resolution, the closest river might not be selected correctly. Varying coordinates by ±0.1° can help"
  • Commercial use requires an API key and the customer- URL prefix (e.g. customer-api.open-meteo.com)
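The following sketch packs several cities into one request. It also handles the two response shapes: one object for a single location, an array for several. The helper names are hypothetical, and the coordinates in the test are illustrative only.

```python
from urllib.parse import urlencode

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
CURRENT_VARS = "temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code"


def build_forecast_url(locations: list[tuple[float, float]]) -> str:
    """Pack many (lat, lon) pairs into one request via comma-separated lists."""
    params = {
        "latitude": ",".join(f"{lat:.2f}" for lat, _ in locations),
        "longitude": ",".join(f"{lon:.2f}" for _, lon in locations),
        "current": CURRENT_VARS,
        "timezone": "UTC",
    }
    # Keep commas literal so the query matches the documented examples.
    return f"{FORECAST_URL}?{urlencode(params, safe=',')}"


def as_location_list(payload) -> list[dict]:
    """One location returns an object; several return an array. Always give a list."""
    return payload if isinstance(payload, list) else [payload]
```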

Safecast Radiation

  • Slug: safecast
  • Base URL: https://api.safecast.org/measurements.json
  • Auth: None (GET requests)
  • Format: JSON
  • Rate limit: Not documented
  • Schedule: Every 5min

What we fetch: Latest 500 radiation measurements globally.

Key params:

  • order=created_at desc
  • per_page=500
  • Spatial: latitude, longitude, distance (km radius)
  • Temporal: captured_after/captured_before, since/until

Normalized fields: radiation metric, value in CPM, device + location in extra_json.

Gotchas:

  • License is Creative Commons Non-Commercial
  • Citizen science data — quality is variable
  • Heavy concentration in Japan (post-Fukushima), US, Europe
  • The bgeigie_imports endpoint holds bulk uploads from handheld sensors
  • Date param format: YYYY-MM-DD HH:MM
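The following sketch builds the request parameters, mainly to show the `YYYY-MM-DD HH:MM` date format. The `captured_after` window is a hypothetical addition for time-bounded backfills; the current fetcher only asks for the latest 500 rows. The helper names are also hypothetical.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

MEASUREMENTS_URL = "https://api.safecast.org/measurements.json"


def build_params(now: datetime, lookback: timedelta = timedelta(hours=1)) -> dict:
    """Newest-first page of 500, limited to a hypothetical lookback window."""
    return {
        "order": "created_at desc",
        "per_page": 500,
        # Safecast dates use "YYYY-MM-DD HH:MM", not full ISO 8601.
        "captured_after": (now - lookback).strftime("%Y-%m-%d %H:%M"),
    }


def build_url(now: datetime, lookback: timedelta = timedelta(hours=1)) -> str:
    return f"{MEASUREMENTS_URL}?{urlencode(build_params(now, lookback))}"
```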

World Bank Climate

  • Slug: world_bank
  • Base URL: https://api.worldbank.org/v2/
  • Auth: None
  • Format: JSON (returns [metadata, data_array])
  • Rate limit: Not documented
  • Schedule: Every 6 hours

What we fetch: 5 climate indicators × 15 countries × 6 years (2018–2023):

  • EN.ATM.CO2E.PC — CO₂ emissions (metric tons per capita)
  • AG.LND.FRST.ZS — Forest area (% of land area)
  • EG.USE.PCAP.KG.OE — Energy use (kg oil equiv per capita)
  • EN.ATM.METH.KT.CE — Methane emissions (kt CO₂ equiv)
  • SP.POP.TOTL — Total population

Expand with:

  • EN.ATM.GHGO.KT.CE — Other GHG emissions
  • AG.LND.ARBL.ZS — Arable land %
  • EG.ELC.RNEW.ZS — Renewable electricity %

Key params:

  • /country/{CODES}/indicator/{ID}?format=json&date=2018:2023&per_page=500
  • Country codes semicolon-separated: USA;CHN;IND;BRA;...
  • per_page max 1000

Normalized fields: wb_{indicator_id} metric, country + year in extra_json. No lat/lon (country-level data).

Gotchas:

  • Response is always [metadata_object, data_array] — use data[1]
  • Must include v2 in URL (v1 discontinued June 2020)
  • Many indicator values are null for recent years (1–3 year data lag)
  • 189 countries, ~16,000 indicators, 45+ databases
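The following sketch builds the indicator URL and reads `data[1]`, skipping null values. The function names are hypothetical. The guard for a one-element list assumes that error responses omit the data array.

```python
WB_BASE = "https://api.worldbank.org/v2"


def indicator_url(countries: list[str], indicator: str, start: int = 2018, end: int = 2023) -> str:
    """Country codes are joined with semicolons, as described above."""
    codes = ";".join(countries)
    return (f"{WB_BASE}/country/{codes}/indicator/{indicator}"
            f"?format=json&date={start}:{end}&per_page=500")


def parse_indicator(payload) -> list[dict]:
    """Use payload[1]; payload[0] is paging metadata."""
    # Assumption: error responses carry only a message object, so guard the length.
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return []
    rows = []
    for item in payload[1]:
        if item.get("value") is None:  # common for recent years (data lag)
            continue
        rows.append({
            "metric": f"wb_{item['indicator']['id']}",
            "country": item.get("countryiso3code"),
            "year": int(item["date"]),
            "value": float(item["value"]),
        })
    return rows
```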

Ready to Onboard (No Auth)

Open-Meteo Air Quality

  • Base URL: https://air-quality-api.open-meteo.com/v1/air-quality
  • Auth: None
  • Format: JSON (same structure as weather API)

Same fetcher pattern as Open-Meteo Weather. Key params:

?latitude=40.71,-34.05&longitude=-74.01,151.21
&current=us_aqi,pm2_5,pm10,ozone,nitrogen_dioxide,sulphur_dioxide,carbon_monoxide
&timezone=UTC

Variables: us_aqi, european_aqi, pm2_5, pm10, ozone, nitrogen_dioxide, sulphur_dioxide, carbon_monoxide, dust, uv_index, aerosol_optical_depth, various pollen types (Europe only).


Open-Meteo Flood / River Discharge

  • Base URL: https://flood-api.open-meteo.com/v1/flood
  • Auth: None
  • Format: JSON
?latitude=47.37&longitude=8.55&daily=river_discharge&forecast_days=7

Global 5km resolution, GloFAS data. River discharge in m³/s. Ensemble stats available (river_discharge_mean, river_discharge_median, river_discharge_max, river_discharge_min). Historical from 1984.
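The Open-Meteo Weather gotchas suggest varying coordinates by ±0.1° when the 5km grid picks the wrong river. The following sketch generates those candidate points. It assumes we request each candidate and choose among the results ourselves. The helper names are hypothetical.

```python
from urllib.parse import urlencode

FLOOD_URL = "https://flood-api.open-meteo.com/v1/flood"


def jittered_points(lat: float, lon: float, step: float = 0.1) -> list[tuple[float, float]]:
    """The original point first, then its 8 neighbours offset by ±step degrees."""
    offsets = [(0, 0)] + [
        (dy, dx)
        for dy in (-step, 0, step)
        for dx in (-step, 0, step)
        if (dy, dx) != (0, 0)
    ]
    # Round to hide float noise such as 8.649999999999999.
    return [(round(lat + dy, 4), round(lon + dx, 4)) for dy, dx in offsets]


def flood_url(lat: float, lon: float, days: int = 7) -> str:
    params = {"latitude": lat, "longitude": lon, "daily": "river_discharge", "forecast_days": days}
    return f"{FLOOD_URL}?{urlencode(params)}"
```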


Open-Meteo Marine

  • Base URL: https://marine-api.open-meteo.com/v1/marine
  • Auth: None
  • Format: JSON
?latitude=54.32&longitude=10.12
&current=wave_height,wave_period,wave_direction,ocean_current_velocity,ocean_current_direction

Wave data, swell components, sea surface temperature, sea level, ocean currents.


USDM Drought Monitor

  • Base URL: https://usdmdataservices.unl.edu/api/
  • Auth: None
  • Format: JSON (via Accept: application/json header), CSV, XML

Endpoints:

  • Comprehensive stats (area, population, DSCI by region + date range)
  • Stats by drought threshold (D0–D4)
  • Consecutive weeks in drought

Params: aoi (state FIPS, county, HUC, etc.), startdate/enddate, statisticsType, dx (drought level 0–4)

Data: Drought category (D0–D4 = Abnormally Dry → Exceptional Drought), percent area, DSCI index.

Coverage: US only. Weekly updates (Thursdays).

Gotchas: Date format is M/D/YYYY (unusual).
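The following sketch shows the M/D/YYYY formatting, plus a helper that finds the most recent Thursday release date. The names are hypothetical. The parameter dict covers only aoi, startdate and enddate; any endpoint path or statisticsType value should come from the USDM docs.

```python
from datetime import date, timedelta


def usdm_date(d: date) -> str:
    """Format as M/D/YYYY with no zero padding, e.g. 3/5/2026."""
    return f"{d.month}/{d.day}/{d.year}"


def latest_thursday(d: date) -> date:
    """Most recent Thursday on or before d (weekday() == 3 is Thursday)."""
    return d - timedelta(days=(d.weekday() - 3) % 7)


def build_params(aoi: str, start: date, end: date) -> dict:
    """Hypothetical partial parameter set; other params come from the USDM docs."""
    return {"aoi": aoi, "startdate": usdm_date(start), "enddate": usdm_date(end)}
```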


NOAA Tides & Currents

  • Base URL: https://api.tidesandcurrents.noaa.gov/api/prod/datagetter
  • Auth: None (optional application= param for logging)
  • Format: JSON, XML, CSV
?station=9414290&begin_date=20260316&end_date=20260316
&product=water_level&datum=MLLW&units=metric&time_zone=gmt&format=json

Products: Water levels, tide predictions, air/water temperature, wind, barometric pressure, humidity, salinity, currents, conductivity.

Data limits by interval:

  • 1-minute: 4 days
  • 6-minute: 1 month
  • Hourly: 1 year
  • Daily: 10 years
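The following sketch splits a long backfill into requests that respect the per-interval limits above. The resolution labels are our own, not API parameter values. "1 month" is approximated as 31 days and "10 years" as 3,650 days; both are assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

DATAGETTER = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"

# Max days per request, from the limits above. Keys are our own labels (assumption).
MAX_DAYS = {"1min": 4, "6min": 31, "hourly": 365, "daily": 3650}


def date_chunks(start: date, end: date, resolution: str) -> list[tuple[str, str]]:
    """Split an inclusive [start, end] range into YYYYMMDD pairs within the limit."""
    span = timedelta(days=MAX_DAYS[resolution])
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + span - timedelta(days=1), end)
        chunks.append((cursor.strftime("%Y%m%d"), chunk_end.strftime("%Y%m%d")))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def water_level_url(station: str, begin: str, end: str) -> str:
    params = {
        "station": station, "begin_date": begin, "end_date": end,
        "product": "water_level", "datum": "MLLW", "units": "metric",
        "time_zone": "gmt", "format": "json",
    }
    return f"{DATAGETTER}?{urlencode(params)}"
```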

Coverage: US coastal + Great Lakes stations.

Why it matters: Sea level rise monitoring, coastal flooding, storm surge.


NASA POWER

  • Base URL: https://power.larc.nasa.gov/api/temporal/daily/point
  • Auth: None
  • Format: JSON, CSV, NetCDF, ASCII
?parameters=T2M,PRECTOTCORR,ALLSKY_SFC_SW_DWN&community=RE
&latitude=40.71&longitude=-74.01&start=20260101&end=20260315&format=json

200+ parameters including temperature, precipitation, solar irradiance, wind, humidity, soil moisture. Global 0.5° grid (~50km), MERRA-2 reanalysis. 1981–present.

Why it matters: Long historical climate baselines, solar resource data, global coverage.

Gotchas: Fill value -999.0 for missing data — our normalizer already handles this.
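For reference, the following is a minimal sketch of that handling. It assumes the daily JSON nests values as `properties.parameter.{NAME}.{YYYYMMDD}`. It illustrates the idea only and is not the TerraPulse normalizer.

```python
from datetime import datetime

FILL_VALUE = -999.0


def parse_power_daily(payload: dict) -> list[dict]:
    """Flatten properties.parameter into rows, mapping the fill value to None."""
    rows = []
    for name, series in payload["properties"]["parameter"].items():
        for day, value in sorted(series.items()):
            rows.append({
                "parameter": name,
                "date": datetime.strptime(day, "%Y%m%d").date(),
                "value": None if value == FILL_VALUE else value,
            })
    return rows
```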


NOAA Climate Indices

  • Auth: None (static files)
  • Format: Space-delimited text
Index          Update   URL
MEI v2 (ENSO)  Monthly  https://psl.noaa.gov/enso/mei/data/meiv2.data
ONI            Monthly  https://www.cpc.ncep.noaa.gov/data/indices/oni.ascii.txt
SOI            Monthly  https://www.cpc.ncep.noaa.gov/data/indices/soi
NAO            Monthly  https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii
PDO            Monthly  https://www.ncei.noaa.gov/pub/data/cmb/ersst/v5/index/ersst.v5.pdo.dat

These are the core climate oscillation indices; the oscillations they track strongly shape global weather patterns. The files are simple to parse and low-volume, and they provide essential context.
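The following sketch parses one of these files. It assumes the ONI layout: a header row followed by `SEAS YR TOTAL ANOM` columns. Each index file has its own layout, so MEI v2, SOI, NAO and PDO would need their own parsers. The function name is hypothetical.

```python
def parse_oni(text: str) -> list[dict]:
    """Parse whitespace-delimited ONI rows: SEAS YR TOTAL ANOM (assumed layout)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 4:
            continue  # tolerate blank or malformed lines
        season, year, total, anomaly = parts
        rows.append({
            "season": season,
            "year": int(year),
            "total": float(total),
            "anomaly": float(anomaly),
        })
    return rows
```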


NOAA Space Weather

  • Base URL: https://services.swpc.noaa.gov/products/
  • Auth: None
  • Format: JSON (arrays of arrays)
Product       Endpoint                              Data
Kp Index      noaa-planetary-k-index.json           Geomagnetic activity (current)
Kp Forecast   noaa-planetary-k-index-forecast.json  3-day Kp forecast
Dst Index     kyoto-dst.json                        Disturbance storm time
Solar Wind    solar-wind/mag-*.json                 Magnetometer data
Solar Flares  flares/                               Flare events
Alerts        alerts.json                           Active space weather alerts

Coverage: Global (space weather affects the whole planet). Niche but relevant — geomagnetic storms affect power grids and comms.
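The following sketch turns an array-of-arrays product into records, assuming the first row holds the column names. The `time_tag` and `Kp` column names used in `kp_series` are assumptions about the Kp product. Both helpers are hypothetical.

```python
def table_to_records(table: list[list]) -> list[dict]:
    """Treat the first row as column names and zip it with every later row."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def kp_series(table: list[list]) -> list[tuple[str, float]]:
    """(time_tag, Kp) pairs; values may arrive as strings, so cast to float."""
    return [(rec["time_tag"], float(rec["Kp"])) for rec in table_to_records(table)]
```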


ASF Sentinel-1 SAR

  • Slug: asf_sentinel
  • Base URL: https://api.daac.asf.alaska.edu/services/search/param
  • Auth: None (searches are public)
  • Format: GeoJSON
  • Rate limit: Max 2,000 results per query; be polite with request spacing
  • Schedule: Every hour

What we fetch: Sentinel-1 GRD_HD (Ground Range Detected, High Density) SAR scenes from 5 climate-relevant regions — Arctic/Svalbard, Greenland Ice Sheet, Amazon Basin, Gulf Coast US, Southeast Asia. Rolling 3-day window, 20 scenes per region max.

Key params:

  • platform=Sentinel-1
  • processingLevel=GRD_HD — 10m resolution, reduced speckle
  • beamMode=IW — Interferometric Wide, 250km swath
  • intersectsWith=POLYGON(...) — WKT spatial filter
  • output=geojson
  • maxResults=20

Normalized fields: sar_scene metric, value = file size in MB, center lat/lon, orbit/path/frame metadata in extra_json.

Climate applications of Sentinel-1 SAR:

  • Flood mapping — works through clouds, day/night (GRD products)
  • Sea ice monitoring — concentration, type, lead/ridge detection
  • Glacier velocity — ice sheet flow speed via InSAR (SLC products)
  • Ground subsidence — millimeter-scale deformation detection
  • Forest change — deforestation, biomass estimation via VH cross-pol
  • Urban damage assessment — post-disaster building damage

Other platforms available: ALOS PALSAR, ALOS-2, ERS-1/2, JERS-1, RADARSAT-1/2, NISAR (upcoming)

Gotchas:

  • Use platform= not dataset= (parameter names are case-sensitive)
  • Unbounded queries over 2,000 results will error
  • Downloads from datapool.asf.alaska.edu require no credentials
  • AWS S3 access requires NGAP credentials (optional)
  • Added to TerraPulse by Michael Isenbek via the admin UI
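The following sketch builds the search request above, converting a bbox to a closed WKT polygon (WKT uses `lon lat` order). It assumes the `start` parameter accepts an ISO 8601 UTC timestamp for the rolling 3-day window. The helper names and the example bbox in the test are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

SEARCH_URL = "https://api.daac.asf.alaska.edu/services/search/param"


def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Closed, counter-clockwise polygon; WKT uses 'lon lat' order."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON((" + ",".join(f"{x} {y}" for x, y in ring) + "))"


def build_search_url(bbox, now: datetime, days: int = 3, max_results: int = 20) -> str:
    start = now.astimezone(timezone.utc) - timedelta(days=days)
    params = {
        "platform": "Sentinel-1",
        "processingLevel": "GRD_HD",
        "beamMode": "IW",
        "intersectsWith": bbox_to_wkt(*bbox),
        # Assumption: `start` accepts an ISO 8601 UTC timestamp.
        "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "output": "geojson",
        "maxResults": max_results,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"
```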

Phase 2 (Free API Key Required)

NASA FIRMS

  • Base URL: https://firms.modaps.eosdis.nasa.gov/api/area/
  • Auth: MAP_KEY (free via Earthdata login)
  • Format: CSV, JSON, KML, shapefiles
  • Rate limit: 5,000 transactions / 10 minutes
/api/area/csv/{MAP_KEY}/VIIRS_SNPP_NRT/{west},{south},{east},{north}/1

Sources: MODIS, VIIRS (SNPP, NOAA-20, NOAA-21), Landsat. Ultra Real-Time data is available within 60 seconds (North America). FIRMS is the upstream source for GFW fire data, so it is better to use it directly.

Day range: 1–5 days per query.
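The following sketch builds the area URL with the 1–5 day guard, then parses the CSV with `csv.DictReader`. The column names (`latitude`, `longitude`, `frp`, `acq_date`, `acq_time`) are assumed from the VIIRS CSV. The helper names are hypothetical.

```python
import csv
import io

FIRMS_AREA = "https://firms.modaps.eosdis.nasa.gov/api/area"


def area_url(map_key: str, bbox, day_range: int = 1,
             source: str = "VIIRS_SNPP_NRT", fmt: str = "csv") -> str:
    """bbox is (west, south, east, north), matching the path order above."""
    if not 1 <= day_range <= 5:
        raise ValueError("FIRMS day range must be between 1 and 5")
    west, south, east, north = bbox
    return f"{FIRMS_AREA}/{fmt}/{map_key}/{source}/{west},{south},{east},{north}/{day_range}"


def parse_hotspots(csv_text: str) -> list[dict]:
    """Keep position, fire radiative power and acquisition time (assumed columns)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "frp": float(row["frp"]),
            "acq_date": row["acq_date"],
            "acq_time": row["acq_time"],
        }
        for row in reader
    ]
```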


NWS Alerts

  • Base URL: https://api.weather.gov/
  • Auth: User-Agent header required; we send (TerraPulse, contact@terrapulse.info)
  • Format: GeoJSON
/alerts/active?area=CA
/points/{lat},{lon} → /gridpoints/{office}/{x},{y}/forecast

Forecasts take two steps: first call /points to get the grid coordinates, then call /gridpoints for the data.
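The following sketch covers the two-step flow: build the /points URL, then read the forecast link from its response. It assumes the /points payload exposes `properties.forecast`, with `gridId`/`gridX`/`gridY` as a fallback, and that four decimal places are enough for /points. The helper names are hypothetical, and the HTTP calls are left out.

```python
NWS_BASE = "https://api.weather.gov"
# Send these headers on every request; the User-Agent mirrors the one above.
HEADERS = {"User-Agent": "(TerraPulse, contact@terrapulse.info)", "Accept": "application/geo+json"}


def alerts_url(area: str) -> str:
    return f"{NWS_BASE}/alerts/active?area={area}"


def points_url(lat: float, lon: float) -> str:
    # Assumption: four decimal places is enough precision for /points.
    return f"{NWS_BASE}/points/{lat:.4f},{lon:.4f}"


def forecast_url(points_payload: dict) -> str:
    """Step 2: use the link from /points, or rebuild it from the grid fields."""
    props = points_payload["properties"]
    if props.get("forecast"):
        return props["forecast"]
    return f"{NWS_BASE}/gridpoints/{props['gridId']}/{props['gridX']},{props['gridY']}/forecast"
```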


EPA AirNow

  • Base URL: https://www.airnowapi.org/aq/
  • Auth: API key (free registration)
  • Format: JSON, XML, CSV
  • Rate limit: 500 requests/hour

Endpoints cover current observations (by bbox, zip code, or lat/lon), forecasts, and historical data. Coverage spans the US, Canada, and Mexico, with 2,500+ monitoring stations.


OpenAQ

  • Base URL: https://api.openaq.org/v3/
  • Auth: API key (free registration)
  • Format: JSON

160+ countries, 40,000+ locations. PM2.5, PM10, O3, NO2, SO2, CO, BC. Broader global coverage than EPA AirNow. Also has a free open-data archive on AWS S3 for bulk downloads.


PurpleAir

  • Base URL: https://api.purpleair.com/
  • Auth: API key (via develop.purpleair.com)
  • Format: JSON

100,000+ low-cost sensors. Real-time PM2.5 (2-minute updates). Highest density AQ network, hyperlocal data.


Stretch Goals

ERDDAP / CoastWatch

  • Base URL: https://coastwatch.pfeg.noaa.gov/erddap/
  • Auth: None
  • Format: JSON, CSV, NetCDF, KML, many more

3,084+ datasets: SST, chlorophyll-a, ocean color, wave data, currents, salinity, wind, ocean heat content. griddap for gridded data, tabledap for station data. The premier open ocean data server.

CHIRPS Precipitation

  • Base URL: https://data.chc.ucsb.edu/products/CHIRPS-2.0/
  • Auth: None
  • Format: GeoTIFF, NetCDF (file downloads, no REST API)

Global precipitation, 5km resolution, 1981–present. Best free high-res precipitation for developing regions. Would need a file-download fetcher rather than the standard HTTP JSON approach.

NOAA CO₂ / Keeling Curve

  • URL: https://gml.noaa.gov/ccgg/trends/co2/co2_daily_mlo.csv
  • Auth: None
  • Format: CSV flat file (no REST API)

Daily Mauna Loa CO₂ readings. No formal API — build a cron fetcher that downloads and parses the CSV. Scripps CO₂ Program publishes the same data. Essential for the "Big Picture" view. See also: Copernicus CAMS for global GHG (Python CDS API, free registration).
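The following sketch covers the CSV parsing step for a cron fetcher. It assumes `#` comment lines and data columns of year, month, day, decimal date and CO₂ in ppm. Verify the column order against the file header before relying on it. The function name is hypothetical.

```python
import csv
import io
from datetime import date


def parse_mlo_daily(csv_text: str) -> list[dict]:
    """Skip '#' comments; assume columns year, month, day, decimal date, ppm."""
    data_lines = (
        line for line in io.StringIO(csv_text)
        if line.strip() and not line.lstrip().startswith("#")
    )
    rows = []
    for rec in csv.reader(data_lines):
        try:
            year, month, day = int(rec[0]), int(rec[1]), int(rec[2])
            ppm = float(rec[4])
        except (IndexError, ValueError):
            continue  # header row or malformed line
        rows.append({"date": date(year, month, day), "co2_ppm": ppm})
    return rows
```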

NSIDC Arctic Ice Extent

  • URL: https://nsidc.org/data/seaice_index/
  • Auth: None
  • Format: CSV, GeoTIFF, shapefiles

Daily Arctic/Antarctic sea ice extent and area. Key climate indicator. Data files downloadable, some via ERDDAP.

NIFC Fire Perimeters (ArcGIS)

  • URL: https://services3.arcgis.com/T4QMspbfLg3qTGWY/arcgis/rest/services/
  • Auth: None
  • Format: GeoJSON via ArcGIS REST API

Real-time fire perimeters, containment percentages, and incident data from the National Interagency Fire Center. Complements point-based FIRMS hotspot data with polygon perimeters.

NOAA National Hurricane Center

  • URL: https://www.nhc.noaa.gov/gis/
  • Auth: None
  • Format: GeoJSON, KML, shapefiles

Active tropical cyclone tracks, forecast cones, wind speed probabilities, storm surge. Published as GeoJSON/KML feeds during hurricane season. Seasonal but critical for alerts.

EPA Envirofacts UV Index

  • URL: https://enviro.epa.gov/enviro/efservice/
  • Auth: None
  • Format: JSON, XML, CSV

Daily UV index forecasts by zip code. Free, reasonable rate limits. Open-Meteo also includes hourly UV in its air quality API as an alternative.

Water Quality Portal (USGS/EPA)

  • URL: https://www.waterqualitydata.us/
  • Auth: None
  • Format: CSV, JSON, XML

Joint USGS/EPA database with 430M+ records. Primarily historical rather than real-time. Parameters include nutrients, metals, pesticides, bacteria. Stretch goal — useful for trend analysis.


Commercial Aggregators (Reference)

These are documented for future reference if we need to fill coverage gaps.

Aggregator       Best For                          Free Tier                    Paid
IQAir/AirVisual  Consumer-grade AQI + health recs  500 calls/day                $399/mo (100K/day)
OpenWeatherMap   Budget commercial weather         1K calls/day (One Call 3.0)  €0.14/100 overage
Tomorrow.io      Advanced weather intelligence     500 calls/day (core)         Custom enterprise
Ambee            Multi-category environmental      100 records/day trial        Custom
OpenUV           Dedicated UV data                 50 calls/day                 Paid plans

Strategy from project docs: start with free government APIs as the backbone, use Open-Meteo for development, and use OpenWeatherMap One Call 3.0 for production. This covers all 10 environmental categories at minimal cost.


Onboarding Checklist

When adding a new data source:

  1. Research — document the API in this handbook (URL, auth, format, gotchas)
  2. Fetcher — create src/terrapulse/ingestion/fetchers/{slug}.py inheriting BaseFetcher
  3. Normalizer — add normalize_{slug}() to src/terrapulse/ingestion/normalizer.py
  4. Orchestrator — register normalizer in NORMALIZERS dict in orchestrator.py
  5. Seed — add entry to MVP_SOURCES in src/terrapulse/db/seed.py
  6. Scheduler — add job to JOBS list in scheduler.py
  7. Admin — add the fetcher class to fetcher_map in the trigger_fetch handler in admin/app.py
  8. Test — add mocked fetcher test in tests/test_fetchers/
  9. Deploy — sudo systemctl restart terrapulse
  10. Verify — check admin dashboard for new observations flowing
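The following is a minimal sketch of steps 3 and 4 that assumes a simple row model. `Observation`, `normalize_example_source` and the `NORMALIZERS` signature here are hypothetical stand-ins. The real model in src/terrapulse and the orchestrator's dict may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Observation:
    """Hypothetical stand-in for the normalized row written to PostgreSQL."""
    source: str
    metric: str
    value: float
    lat: Optional[float] = None
    lon: Optional[float] = None
    extra_json: dict = field(default_factory=dict)


def normalize_example_source(raw: list[dict]) -> list[Observation]:
    """Step 3: map staged records to Observations (illustrative field mapping)."""
    return [
        Observation(
            source="example_source",
            metric="example_metric",
            value=float(r["value"]),
            lat=r.get("lat"),
            lon=r.get("lon"),
            extra_json={"site": r.get("site")},
        )
        for r in raw
    ]


# Step 4: the orchestrator looks normalizers up by source slug.
NORMALIZERS: dict[str, Callable[[list[dict]], list[Observation]]] = {
    "example_source": normalize_example_source,
}
```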