TerraPulse Data Source Handbook
Living document. Last updated: 2026-03-17.
Every source listed here follows the same onboarding pattern: Fetcher → DuckDB stage → Normalizer → PostgreSQL → API/Admin
Quick Reference
| Source | Auth | Format | Interval | Status |
|---|---|---|---|---|
| USGS Earthquake | None | GeoJSON | 60s | Active |
| USGS Water Services | None | JSON | 5min | Active |
| Open-Meteo Weather | None | JSON | 10min | Active |
| Safecast Radiation | None | JSON | 5min | Active |
| World Bank Climate | None | JSON | 6hr | Active |
| GFW Fire Alerts | API key? | JSON | -- | Disabled (403) |
| Open-Meteo Air Quality | None | JSON | 10min | Active |
| Open-Meteo Flood/River | None | JSON | 15min | Active |
| Open-Meteo Marine | None | JSON | 15min | Active |
| USDM Drought Monitor | None | JSON/CSV | Weekly | Active |
| NOAA Tides & Currents | None | JSON | 6min | Active |
| NASA POWER | None | JSON | Daily | Active |
| NOAA Climate Indices | None | Text | Monthly | Active |
| NOAA Space Weather | None | JSON | 5min | Active |
| ASF Sentinel-1 SAR | None | GeoJSON | 1hr | Active |
| NWS Alerts | User-Agent | GeoJSON | 60s | Phase 2 |
| NASA FIRMS | MAP_KEY | CSV/JSON | 60s | Phase 2 |
| EPA AirNow | API key | JSON | Hourly | Phase 2 |
| OpenAQ | API key | JSON | Hourly | Phase 2 |
| NOAA CDO | Token | JSON | Daily | Phase 2 |
| PurpleAir | API key | JSON | 2min | Phase 2 |
| NOAA CO₂ / Keeling Curve | None | CSV | Daily | Stretch |
| NSIDC Arctic Ice | None | CSV | Daily | Stretch |
| NIFC Fire Perimeters | None | GeoJSON | Real-time | Stretch |
| NOAA Hurricane Center | None | GeoJSON | Seasonal | Stretch |
| EPA UV Index | None | JSON | Daily | Stretch |
| Water Quality Portal | None | CSV/JSON | Historical | Stretch |
| ERDDAP/CoastWatch | None | Many | Varies | Stretch |
| CHIRPS Precipitation | None | GeoTIFF | Daily | Stretch |
Active Sources
USGS Earthquake
- Slug: usgs_earthquake
- Base URL: https://earthquake.usgs.gov/fdsnws/event/1/query
- Auth: None
- Format: GeoJSON
- Rate limit: Max 20,000 events per request
- Schedule: Every 60s
What we fetch: Last hour of earthquake events globally, up to 500 per fetch.
Key params:
- format=geojson
- starttime: ISO datetime, 1 hour back
- orderby=time
- limit=500
- Spatial: minlatitude/maxlatitude/minlongitude/maxlongitude (bbox) or latitude/longitude/maxradiuskm (circle)
- Magnitude: minmagnitude/maxmagnitude
- Depth: mindepth/maxdepth
Normalized fields: earthquake_magnitude metric, magnitude value, lat/lon/depth, event metadata in extra_json.
Gotchas:
- Coordinates array is [longitude, latitude, depth], lon first
- For high-volume use, USGS recommends the summary feeds (e.g. all_hour.geojson) over the query endpoint
- Event time is epoch milliseconds
Endpoints:
- /query: main data query
- /count: count matching events
- /catalogs, /contributors: enumeration
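The two parsing gotchas above (lon-first coordinates, epoch-millisecond timestamps) can be sketched as follows. This is a minimal illustration, not the project's normalizer: the function name and output keys are hypothetical, and the `mag` property name is assumed from the USGS GeoJSON feed.

```python
from datetime import datetime, timezone


def parse_feature(feature: dict) -> dict:
    """Flatten one GeoJSON feature from the earthquake query response."""
    # GeoJSON order is [longitude, latitude, depth], so lon comes first.
    lon, lat, depth_km = feature["geometry"]["coordinates"]
    props = feature["properties"]
    # "time" is epoch milliseconds; divide by 1000 before converting.
    observed_at = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return {
        "metric": "earthquake_magnitude",
        "value": props["mag"],  # assumed property name
        "lat": lat,
        "lon": lon,
        "depth_km": depth_km,
        "observed_at": observed_at,
    }


# Made-up sample feature, trimmed to the fields used above.
sample = {
    "geometry": {"coordinates": [-122.5, 38.1, 7.2]},
    "properties": {"mag": 2.4, "time": 1773700000000},
}
row = parse_feature(sample)
```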
USGS Water Services
- Slug: usgs_water
- Base URL: https://waterservices.usgs.gov/nwis/iv/
- Auth: None
- Format: JSON (WaterML 1.1 structure)
- Rate limit: IP-based blocking for excessive use
- Schedule: Every 5min
What we fetch: Instantaneous streamflow readings (param 00060) for California, last hour.
Key params:
- format=json
- parameterCd=00060 (discharge, cubic ft/sec); also 00065 (gage height), 00010 (water temp)
- period=PT1H (ISO 8601 duration)
- siteStatus=active
- Site filter (pick one): stateCd, sites (max 100), huc, bBox (max 25-degree product), countyCd (max 20)
Normalized fields: streamflow_00060 metric, value in ft³/s, site metadata in extra_json.
Gotchas:
- Currently scoped to stateCd=CA; parameterize for broader coverage
- Response is deeply nested: value.timeSeries[].sourceInfo.geoLocation.geogLocation
- bBox lat×lon product cannot exceed 25 degrees
- Data is provisional: "subject to revision"
- Readings are recorded roughly every 15 minutes but transmitted hourly
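To make the nesting concrete, here is a hedged sketch of flattening the response into rows. The path to geogLocation comes from the gotcha above; the `siteName`, `values[].value[]`, and `dateTime` keys are assumptions about the WaterML JSON layout, and the function name is hypothetical.

```python
def flatten_time_series(payload: dict) -> list[dict]:
    """Turn the nested instantaneous-values response into flat rows."""
    rows = []
    for series in payload["value"]["timeSeries"]:
        info = series["sourceInfo"]
        loc = info["geoLocation"]["geogLocation"]
        # Assumed layout: each series holds blocks, each block a list of readings.
        for block in series.get("values", []):
            for reading in block.get("value", []):
                rows.append(
                    {
                        "site": info.get("siteName"),
                        "lat": loc["latitude"],
                        "lon": loc["longitude"],
                        # Readings are assumed to arrive as strings.
                        "value": float(reading["value"]),
                        "observed_at": reading["dateTime"],
                    }
                )
    return rows


# Made-up payload with a single site and a single reading.
sample = {
    "value": {
        "timeSeries": [
            {
                "sourceInfo": {
                    "siteName": "EXAMPLE R NR SAMPLE CA",
                    "geoLocation": {
                        "geogLocation": {"latitude": 38.5, "longitude": -121.5}
                    },
                },
                "values": [
                    {"value": [{"value": "1250", "dateTime": "2026-03-17T12:00:00.000-07:00"}]}
                ],
            }
        ]
    }
}
rows = flatten_time_series(sample)
```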
Open-Meteo Weather
- Slug: open_meteo
- Base URL: https://api.open-meteo.com/v1/forecast
- Auth: None (non-commercial)
- Format: JSON
- Rate limit: Reasonable use expected
- Schedule: Every 10min
What we fetch: Current weather for 10 global cities (temperature, humidity, wind, precipitation, weather code).
Key params:
- latitude/longitude: comma-separated for multiple locations
- current=temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code
- timezone=UTC
- Also available: hourly, daily, forecast_days, past_days, models
Normalized fields: temperature_2m metric in °C, city + humidity/wind/precipitation in extra_json.
Sub-APIs on separate domains (same auth/format, ready for expansion):
| Sub-API | Domain | Data |
|---|---|---|
| Air Quality | air-quality-api.open-meteo.com/v1/air-quality | PM2.5, PM10, O3, NO2, SO2, CO, AQI, pollen |
| Marine | marine-api.open-meteo.com/v1/marine | Wave height/period, swell, SST, currents |
| Flood | flood-api.open-meteo.com/v1/flood | River discharge (GloFAS), global 5km, 1984–present + 7mo forecast |
| Historical | archive-api.open-meteo.com/v1/archive | Past weather observations |
| Ensemble | ensemble-api.open-meteo.com/v1/ensemble | Probabilistic forecasts |
Gotchas:
- Multiple coordinates → response is an array, not a single object
- AQ European domain is 11km/hourly; global is 45km/3-hourly
- Flood API: "Due to 5km resolution, the closest river might not be selected correctly. Varying coordinates by ±0.1° can help"
- Commercial use requires an API key with the customer- URL prefix
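The array-versus-object gotcha is easy to trip over when the city list shrinks to one entry. A small sketch, assuming hypothetical helper names, that builds the multi-location URL from the parameters listed above and always hands the caller a list:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"
CURRENT_VARS = "temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code"


def build_url(points: list[tuple[float, float]]) -> str:
    """Build one request for many (lat, lon) points."""
    query = {
        "latitude": ",".join(str(lat) for lat, _ in points),
        "longitude": ",".join(str(lon) for _, lon in points),
        "current": CURRENT_VARS,
        "timezone": "UTC",
    }
    # safe="," keeps the comma-separated lists readable in the URL.
    return f"{BASE_URL}?{urlencode(query, safe=',')}"


def as_list(body):
    """One location returns an object, several return an array; unify both."""
    return body if isinstance(body, list) else [body]


url = build_url([(40.71, -74.01), (-33.87, 151.21)])
```

Wrapping every decoded response in as_list() lets the normalizer loop over locations without caring how many were requested.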
Safecast Radiation
- Slug: safecast
- Base URL: https://api.safecast.org/measurements.json
- Auth: None (GET requests)
- Format: JSON
- Rate limit: Not documented
- Schedule: Every 5min
What we fetch: Latest 500 radiation measurements globally.
Key params:
- order=created_at desc
- per_page=500
- Spatial: latitude, longitude, distance (km radius)
- Temporal: captured_after/captured_before, since/until
Normalized fields: radiation metric, value in CPM, device + location in extra_json.
Gotchas:
- License is Creative Commons Non-Commercial
- Citizen science data — quality is variable
- Heavy concentration in Japan (post-Fukushima), US, Europe
- bgeigie_imports endpoint has bulk uploads from handheld sensors
- Date param format: YYYY-MM-DD HH:MM
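As an illustration of the date format, the sketch below builds a query dict with a captured_after value one hour back. The helper name is hypothetical, and treating the timestamp as UTC is an assumption the source does not state.

```python
from datetime import datetime, timedelta, timezone


def safecast_params(now: datetime, hours_back: int = 1) -> dict:
    """Query parameters for the latest measurements in a recent window."""
    since = now - timedelta(hours=hours_back)
    return {
        "order": "created_at desc",
        "per_page": 500,
        # Safecast expects "YYYY-MM-DD HH:MM", not ISO 8601 with a "T".
        "captured_after": since.strftime("%Y-%m-%d %H:%M"),
    }


params = safecast_params(datetime(2026, 3, 17, 12, 30, tzinfo=timezone.utc))
```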
World Bank Climate
- Slug: world_bank
- Base URL: https://api.worldbank.org/v2/
- Auth: None
- Format: JSON (returns [metadata, data_array])
- Rate limit: Not documented
- Schedule: Every 6 hours
What we fetch: 5 climate indicators × 15 countries × 6 years (2018–2023):
- EN.ATM.CO2E.PC: CO₂ emissions (metric tons per capita)
- AG.LND.FRST.ZS: Forest area (% of land area)
- EG.USE.PCAP.KG.OE: Energy use (kg oil equiv per capita)
- EN.ATM.METH.KT.CE: Methane emissions (kt CO₂ equiv)
- SP.POP.TOTL: Total population
Expand with:
- EN.ATM.GHGO.KT.CE: Other GHG emissions
- AG.LND.ARBL.ZS: Arable land %
- EG.ELC.RNEW.ZS: Renewable electricity %
Key params:
- /country/{CODES}/indicator/{ID}?format=json&date=2018:2023&per_page=500
- Country codes semicolon-separated: USA;CHN;IND;BRA;...
- per_page max 1000
Normalized fields: wb_{indicator_id} metric, country + year in extra_json. No lat/lon (country-level data).
Gotchas:
- Response is always [metadata_object, data_array]; use data[1]
- Must include v2 in URL (v1 discontinued June 2020)
- Many indicator values are null for recent years (1–3 year data lag)
- 189 countries, ~16,000 indicators, 45+ databases
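The two-element response and the null values for recent years can be handled together. A minimal sketch with a hypothetical function name; the record keys (`indicator.id`, `countryiso3code`, `date`, `value`) are assumptions about the v2 payload, and the sample numbers are made up.

```python
def parse_indicator_response(body: list) -> list[dict]:
    """Extract non-null observations from a [metadata, data_array] response."""
    # Anything shorter than two elements, or with an empty data slot, yields no rows.
    if not isinstance(body, list) or len(body) < 2 or body[1] is None:
        return []
    rows = []
    for rec in body[1]:
        if rec.get("value") is None:
            continue  # recent years are often null because of the data lag
        rows.append(
            {
                "metric": f"wb_{rec['indicator']['id']}",
                "value": rec["value"],
                "country": rec["countryiso3code"],
                "year": int(rec["date"]),
            }
        )
    return rows


# Made-up sample: one null year and one populated year.
sample = [
    {"page": 1, "pages": 1, "per_page": 500, "total": 2},
    [
        {"indicator": {"id": "SP.POP.TOTL"}, "countryiso3code": "USA", "date": "2023", "value": None},
        {"indicator": {"id": "SP.POP.TOTL"}, "countryiso3code": "USA", "date": "2022", "value": 333000000},
    ],
]
rows = parse_indicator_response(sample)
```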
Ready to Onboard (No Auth)
Open-Meteo Air Quality
- Base URL: https://air-quality-api.open-meteo.com/v1/air-quality
- Auth: None
- Format: JSON (same structure as weather API)
Same fetcher pattern as Open-Meteo Weather. Key params:
?latitude=40.71,-34.05&longitude=-74.01,151.21
&current=us_aqi,pm2_5,pm10,ozone,nitrogen_dioxide,sulphur_dioxide,carbon_monoxide
&timezone=UTC
Variables: us_aqi, european_aqi, pm2_5, pm10, ozone, nitrogen_dioxide, sulphur_dioxide, carbon_monoxide, dust, uv_index, aerosol_optical_depth, various pollen types (Europe only).
Open-Meteo Flood / River Discharge
- Base URL: https://flood-api.open-meteo.com/v1/flood
- Auth: None
- Format: JSON
?latitude=47.37&longitude=8.55&daily=river_discharge&forecast_days=7
Global 5km resolution, GloFAS data. River discharge in m³/s. Ensemble stats available (river_discharge_mean, river_discharge_median, river_discharge_max, river_discharge_min). Historical from 1984.
Open-Meteo Marine
- Base URL: https://marine-api.open-meteo.com/v1/marine
- Auth: None
- Format: JSON
?latitude=54.32&longitude=10.12
&current=wave_height,wave_period,wave_direction,ocean_current_velocity,ocean_current_direction
Wave data, swell components, sea surface temperature, sea level, ocean currents.
USDM Drought Monitor
- Base URL: https://usdmdataservices.unl.edu/api/
- Auth: None
- Format: JSON (via Accept: application/json header), CSV, XML
Endpoints:
- Comprehensive stats (area, population, DSCI by region + date range)
- Stats by drought threshold (D0–D4)
- Consecutive weeks in drought
Params: aoi (state FIPS, county, HUC, etc.), startdate/enddate, statisticsType, dx (drought level 0–4)
Data: Drought category (D0–D4 = Abnormally Dry → Exceptional Drought), percent area, DSCI index.
Coverage: US only. Weekly updates (Thursdays).
Gotchas: Date format is M/D/YYYY (unusual).
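Because strftime always zero-pads month and day, the M/D/YYYY format is simplest to build by hand. A small sketch with a hypothetical helper name, assuming the API expects unpadded values as the format string suggests:

```python
from datetime import date


def usdm_date(d: date) -> str:
    """Format a date as M/D/YYYY without zero padding."""
    return f"{d.month}/{d.day}/{d.year}"


start = usdm_date(date(2026, 3, 5))
end = usdm_date(date(2026, 12, 31))
```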
NOAA Tides & Currents
- Base URL: https://api.tidesandcurrents.noaa.gov/api/prod/datagetter
- Auth: None (optional application= param for logging)
- Format: JSON, XML, CSV
?station=9414290&begin_date=20260316&end_date=20260316
&product=water_level&datum=MLLW&units=metric&time_zone=gmt&format=json
Products: Water levels, tide predictions, air/water temperature, wind, barometric pressure, humidity, salinity, currents, conductivity.
Data limits by interval:
- 1-minute: 4 days
- 6-minute: 1 month
- Hourly: 1 year
- Daily: 10 years
Coverage: US coastal + Great Lakes stations.
Why it matters: Sea level rise monitoring, coastal flooding, storm surge.
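The per-interval data limits mean a backfill has to be split into several requests. The sketch below chunks a date range into begin_date/end_date pairs; the day counts are rough assumptions (1 month taken as 31 days, 1 year as 365, 10 years as 3650), and the function name is hypothetical.

```python
from datetime import date, timedelta

# Assumed day counts for the limits listed above.
MAX_DAYS = {"1-minute": 4, "6-minute": 31, "hourly": 365, "daily": 3650}


def chunk_range(start: date, end: date, interval: str) -> list[tuple[str, str]]:
    """Split [start, end] into inclusive (begin_date, end_date) pairs in YYYYMMDD form."""
    span = timedelta(days=MAX_DAYS[interval] - 1)
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + span, end)
        chunks.append((cursor.strftime("%Y%m%d"), chunk_end.strftime("%Y%m%d")))
        cursor = chunk_end + timedelta(days=1)
    return chunks


chunks = chunk_range(date(2026, 1, 1), date(2026, 3, 16), "6-minute")
```

Each pair can be dropped into the begin_date and end_date parameters of the query shown above.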
NASA POWER
- Base URL: https://power.larc.nasa.gov/api/temporal/daily/point
- Auth: None
- Format: JSON, CSV, NetCDF, ASCII
?parameters=T2M,PRECTOTCORR,ALLSKY_SFC_SW_DWN&community=RE
&latitude=40.71&longitude=-74.01&start=20260101&end=20260315&format=json
200+ parameters including temperature, precipitation, solar irradiance, wind, humidity, soil moisture. Global 0.5° grid (~50km), MERRA-2 reanalysis. 1981–present.
Why it matters: Long historical climate baselines, solar resource data, global coverage.
Gotchas: Fill value -999.0 for missing data — our normalizer already handles this.
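For reference, fill-value handling can be as small as the sketch below. It is an illustration only, not the project's normalizer; the function name is hypothetical, and the date-keyed mapping is an assumption about how one parameter's daily series is shaped.

```python
from typing import Optional

FILL_VALUE = -999.0


def drop_fill_values(series: dict[str, float]) -> dict[str, Optional[float]]:
    """Replace the -999.0 fill value with None so it never reaches aggregates."""
    return {day: (None if value == FILL_VALUE else value) for day, value in series.items()}


# Made-up daily series keyed by YYYYMMDD.
cleaned = drop_fill_values({"20260101": 3.2, "20260102": -999.0})
```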
NOAA Climate Indices
- Auth: None (static files)
- Format: Space-delimited text
| Index | URL | Update |
|---|---|---|
| MEI v2 (ENSO) | https://psl.noaa.gov/enso/mei/data/meiv2.data | Monthly |
| ONI | https://www.cpc.ncep.noaa.gov/data/indices/oni.ascii.txt | Monthly |
| SOI | https://www.cpc.ncep.noaa.gov/data/indices/soi | Monthly |
| NAO | https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii | Monthly |
| PDO | https://www.ncei.noaa.gov/pub/data/cmb/ersst/v5/index/ersst.v5.pdo.dat | Monthly |
These are the core climate oscillation indices that drive global weather patterns. Simple text parsing, low-volume, essential context.
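A hedged sketch of the text parsing, assuming a file laid out as one row per year with twelve monthly columns and -999 as the missing marker. Not every file in the table follows that layout, so each index should be checked before reusing this; the function name and sample values are made up.

```python
def parse_monthly_index(text: str, missing: float = -999.0) -> dict[tuple[int, int], float]:
    """Parse 'YEAR v1 ... v12' rows into {(year, month): value}."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 13:
            continue  # header, trailer, or a differently shaped line
        try:
            year = int(parts[0])
            months = [float(p) for p in parts[1:]]
        except ValueError:
            continue  # non-numeric row, e.g. column labels
        for month, value in enumerate(months, start=1):
            if value != missing:
                values[(year, month)] = value
    return values


# Made-up sample in the assumed layout.
sample = (
    "2024 2025\n"
    "2024 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 -0.1 -0.2 -0.3 -0.4\n"
    "2025 -0.5 -0.6 -999.00 -999.00 -999.00 -999.00 -999.00 -999.00 -999.00 -999.00 -999.00 -999.00\n"
    "-999.00\n"
)
index = parse_monthly_index(sample)
```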
NOAA Space Weather
- Base URL: https://services.swpc.noaa.gov/products/
- Auth: None
- Format: JSON (arrays of arrays)
| Product | Endpoint | Data |
|---|---|---|
| Kp Index | noaa-planetary-k-index.json | Geomagnetic activity (current) |
| Kp Forecast | noaa-planetary-k-index-forecast.json | 3-day Kp forecast |
| Dst Index | kyoto-dst.json | Disturbance storm time |
| Solar Wind | solar-wind/mag-*.json | Magnetometer data |
| Solar Flares | flares/ | Flare events |
| Alerts | alerts.json | Active space weather alerts |
Coverage: Global (space weather affects the whole planet). Niche but relevant — geomagnetic storms affect power grids and comms.
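The arrays-of-arrays format is awkward to work with directly. A minimal sketch, assuming the first inner array is a header row naming the columns (the column names in the sample are illustrative, and the function name is hypothetical):

```python
def rows_to_dicts(table: list[list]) -> list[dict]:
    """Convert [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


# Made-up sample; values are kept as strings here, so cast them before storing.
sample = [
    ["time_tag", "Kp"],
    ["2026-03-17 00:00:00", "2.33"],
    ["2026-03-17 03:00:00", "3.00"],
]
records = rows_to_dicts(sample)
```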
ASF Sentinel-1 SAR
- Slug: asf_sentinel
- Base URL: https://api.daac.asf.alaska.edu/services/search/param
- Auth: None (searches are public)
- Format: GeoJSON
- Rate limit: Max 2,000 results per query; be polite with request spacing
- Schedule: Every hour
What we fetch: Sentinel-1 GRD_HD (Ground Range Detected, High Density) SAR scenes from 5 climate-relevant regions — Arctic/Svalbard, Greenland Ice Sheet, Amazon Basin, Gulf Coast US, Southeast Asia. Rolling 3-day window, 20 scenes per region max.
Key params:
- platform=Sentinel-1
- processingLevel=GRD_HD: 10m resolution, reduced speckle
- beamMode=IW: Interferometric Wide, 250km swath
- intersectsWith=POLYGON(...): WKT spatial filter
- output=geojson
- maxResults=20
Normalized fields: sar_scene metric, value = file size in MB, center lat/lon, orbit/path/frame metadata in extra_json.
Climate applications of Sentinel-1 SAR:
- Flood mapping — works through clouds, day/night (GRD products)
- Sea ice monitoring — concentration, type, lead/ridge detection
- Glacier velocity — ice sheet flow speed via InSAR (SLC products)
- Ground subsidence — millimeter-scale deformation detection
- Forest change — deforestation, biomass estimation via VH cross-pol
- Urban damage assessment — post-disaster building damage
Other platforms available: ALOS PALSAR, ALOS-2, ERS-1/2, JERS-1, RADARSAT-1/2, NISAR (upcoming)
Gotchas:
- Use platform= not dataset= (parameter names are case-sensitive)
- Unbounded queries over 2,000 results will error
- Downloads from datapool.asf.alaska.edu require no credentials
- AWS S3 access requires NGAP credentials (optional)
- Added to TerraPulse by Michael Isenbek via the admin UI
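A sketch of assembling one regional search from the key params above, including the WKT polygon for intersectsWith. Helper names and the example bounding box are hypothetical, and the rolling 3-day time window is left out because its parameter names are not documented here.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.daac.asf.alaska.edu/services/search/param"


def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Closed WKT ring: the first vertex is repeated at the end."""
    return (
        f"POLYGON(({west} {south},{east} {south},"
        f"{east} {north},{west} {north},{west} {south}))"
    )


def build_search_url(bbox: tuple, max_results: int = 20) -> str:
    params = {
        "platform": "Sentinel-1",  # platform=, not dataset=
        "processingLevel": "GRD_HD",
        "beamMode": "IW",
        "intersectsWith": bbox_to_wkt(*bbox),
        "output": "geojson",
        "maxResults": max_results,  # keeps the query bounded
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Illustrative box only; the project's real region polygons are not shown in this handbook.
wkt = bbox_to_wkt(10, 77, 30, 81)
url = build_search_url((10, 77, 30, 81))
```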
Phase 2 (Free API Key Required)
NASA FIRMS
- Base URL: https://firms.modaps.eosdis.nasa.gov/api/area/
- Auth: MAP_KEY (free via Earthdata login)
- Format: CSV, JSON, KML, shapefiles
- Rate limit: 5,000 transactions / 10 minutes
/api/area/csv/{MAP_KEY}/VIIRS_SNPP_NRT/{west},{south},{east},{north}/1
Sources: MODIS, VIIRS (SNPP, NOAA-20, NOAA-21), Landsat. Ultra Real-Time within 60 seconds (North America). This is the upstream source for GFW fire data — better to use directly.
Day range: 1–5 only per query.
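The URL template and the 1–5 day limit can be wrapped in a small builder so an out-of-range request fails before it costs a transaction. A sketch with a hypothetical function name, following the path shown above:

```python
def firms_area_url(
    map_key: str,
    west: float,
    south: float,
    east: float,
    north: float,
    days: int = 1,
    source: str = "VIIRS_SNPP_NRT",
) -> str:
    """Build the area CSV URL, rejecting day ranges the API will not accept."""
    if not 1 <= days <= 5:
        raise ValueError("FIRMS day range must be between 1 and 5")
    bbox = f"{west},{south},{east},{north}"
    return (
        "https://firms.modaps.eosdis.nasa.gov/api/area/csv/"
        f"{map_key}/{source}/{bbox}/{days}"
    )


# "KEY" is a placeholder, not a real MAP_KEY.
url = firms_area_url("KEY", -125, 32, -114, 42)
```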
NWS Alerts
- Base URL: https://api.weather.gov/
- Auth: User-Agent header required: (TerraPulse, contact@terrapulse.info)
- Format: GeoJSON
/alerts/active?area=CA
/points/{lat},{lon} → /gridpoints/{office}/{x},{y}/forecast
Two-step process for forecasts: First /points to get grid coordinates, then /gridpoints for data.
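The two-step lookup can be sketched without touching the network by separating URL construction from fetching. The helper names are hypothetical, and the `gridId`, `gridX`, and `gridY` property names are assumptions about the /points response.

```python
# Sent with every request; the value comes from the Auth line above.
HEADERS = {"User-Agent": "(TerraPulse, contact@terrapulse.info)"}

API_ROOT = "https://api.weather.gov"


def points_url(lat: float, lon: float) -> str:
    """Step 1: resolve a coordinate to its forecast office and grid cell."""
    return f"{API_ROOT}/points/{lat},{lon}"


def forecast_url(points_body: dict) -> str:
    """Step 2: build the gridpoints forecast URL from the /points response."""
    props = points_body["properties"]
    return (
        f"{API_ROOT}/gridpoints/{props['gridId']}/"
        f"{props['gridX']},{props['gridY']}/forecast"
    )


# Made-up /points response trimmed to the assumed fields.
sample_points = {"properties": {"gridId": "MTR", "gridX": 85, "gridY": 105}}
step1 = points_url(37.77, -122.42)
step2 = forecast_url(sample_points)
```

Since grid coordinates for a fixed location rarely change, caching the step-1 result per location would halve the request count.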
EPA AirNow
- Base URL: https://www.airnowapi.org/aq/
- Auth: API key (free registration)
- Format: JSON, XML, CSV
- Rate limit: 500 requests/hour
Endpoints for current observations (by bbox, zip, lat/lon), forecasts, historical. US/Canada/Mexico, 2,500+ monitoring stations.
OpenAQ
- Base URL: https://api.openaq.org/v3/
- Auth: API key (free registration)
- Format: JSON
160+ countries, 40,000+ locations. PM2.5, PM10, O3, NO2, SO2, CO, BC. Broader global coverage than EPA AirNow. Also has free AWS S3 open data archive for bulk downloads.
PurpleAir
- Base URL: https://api.purpleair.com/
- Auth: API key (via develop.purpleair.com)
- Format: JSON
100,000+ low-cost sensors. Real-time PM2.5 (2-minute updates). Highest density AQ network, hyperlocal data.
Stretch Goals
ERDDAP / CoastWatch
- Base URL: https://coastwatch.pfeg.noaa.gov/erddap/
- Auth: None
- Format: JSON, CSV, NetCDF, KML, many more
3,084+ datasets: SST, chlorophyll-a, ocean color, wave data, currents, salinity, wind, ocean heat content. griddap for gridded data, tabledap for station data. The premier open ocean data server.
CHIRPS Precipitation
- Base URL: https://data.chc.ucsb.edu/products/CHIRPS-2.0/
- Auth: None
- Format: GeoTIFF, NetCDF (file downloads, no REST API)
Global precipitation, 5km resolution, 1981–present. Best free high-res precipitation for developing regions. Would need a file-download fetcher rather than the standard HTTP JSON approach.
NOAA CO₂ / Keeling Curve
- URL: https://gml.noaa.gov/ccgg/trends/co2/co2_daily_mlo.csv
- Auth: None
- Format: CSV flat file (no REST API)
Daily Mauna Loa CO₂ readings. No formal API — build a cron fetcher that downloads and parses the CSV. Scripps CO₂ Program publishes the same data. Essential for the "Big Picture" view. See also: Copernicus CAMS for global GHG (Python CDS API, free registration).
NSIDC Arctic Ice Extent
- URL: https://nsidc.org/data/seaice_index/
- Auth: None
- Format: CSV, GeoTIFF, shapefiles
Daily Arctic/Antarctic sea ice extent and area. Key climate indicator. Data files downloadable, some via ERDDAP.
NIFC Fire Perimeters (ArcGIS)
- URL: https://services3.arcgis.com/T4QMspbfLg3qTGWY/arcgis/rest/services/
- Auth: None
- Format: GeoJSON via ArcGIS REST API
Real-time fire perimeters, containment percentages, and incident data from the National Interagency Fire Center. Complements point-based FIRMS hotspot data with polygon perimeters.
NOAA National Hurricane Center
- URL: https://www.nhc.noaa.gov/gis/
- Auth: None
- Format: GeoJSON, KML, shapefiles
Active tropical cyclone tracks, forecast cones, wind speed probabilities, storm surge. Published as GeoJSON/KML feeds during hurricane season. Seasonal but critical for alerts.
EPA Envirofacts UV Index
- URL: https://enviro.epa.gov/enviro/efservice/
- Auth: None
- Format: JSON, XML, CSV
Daily UV index forecasts by zip code. Free, reasonable rate limits. Open-Meteo also includes hourly UV in its air quality API as an alternative.
Water Quality Portal (USGS/EPA)
- URL: https://www.waterqualitydata.us/
- Auth: None
- Format: CSV, JSON, XML
Joint USGS/EPA database with 430M+ records. Primarily historical rather than real-time. Parameters include nutrients, metals, pesticides, bacteria. Stretch goal — useful for trend analysis.
Commercial Aggregators (Reference)
These are documented for future reference if we need to fill coverage gaps.
| Aggregator | Best For | Free Tier | Paid |
|---|---|---|---|
| IQAir/AirVisual | Consumer-grade AQI + health recs | 500 calls/day | $399/mo (100K/day) |
| OpenWeatherMap | Budget commercial weather | 1K calls/day (One Call 3.0) | €0.14/100 overage |
| Tomorrow.io | Advanced weather intelligence | 500 calls/day (core) | Custom enterprise |
| Ambee | Multi-category environmental | 100 records/day trial | Custom |
| OpenUV | Dedicated UV data | 50 calls/day | Paid plans |
Strategy from project docs: Start with free government APIs as backbone, add Open-Meteo for dev, OpenWeatherMap One Call 3.0 for production. This covers all 10 environmental categories at minimal cost.
Onboarding Checklist
When adding a new data source:
- Research: document the API in this handbook (URL, auth, format, gotchas)
- Fetcher: create src/terrapulse/ingestion/fetchers/{slug}.py inheriting BaseFetcher
- Normalizer: add normalize_{slug}() to src/terrapulse/ingestion/normalizer.py
- Orchestrator: register normalizer in NORMALIZERS dict in orchestrator.py
- Seed: add entry to MVP_SOURCES in src/terrapulse/db/seed.py
- Scheduler: add job to JOBS list in scheduler.py
- Admin: add fetcher class to fetcher_map in admin/app.py trigger_fetch handler
- Test: add mocked fetcher test in tests/test_fetchers/
- Deploy: sudo systemctl restart terrapulse
- Verify: check admin dashboard for new observations flowing