TerraPulse Data Dimensions
Living document. Last updated: 2026-03-18.
814K+ observations. 23 metrics. 11 categories. 44 countries. 76 years of temporal depth.
The Observation Cube
Every observation in TerraPulse lives in a multi-dimensional space:
Time × Location × Flavor = the observation cube
Each record carries these dimensions at varying levels of completeness, creating a rich but heterogeneous dataset that requires careful normalization for cross-source analysis.
Dimension 1: Time (100% coverage)
All 814K+ rows have timestamp_utc. This is the universal join key — everything can be aligned to a shared timeline.
| Property | Value |
|---|---|
| Coverage | 100% of all observations |
| Span | 76 years (1949–2026) |
| Earliest | 1949-12-31 (ONI climate index) |
| Latest | Current (real-time ingestion) |
| Distinct days | 2,612 |
Temporal Granularity by Source
| Granularity | Sources |
|---|---|
| 10-minute | GeoSphere TAWES stations |
| 15-minute | USGS Water (streamflow) |
| Hourly | NOAA SWPC (Kp index, solar flux), INCA |
| ~Real-time | USGS Earthquake, Safecast, Open-Meteo |
| Daily | NASA POWER reanalysis, USDM drought snapshots |
| Weekly | USDM drought (published Thursdays) |
| Monthly | Climate indices (MEI, ONI) |
| Annual | World Bank indicators |
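Because these sources report at anything from 10-minute to annual resolution, aligning them on timestamp_utc usually means flooring each timestamp to a shared bucket first. The following is a minimal sketch of that step, assuming timezone-aware datetimes; `floor_to_bucket`, `resample_mean`, and the bucket names are hypothetical helpers, not part of TerraPulse.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def floor_to_bucket(ts: datetime, bucket: str) -> datetime:
    """Floor a timezone-aware timestamp to the start of its UTC hour, day, or month."""
    ts = ts.astimezone(timezone.utc)
    if bucket == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if bucket == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if bucket == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported bucket: {bucket}")


def resample_mean(rows, bucket="day"):
    """Average (timestamp_utc, value) pairs that fall into the same bucket."""
    grouped = defaultdict(list)
    for ts, value in rows:
        grouped[floor_to_bucket(ts, bucket)].append(value)
    return {start: mean(values) for start, values in sorted(grouped.items())}
```

Averaging suits continuous measurements such as temperature; event-style metrics such as earthquake magnitude would more plausibly be counted or maximized per bucket.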
Temporal Depth by Metric
| Metric | Earliest | Latest | Depth |
|---|---|---|---|
| climate_oni | 1949 | 2025 | 76 years |
| wb_SP.POP.TOTL | 1960 | 2022 | 63 years |
| climate_mei_v2 | 1979 | 2025 | 47 years |
| wb_AG.LND.FRST.ZS | 1990 | 2022 | 33 years |
| wb_EG.USE.PCAP.KG.OE | 1990 | 2022 | 33 years |
| drought_area_pct | 2010 | 2026 | 16 years |
| earthquake_magnitude | 2021 | 2026 | 5 years |
| nasa_power_t2m | 2023 | 2026 | 3 years |
| All others | 2026 | 2026 | Days–weeks (real-time only) |
Dimension 2: Location (93% spatial coverage)
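756K rows (93%) have latitude/longitude. The remaining 7% are mostly non-point data (drought, climate indices, World Bank indicators, space weather), plus tide stations whose coordinates are not yet populated.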
| Property | Value |
|---|---|
| Rows with lat/lon | 756,128 (93%) |
| Rows geocoded | 357,006 (44%) |
| Distinct countries | 44 |
| Distinct states | 115 |
| Coordinate system | WGS84 (EPSG:4326) |
| Storage | PostGIS Geography(Point, 4326) |
Spatial Coverage by Category
| Category | Rows | Has Lat/Lon | Has Geocode | Coverage |
|---|---|---|---|---|
| Hydrology | 418K | 100% | 48% | US (California focus) |
| Radiation | 223K | 100% | 64% | Global (citizen science, Japan-heavy) |
| Seismic | 97K | 100% | 10% | Global |
| Temperature | 13K | 100% | 12% | 10 global cities + Austria |
| Air Quality | 2.2K | 100% | 72% | 10 global + 8 European cities |
| Water/Ocean | 5K | 32% | 4% | US coasts + 10 ocean regions + 10 rivers |
| Satellite | 1K | 100% | 63% | 5 climate regions (Arctic, Greenland, Amazon, Gulf, SE Asia) |
| Space Weather | 37K | 2% | 1% | Non-spatial (Sun-Earth system) |
| Drought | 13K | 0% | 0% | US states (area-based, not point) |
| Climate Indices | 3K | 0% | 0% | Global indices (not point-localizable) |
| Socioeconomic | 3K | 0% | 0% | Country-level (15 countries) |
Non-Point Data
Some metrics are inherently non-spatial or use different spatial models:
- Drought: area percentages per US state (D0–D4 levels). Location is `state` in extra_json.
- Climate Indices: global ocean-atmosphere oscillations (ENSO). No meaningful point location.
- World Bank: country-level aggregates. `country_code` in extra_json.
- Space Weather: Sun-Earth system events. Kp index, solar flux, CME trajectories exist in heliographic coordinates, not Earth surface.
- Tides: station-based but lat/lon not yet populated (`station_id` in extra_json).
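For rows like these, the location has to be read out of extra_json rather than from lat/lon. A minimal sketch of that lookup, assuming extra_json holds a JSON object; the category slugs and the `non_point_location` helper are hypothetical, and only the key names (`state`, `country_code`, `station_id`) come from the list above.

```python
import json

# Assumed mapping from a category slug to the extra_json key that carries
# its non-point location. The slugs are illustrative, not TerraPulse's own.
LOCATION_KEYS = {
    "drought": "state",
    "socioeconomic": "country_code",
    "water_ocean": "station_id",
}


def non_point_location(category: str, extra_json: str):
    """Return (key, value) identifying a row's non-point location, or None."""
    key = LOCATION_KEYS.get(category)
    if key is None:
        return None  # e.g. climate indices and space weather have no location
    try:
        extra = json.loads(extra_json or "{}")
    except json.JSONDecodeError:
        return None  # malformed metadata is treated as "no location"
    if not isinstance(extra, dict):
        return None
    value = extra.get(key)
    return (key, value) if value is not None else None
```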
Geocoding Gap
Only 44% of rows have reverse-geocoded country/state/city fields. This is because:
- Geocoding only runs on new observations (since the feature was added)
- Backfilled historical data (earthquakes, drought, etc.) was not geocoded
- Rate limits on Nominatim prevent bulk retroactive geocoding
Fix: A batch geocoding job could fill in the ~400K spatial rows missing geo hierarchy.
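One way such a batch job could stay within rate limits is to geocode each distinct (rounded) coordinate once and pause between requests; the public Nominatim usage policy allows at most one request per second. The sketch below assumes a caller-supplied `reverse_geocode` function (hypothetical, so no particular client library is implied) and an illustrative rounding precision of two decimal places.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Coord = Tuple[float, float]
Geo = Dict[str, Optional[str]]


def batch_geocode(
    coords: Iterable[Coord],
    reverse_geocode: Callable[[float, float], Geo],
    precision: int = 2,
    min_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[Coord, Geo]:
    """Reverse-geocode unique rounded coordinates, throttling between calls."""
    cache: Dict[Coord, Geo] = {}
    for lat, lon in coords:
        key = (round(lat, precision), round(lon, precision))
        if key in cache:
            continue  # coordinate already resolved, no request needed
        if cache:
            sleep(min_interval_s)  # pause before every request after the first
        cache[key] = reverse_geocode(*key)
    return cache
```

Deduplicating first matters here because fixed stations (for example streamflow gauges) report from the same coordinates repeatedly, so the number of requests can be far smaller than the number of rows.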
Dimension 3: Flavor (23 metrics, 11 categories)
"Flavor" is what is being measured — the semantic meaning of the observation value.
Category Breakdown
| Category | Metrics | Total Rows | Unit(s) | Deepest History |
|---|---|---|---|---|
| Hydrology | streamflow_00060 | 418K | ft³/s | 2026 (real-time only) |
| Radiation | radiation | 223K | CPM | 1970–2026 |
| Seismic | earthquake_magnitude | 97K | magnitude | 2021–2026 |
| Space Weather | space_kp_index, space_solar_flux_10cm, donki_cme_speed, donki_flare_class, donki_radiation_belt | 37K | Kp, sfu, km/s, class, event | 2026 |
| Temperature | temperature_2m, nasa_power_t2m, geosphere_temperature | 13K | °C | 2023–2026 |
| Drought | drought_area_pct | 13K | % | 2010–2026 |
| Water/Ocean | water_level, wave_height, river_discharge | 5K | m, m, m³/s | 2026 |
| Climate Indices | climate_oni, climate_mei_v2 | 3K | index | 1949–2025 |
| Socioeconomic | wb_SP.POP.TOTL, wb_AG.LND.FRST.ZS, wb_EG.USE.PCAP.KG.OE | 3K | varies | 1960–2022 |
| Air Quality | us_aqi, geosphere_pm25 | 2.2K | AQI, µg/m³ | 2026 |
| Satellite | sar_scene | 1K | MB | 2026 |
Value Semantics
Each metric's value field means something different:
| Metric | Value Meaning | Scale |
|---|---|---|
| earthquake_magnitude | Earthquake magnitude as reported by USGS (magnitude type varies by event) | 0–9+ (logarithmic) |
| streamflow_00060 | Water discharge rate | 0–100K+ ft³/s |
| radiation | Counts per minute | 0–1000+ CPM |
| temperature_2m | Air temperature | -40 to +50 °C |
| us_aqi | Air Quality Index | 0–500 (EPA scale) |
| space_kp_index | Geomagnetic activity | 0–9 Kp |
| drought_area_pct | Percent of state in drought | 0–100% |
| donki_cme_speed | CME velocity | 100–3000 km/s |
| donki_flare_class | Flare intensity (numeric) | 0.1–10000 (A=0.1, X10=10000) |
| wave_height | Significant wave height | 0–15+ m |
| water_level | Tidal water level | -2 to +3 m (relative to MLLW) |
| river_discharge | River flow rate | 0–100K+ m³/s |
| climate_oni | El Niño/La Niña index | -3 to +3 |
| climate_mei_v2 | Multivariate ENSO Index | -3 to +3 |
| sar_scene | SAR scene file size | 100–2000 MB |
| geosphere_pm25 | PM2.5 concentration | 0–500 µg/m³ |
| nasa_power_t2m | Daily mean temperature | -40 to +50 °C |
| wb_SP.POP.TOTL | National population | Millions |
| wb_AG.LND.FRST.ZS | Forest area percentage | 0–100% |
| wb_EG.USE.PCAP.KG.OE | Energy use per capita | 0–10000 kg oil equiv |
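The numeric encoding of donki_flare_class is the least obvious entry in this table. A minimal sketch of one mapping consistent with the two anchor points given (A=0.1, X10=10000), assuming each class letter is ten times the previous one and the trailing number is a linear multiplier; `flare_class_to_value` is a hypothetical helper, and the actual ingestion code may differ.

```python
import re

# Assumed base value per class letter, chosen to match the two anchor
# points in the table (A1 = 0.1, X10 = 10000); each class is 10x the last.
FLARE_BASE = {"A": 0.1, "B": 1.0, "C": 10.0, "M": 100.0, "X": 1000.0}

_FLARE_RE = re.compile(r"^([ABCMX])(\d+(?:\.\d+)?)$")


def flare_class_to_value(flare_class: str) -> float:
    """Convert a flare class string such as 'M5.2' to the numeric scale."""
    match = _FLARE_RE.match(flare_class.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized flare class: {flare_class!r}")
    letter, multiplier = match.groups()
    return FLARE_BASE[letter] * float(multiplier)
```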
Dimension 4: Source (16 active)
Source = provenance. Who measured this, how, and with what instrument.
| Source | Slug | Type | Authority |
|---|---|---|---|
| USGS Earthquake | usgs_earthquake | Government | US Geological Survey |
| USGS Water Services | usgs_water | Government | US Geological Survey |
| Open-Meteo Weather | open_meteo | Open source | Open-Meteo GmbH |
| Open-Meteo Air Quality | open_meteo_aqi | Open source | Open-Meteo GmbH |
| Open-Meteo Flood/River | open_meteo_flood | Open source | GloFAS/Copernicus |
| Open-Meteo Marine | open_meteo_marine | Open source | Open-Meteo GmbH |
| Safecast Radiation | safecast | Citizen science | Safecast Foundation |
| World Bank Climate | world_bank | International org | World Bank Group |
| USDM Drought Monitor | usdm_drought | Government | NDMC/USDA/NOAA |
| NOAA Tides & Currents | noaa_tides | Government | NOAA CO-OPS |
| NASA POWER | nasa_power | Government | NASA Langley |
| NOAA Climate Indices | noaa_climate_indices | Government | NOAA PSL/CPC |
| NOAA Space Weather | noaa_space_weather | Government | NOAA SWPC |
| ASF Sentinel-1 SAR | asf_sentinel | Government | NASA ASF DAAC / ESA |
| NASA DONKI | nasa_donki | Government | NASA CCMC |
| GeoSphere Austria | geosphere | Government | GeoSphere Austria |
Dimension 5: Extra (in extra_json)
Every observation carries an extra_json TEXT field with source-specific metadata not captured in the normalized columns. This is the "long tail" of dimensions.
Key Extra Fields by Category
| Category | Extra Fields Available |
|---|---|
| Seismic | event_id, place, felt, cdi, mmi, alert, tsunami, sig, depth_km |
| Space Weather | activity_id, linked_events, instruments, predicted_kp, earth_arrival |
| Hydrology | site_code, site_name, variable_name |
| Temperature | city, humidity_pct, wind_speed, precipitation, solar_irradiance |
| Satellite | scene_id, orbit, path_number, polarization, browse_url, download_url |
| Drought | state, d0_pct, d1_pct, d2_pct, d3_pct, d4_pct |
| Socioeconomic | country_code, country_name, indicator_name |
| Air Quality | pm10, ozone, no2, so2, co |
Missing Dimensions (Future)
Altitude/Depth
- Earthquake `depth_km` exists in extra_json but not as a column
- GeoSphere stations have elevation metadata
- NASA POWER is at 2m above ground
- Ocean data could carry depth
Severity/Significance
- Earthquakes have a `sig` composite score
- Drought has D0–D4 categorical levels
- AQI has EPA category thresholds
- Space weather has G1–G5 storm scale
- A universal "significance score" (0–1 normalized) could enable cross-category alerting
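As a rough illustration of what a normalized significance score might look like, the sketch below applies clamped min-max scaling using the ranges listed under Value Semantics. The bounds table, the linear mapping, and the `significance_score` name are all assumptions; a production score would likely need per-metric curves, since earthquake magnitude, for instance, is logarithmic.

```python
# Assumed (low, high) bounds per metric, taken from the Value Semantics
# scales in this document. The linear mapping is an illustrative choice.
SCALE_BOUNDS = {
    "space_kp_index": (0.0, 9.0),
    "us_aqi": (0.0, 500.0),
    "drought_area_pct": (0.0, 100.0),
    "earthquake_magnitude": (0.0, 9.0),
}


def significance_score(metric: str, value: float):
    """Map a raw value onto 0-1 by clamped min-max scaling, or None if unknown."""
    bounds = SCALE_BOUNDS.get(metric)
    if bounds is None:
        return None  # no agreed scale for this metric yet
    low, high = bounds
    scaled = (value - low) / (high - low)
    return max(0.0, min(1.0, scaled))
```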
Forecast vs Observed
- GeoSphere PM2.5 is a forecast
- DONKI CME has predicted Kp vs SWPC observed Kp
- Open-Meteo weather is current observations
- Distinguishing forecast from observation is important for accuracy analysis
Data Quality
- USGS streamflow has `qualifiers` (provisional, approved)
- Safecast radiation is citizen science (variable quality)
- USGS earthquake has `status` (automatic, reviewed)
- A normalized quality dimension could weight observations in analysis
Cross-Dimensional Analysis Opportunities
Time × Flavor (Trend Analysis)
- Temperature trends across cities (NASA POWER, 2023–2026)
- Earthquake frequency by magnitude band over years
- Drought severity progression per state (2010–2026)
- ENSO cycle phase identification from ONI/MEI
Location × Flavor (Spatial Correlation)
- AQI vs temperature at co-located monitoring points
- Streamflow vs precipitation in the same watershed
- Earthquake clustering by geographic region
- Radiation levels near nuclear facilities vs background
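Spatial correlations like these depend on deciding which observations count as co-located. A minimal pure-Python sketch using the haversine great-circle distance on WGS84 lat/lon pairs follows; the 10 km default threshold, the `(id, lat, lon)` tuple layout, and the function names are assumptions. Inside the database, a PostGIS distance predicate such as ST_DWithin on the geography column would likely be the more practical tool.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def co_located(points_a, points_b, max_km=10.0):
    """Pair each (id, lat, lon) in points_a with points_b entries within max_km."""
    pairs = []
    for id_a, lat_a, lon_a in points_a:
        for id_b, lat_b, lon_b in points_b:
            if haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_km:
                pairs.append((id_a, id_b))
    return pairs
```

The nested loop is quadratic, which is acceptable for a handful of cities but not for the full radiation or hydrology tables.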
Time × Location × Flavor (The Full Cube)
- "During El Niño years, how does California drought severity change?"
- "Do M5+ earthquake clusters correlate with any preceding seismic pattern?"
- "When Kp exceeds 5, what happens to GPS accuracy in the following 24 hours?"
- "Is wildfire-season AQI getting worse year over year in specific cities?"
Flavor × Flavor (Cross-Metric Correlation)
- ENSO → Drought: ONI values vs drought_area_pct (lagged by 3–6 months)
- CME → Kp: donki_cme_speed vs space_kp_index (lagged by 1–3 days)
- Temperature → AQI: nasa_power_t2m vs us_aqi (same-day, same location)
- Solar Flux → Temperature: space_solar_flux_10cm vs global temperature (11-year cycle)
- Precipitation → Streamflow: NASA POWER precip vs streamflow_00060 (same watershed)
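Several of these pairs are expected to correlate only after a time lag. A minimal sketch of a lagged Pearson correlation, assuming both series have already been resampled to the same regular interval (for example monthly for ENSO vs drought) and have equal length; `pearson` and `lagged_correlation` are hypothetical helpers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(driver, response)
    # Drop the last `lag` driver points and the first `lag` response points
    # so that index t in one slice lines up with t + lag in the other.
    return pearson(driver[:-lag], response[lag:])
```

Scanning `lag` over the suggested window (say 3 to 6 steps) and comparing the coefficients gives a first impression of where the relationship is strongest, though it does not by itself establish causation.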
Normalization Gaps to Address
- Geocode backfill: ~400K spatial rows missing country/state/city
- Tides need lat/lon: station coordinates available in metadata but not populated
- Drought needs lat/lon: state centroid coordinates could be assigned
- World Bank needs lat/lon: country centroid coordinates could be assigned
- Depth/elevation: promote earthquake depth_km to a column
- Forecast flag: add an `is_forecast` boolean to distinguish predictions from observations
- Quality score: normalized 0–1 quality based on source-specific flags
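Two of these gaps (depth and the forecast flag) can be derived per row from data that already exists. A minimal sketch, assuming extra_json holds a JSON object and that the set of forecast metrics is maintained by hand; `FORECAST_METRICS` and `derive_columns` are hypothetical names, and the set lists only the one metric this document explicitly calls a forecast.

```python
import json

# Metrics this document describes as forecasts; an assumed, incomplete list.
FORECAST_METRICS = {"geosphere_pm25"}


def derive_columns(metric: str, extra_json: str) -> dict:
    """Compute candidate depth_km and is_forecast column values for one row."""
    try:
        extra = json.loads(extra_json or "{}")
    except json.JSONDecodeError:
        extra = {}  # malformed metadata yields no derived depth
    if not isinstance(extra, dict):
        extra = {}
    depth = extra.get("depth_km")
    is_number = isinstance(depth, (int, float)) and not isinstance(depth, bool)
    return {
        "depth_km": float(depth) if is_number else None,
        "is_forecast": metric in FORECAST_METRICS,
    }
```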