
Scope freeze — streamgauge Locationdex kind

Frozen 2026-06-26. Fourth kind of the Locationdex family (docs/locationdex-framework.md), after the neutron_monitor pilot, tide_gauge, and magnetic_observatory. The slot is a place: one fixed USGS real-time stream gauge. This kind organizes the existing usgs_water admin source (https://waterservices.usgs.gov/nwis/iv/); it is not a new ingestion.

The slot

  • One slot = one USGS real-time stream gauge. Slot ID = the USGS site number (e.g. 01646500 Potomac R near Washington DC; 11120000 Atascadero C near Goleta CA).
  • Full-roster deep pull (no-spiking rule). The live usgs_water fetcher hard-codes ~12 gauges (a Goleta-CA cluster); the dex registers the full active real-time discharge network as the usgs_streamgauges spine, pulled from the USGS NWIS site service.
  • Scope filter (matches the live IV feed): siteStatus=active, siteType=ST (surface-water stream), parameterCd=00060 (discharge), hasDataTypeCd=iv (real-time / instantaneous values). One RDB call per state/territory — the site service rejects an un-narrowed national query, so the reload iterates the 50 states + DC + territories.
  • A gauge IS a place, so each slot carries lat/lon + a PostGIS point. A spatial sweep is buildable later; this kind stays catalog-first.
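
The per-state scope filter above could be assembled roughly as follows. This is a minimal sketch, not the reload's actual code: the endpoint URL for the site service, the format=rdb parameter, and the helper name site_query_url are assumptions; the four filter values come straight from the scope filter bullet.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint of the NWIS site service. The note names the service
# but does not give its URL.
SITE_SERVICE_URL = "https://waterservices.usgs.gov/nwis/site/"

# Scope filter from the freeze note, plus format=rdb (assumed parameter name
# for requesting the tab-delimited RDB output).
SCOPE_FILTER = {
    "format": "rdb",
    "siteStatus": "active",
    "siteType": "ST",
    "parameterCd": "00060",
    "hasDataTypeCd": "iv",
}


def site_query_url(state_cd: str) -> str:
    """Build one narrowed site-service query for a single state or territory.

    Hypothetical helper: the reload would call this once per postal code
    (50 states + DC + territories) because a national query is rejected.
    """
    params = {"stateCd": state_cd.lower(), **SCOPE_FILTER}
    return f"{SITE_SERVICE_URL}?{urlencode(params)}"
```

For example, site_query_url("MD") yields one URL narrowed to Maryland; the reload would loop over its own list of state and territory codes.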

Slot contents

  • Identity: USGS site number, station name, state.
  • Place: latitude, longitude (from dec_lat_va/dec_long_va), elevation (ft), coordinate datum, PostGIS point.
  • Station character: HUC (hydrologic unit code) — the permanent basin classifier.
  • Measured-series summary joined from the live usgs_water feed (streamflow_00060) by USGS site number: observation count, coverage span, mean/min/max discharge (ft³/s), unit — cited to usgs_water per the universal multi-source rule. Gauges not yet streaming are real registry places (has_series=false) and are kept, not dropped — the live feed streams only ~12, so the overwhelming majority are registry-only.
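
The measured-series summary and the has_series flag can be illustrated with a small rollup. This is a sketch under stated assumptions: the SeriesSummary type, its field names, and the summarize helper are hypothetical, and observations are assumed to arrive as (ISO-8601 UTC timestamp, value-or-None) pairs already filtered to one site number. Null values are excluded so that a gauge with no usable discharge stays registry-only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SeriesSummary:
    # Hypothetical shape; the real slot layout is not specified in this note.
    site_code: str
    has_series: bool
    count: int = 0
    first_ts: Optional[str] = None
    last_ts: Optional[str] = None
    mean: Optional[float] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    unit: str = "ft3/s"
    source: str = "usgs_water"  # citation per the multi-source rule


def summarize(
    site_code: str,
    observations: Iterable[Tuple[str, Optional[float]]],
) -> SeriesSummary:
    """Roll streamflow_00060 observations for one gauge into a summary."""
    # Only non-null discharge counts toward the series.
    usable = [(ts, v) for ts, v in observations if v is not None]
    if not usable:
        # Registry-only gauge: kept as a place, no series attached.
        return SeriesSummary(site_code=site_code, has_series=False)
    stamps = [ts for ts, _ in usable]
    values = [v for _, v in usable]
    return SeriesSummary(
        site_code=site_code,
        has_series=True,
        count=len(values),
        # String min/max is valid only for uniformly formatted UTC timestamps.
        first_ts=min(stamps),
        last_ts=max(stamps),
        mean=sum(values) / len(values),
        minimum=min(values),
        maximum=max(values),
    )
```

A gauge that appears in the registry but has no rows in the feed would be summarized with an empty iterable and come back with has_series=False.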

Measured reality

  • IN: streamflow_00060 = the measured discharge at the gauge (what the river physically did). Clean per feedback_measured_reality_only.
  • OUT: the CEMS-GLOFAS streamflow source (cems-glofas-ra-streamflow-analysis) is JRC model reanalysis, not a measurement. It is held out and is unrelated to this measured USGS network. The dex draws only the measured USGS discharge series.

Storage

  • Spine source usgs_streamgauges (registry, active=False — refreshed by re-running the reload, not by a 60s feed).
  • Slots in the third sibling storehouse data/location_storehouse/ (shared with the other Locationdex kinds), reusing the event_storehouse write + disk-rebuilt-index machinery via its base_dir argument. This is the largest Locationdex kind by slot count (~13k), still well within the file-per-slot storehouse.

Gotchas (reuse-critical)

  • Site is keyed in extra_json->>'site_code', NOT quality_flag (which holds the USGS data-qualifier like P or P,Rat). The series join differs from the other Locationdex kinds — key on the JSON site_code.
  • The site service rejects a national query — narrow per state (stateCd=<ST>); a state with no matching sites returns HTTP 404, handled as empty.
  • RDB format = tab-delimited with # comment lines, then a header row, then a 5s 15s ... format-spec row that must be skipped before the data.
  • Only non-null discharge counts toward a gauge's series, so a gauge that registers but reports no usable values rolls up as registry-only, not a phantom series.
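
The 404 and RDB gotchas above can be sketched with the standard library alone. The function names fetch_rdb and parse_rdb are hypothetical, and the column names used in any caller (other than dec_lat_va and dec_long_va, which this note mentions) are assumptions about the response.

```python
import urllib.error
import urllib.request
from typing import Dict, List


def fetch_rdb(url: str, timeout: float = 30.0) -> str:
    """Fetch one per-state RDB response; a 404 means no matching sites."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return ""  # state with no matching sites: handled as empty
        raise


def parse_rdb(text: str) -> List[Dict[str, str]]:
    """Parse RDB text into one dict per data row, keyed by header name."""
    # Drop blank lines and the leading '#' comment block.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    if len(lines) < 2:
        return []
    header = lines[0].split("\t")
    # lines[1] is the format-spec row (5s 15s ...); skip it before the data.
    rows = []
    for ln in lines[2:]:
        rows.append(dict(zip(header, ln.split("\t"))))
    return rows
```

Because fetch_rdb returns an empty string on 404, parse_rdb(fetch_rdb(url)) yields an empty list for a state with no matching sites, so the reload loop needs no special case.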

Frozen vs deferred-v2

  • Frozen (this brick): full active real-time discharge roster as place-slots; measured discharge series summary cited; catalog index.
  • Deferred-v2: spatial sweep (a gauge slot CAN anchor a radius read-across); full time-series streaming of every gauge (the live feed covers ~12); daily-value (DV) historical backfill enrichment; HUC-basin or drainage-area stratified reads; gauge-height (00065) as a second measured series.