Scope freeze — streamgauge Locationdex kind
Frozen 2026-06-26. Fourth kind of the Locationdex family
(docs/locationdex-framework.md), after the neutron_monitor pilot, tide_gauge, and
magnetic_observatory. The slot is a place: one fixed USGS real-time stream gauge. This
is the usgs_water admin source (https://waterservices.usgs.gov/nwis/iv/), organized —
not a new ingestion.
The slot
- One slot = one USGS real-time stream gauge. Slot ID = the USGS site number (e.g. 01646500 Potomac R near Washington DC; 11120000 Atascadero C near Goleta CA).
- Full-roster deep pull (no-spiking rule). The live usgs_water fetcher hard-codes ~12 gauges (a Goleta-CA cluster); the dex registers the full active real-time discharge network as the usgs_streamgauges spine, pulled from the USGS NWIS site service.
- Scope filter (matches the live IV feed): siteStatus=active, siteType=ST (surface-water stream), parameterCd=00060 (discharge), hasDataTypeCd=iv (real-time / instantaneous values). One RDB call per state/territory: the site service rejects an un-narrowed national query, so the reload iterates the 50 states + DC + territories.
- A gauge IS a place, so each slot carries lat/lon + a PostGIS point. A spatial sweep is buildable later; this kind stays catalog-first.
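A minimal sketch of the per-state iteration, assuming the site service lives at https://waterservices.usgs.gov/nwis/site/ and accepts format=rdb. The endpoint path, the function names, and the short state list are illustrative assumptions, not the reload's actual code; only the filter values come from this freeze.

```python
from urllib.parse import urlencode

# Assumed site-service endpoint (the freeze itself names only the IV URL).
SITE_SERVICE = "https://waterservices.usgs.gov/nwis/site/"

# Scope filter taken from the freeze; "format" is an assumption.
SCOPE = {
    "format": "rdb",
    "siteStatus": "active",
    "siteType": "ST",
    "parameterCd": "00060",
    "hasDataTypeCd": "iv",
}

# Illustrative subset only; the reload covers 50 states + DC + territories.
STATE_CODES = ["ca", "md", "dc", "pr"]


def site_query_url(state_cd: str) -> str:
    """Build one state-narrowed site-service URL (hypothetical helper)."""
    params = {"stateCd": state_cd, **SCOPE}
    return SITE_SERVICE + "?" + urlencode(params)


def roster_urls(state_codes):
    """One URL per state/territory, since a national query is rejected."""
    return [site_query_url(code) for code in state_codes]
```

Keeping the filter in one dict means the roster pull and the live IV feed can be checked against the same scope in one place.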
Slot contents
- Identity: USGS site number, station name, state.
- Place: latitude, longitude (from dec_lat_va/dec_long_va), elevation (ft), coordinate datum, PostGIS point.
- Station character: HUC (hydrologic unit code), the permanent basin classifier.
- Measured-series summary joined from the live usgs_water feed (streamflow_00060) by USGS site number: observation count, coverage span, mean/min/max discharge (ft³/s), unit. Cited to usgs_water per the universal multi-source rule. Gauges not yet streaming are real registry places (has_series=false) and are kept, not dropped; the live feed streams only ~12, so the overwhelming majority are registry-only.
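The series roll-up described above could look roughly like the following. This is a sketch under stated assumptions: the function name, the slot field names, and the tuple layout of the observations are hypothetical, and timestamps are assumed to be ISO-8601 strings in a single timezone so that string comparison orders them correctly.

```python
from statistics import fmean


def summarize_series(registry_sites, observations):
    """Roll up measured discharge per registry site (hypothetical helper).

    registry_sites: iterable of USGS site numbers, kept as strings so
        leading zeros survive.
    observations: iterable of (site_code, iso_timestamp, discharge_or_None).
    """
    by_site = {}
    for site_code, timestamp, value in observations:
        if value is None:
            continue  # only non-null discharge counts toward a series
        by_site.setdefault(site_code, []).append((timestamp, value))

    slots = {}
    for site in registry_sites:
        rows = by_site.get(site, [])
        if not rows:
            # Registry-only gauge: kept as a real place, not dropped.
            slots[site] = {"has_series": False}
            continue
        times = [t for t, _ in rows]
        values = [v for _, v in rows]
        slots[site] = {
            "has_series": True,
            "observation_count": len(values),
            "first_observed": min(times),
            "last_observed": max(times),
            "mean_discharge": fmean(values),
            "min_discharge": min(values),
            "max_discharge": max(values),
            "unit": "ft3/s",
            "cited_source": "usgs_water",
        }
    return slots
```

Iterating over the registry rather than over the observations is what keeps non-streaming gauges in the output with has_series=false.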
Measured reality
- IN: streamflow_00060 = the measured discharge at the gauge (what the river physically did). Clean per feedback_measured_reality_only.
- OUT: the CEMS-GLOFAS streamflow source (cems-glofas-ra-streamflow-analysis) is JRC model reanalysis, held out and entirely unrelated to this measured USGS network. The dex draws only the measured USGS discharge series.
Storage
- Spine source usgs_streamgauges (registry, active=False; refreshed by re-running the reload, not by a 60s feed).
- Slots in the third sibling storehouse data/location_storehouse/ (shared with the other Locationdex kinds), reusing the event_storehouse write + disk-rebuilt-index machinery via its base_dir argument. This is the largest Locationdex kind by slot count (~13k), still well within the file-per-slot storehouse.
Gotchas (reuse-critical)
- Site is keyed in extra_json->>'site_code', NOT quality_flag (which holds the USGS data-qualifier like P or P,Rat). The series join differs from the other Locationdex kinds: key on the JSON site_code.
- The site service rejects a national query, so narrow per state (stateCd=<ST>); a state with no matching sites returns HTTP 404, handled as empty.
- RDB format = tab-delimited with # comment lines, then a header row, then a 5s 15s ... format-spec row that must be skipped before the data.
- Only non-null discharge counts toward a gauge's series, so a gauge that registers but reports no usable values rolls up as registry-only, not a phantom series.
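The RDB layout described above can be parsed along these lines. This is a sketch, not the reload's parser: parse_rdb is a hypothetical name, the sample row is made-up placeholder data, and an HTTP 404 is assumed to have been mapped to an empty string by the caller before parsing.

```python
def parse_rdb(text: str):
    """Parse USGS RDB text into a list of dicts keyed by header name.

    Layout assumed from the gotcha above: '#' comment lines, one header
    row, one format-spec row (e.g. "5s 15s ..."), then data rows.
    """
    # Drop comment lines and blank lines.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    if len(lines) < 2:
        return []  # empty body (e.g. a 404 handled as empty) or header only
    header = lines[0].split("\t")
    # lines[1] is the format-spec row; skip it before reading data.
    return [dict(zip(header, ln.split("\t"))) for ln in lines[2:]]


# Placeholder sample, not real gauge data.
SAMPLE = (
    "# example comment line\n"
    "agency_cd\tsite_no\tstation_nm\n"
    "5s\t15s\t50s\n"
    "USGS\t00000001\tEXAMPLE R NEAR NOWHERE\n"
)
rows = parse_rdb(SAMPLE)
```

Values stay as strings here on purpose, so site numbers keep their leading zeros; numeric fields such as dec_lat_va would be converted in a later step.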
Frozen vs deferred-v2
- Frozen (this brick): full active real-time discharge roster as place-slots; measured discharge series summary cited; catalog index.
- Deferred-v2: spatial sweep (a gauge slot CAN anchor a radius read-across); full time-series streaming of every gauge (the live feed covers ~12); daily-value (DV) historical backfill enrichment; HUC-basin or drainage-area stratified reads; gauge-height (00065) as a second measured series.