Scope freeze — water_quality_station Locationdex (10th Locationdex kind)
Frozen 2026-06-28. Source pasted by Mike: data.gov record for EPA ScienceHub DOI
10.23719/1407630, "Assessing Dungeness River BMP Effectiveness Using an Ecological Function
Approach." Mike chose the shape: option A — an expandable water_quality_station kind (over a
watershed-specific dungeness_water_quality kind).
What it is
An expandable, multi-study registry of measured water-quality monitoring stations. The slot is one
fixed monitoring station; the STUDY / watershed it came from is a slot FIELD, not the key. This is
the same registry shape Mike chose for coral_reef_station: future EPA water-quality studies (or a
broader STORET/WQX pull) extend the SAME kind rather than minting a new one each time.
Slot = a monitoring station (Locationdex)
- Slot id: <study_id>-<station_id>, namespaced so a future study's station "716" can never
overwrite this study's station 716.
- The place: station name, latitude, longitude, datum, org, state, county, HUC, station_type.
- Embedded measured record: a per-characteristic summary (n, detected vs non-detect counts,
units, detected min/median/max, year range) plus the raw samples list. A precipitation station
instead embeds annual precip totals (raw hourly is summarized, not embedded wholesale).
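A minimal sketch of the namespacing rule, for illustration only. The helper names make_slot_id and split_slot_id are hypothetical (they are not taken from the build), and the split assumes a study_id never contains a hyphen, which this note does not state.

```python
def make_slot_id(study_id: str, station_id: str) -> str:
    """Join a study id and a station id into one namespaced slot id."""
    study = study_id.strip()
    station = station_id.strip()
    if not study or not station:
        raise ValueError("study_id and station_id must both be non-empty")
    if "-" in study:
        # Assumption: study ids are hyphen-free so the id can be split back.
        raise ValueError("study_id must not contain '-'")
    return f"{study}-{station}"


def split_slot_id(slot_id: str) -> tuple:
    """Split on the FIRST hyphen only; station ids may contain hyphens."""
    study, _, station = slot_id.partition("-")
    return study, station


# Station "716" from two different studies lands in two different slots.
dungeness_716 = make_slot_id("dungeness", "716")
other_716 = make_slot_id("other_study", "716")  # hypothetical future study
usgs_slot = make_slot_id("dungeness", "USGS-12048000")
```

Splitting on the first hyphen matters because a station id such as USGS-12048000 carries its own hyphen.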
First study loaded — Dungeness River, WA (study_id = dungeness)
One small watershed near Sequim, WA (Clallam County, HUC 17110020), monitored by Clallam County, the Jamestown S'Klallam Tribe, WA Dept. of Ecology and EPA. 84 station slots:
- 80 STORET water-quality stations (station_type = water_quality), 1999-2014, ~16,500
grab-sample results across 316 characteristics (fecal coliform, water temperature, flow,
nutrients, turbidity, pH, pesticide panels, ...).
- 1 USGS gauge USGS-12048000 (station_type = usgs_wq_gauge), water chemistry + flow back to 1959.
- 3 NOAA COOP precipitation gauges (station_type = precipitation), Port Angeles area, back to 1948.
Measured reality vs held out (bright line)
feedback_measured_reality_only. All three files are physical measurements: grab-sample lab/field
results, gauge flow readings, rain-gauge precipitation.
- OUT — USGS rows whose method is "Computation by NWIS algorithm" (e.g. total dissolved solids
computed from specific conductance, parameter 70301): an algorithm's estimate, not a
measurement. 133 such rows dropped from the Dungeness USGS gauge; the count is recorded on the
slot (n_computed_held_out). A measured TDS value (param 70300) is kept.
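The hold-out rule can be sketched as a simple filter that also keeps the count. This is an assumption-laden illustration: the row keys "param", "method" and "value" are hypothetical, and the sample rows are made up rather than taken from the USGS file.

```python
# Assumed row shape: dicts with "param", "method", "value" keys (hypothetical names).
COMPUTED_METHOD = "Computation by NWIS algorithm"


def hold_out_computed(rows):
    """Split rows into measured rows and a count of computed rows held out."""
    measured = [row for row in rows if row.get("method") != COMPUTED_METHOD]
    return measured, len(rows) - len(measured)


# Illustrative rows only; the values and the first method label are made up.
example_rows = [
    {"param": "70300", "method": "laboratory (illustrative)", "value": 61.0},
    {"param": "70301", "method": COMPUTED_METHOD, "value": 58.0},
    {"param": "70301", "method": COMPUTED_METHOD, "value": 60.0},
]

measured, n_computed_held_out = hold_out_computed(example_rows)

# The count travels with the slot so the drop stays visible downstream.
slot_fragment = {"samples": measured, "n_computed_held_out": n_computed_held_out}
```

Filtering on the method string rather than on the parameter code keeps any measured value of the same quantity, which matches the 70300 vs 70301 distinction above.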
feedback_data_is_data_partial_coverage: non-detect samples are kept (value nulled, detect = "non-detect") rather than dropped; a station with no coordinates (the precip gauges) is kept with
lat/lon null. Pesticide-screen stations that are almost entirely non-detect (e.g. station 716, 310
of 331 samples below quantification) are carried in full, the non-detection itself being a
measured result.
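One way the per-characteristic summary could treat non-detects, as a sketch under stated assumptions: each sample is a dict with hypothetical keys "value", "detect" and "year", and summarize_one_characteristic is an illustrative stand-in, not the project's summarize_characteristics. Non-detects count toward n but are excluded from min/median/max.

```python
from statistics import median


def summarize_one_characteristic(samples, units=None):
    """Summarize one characteristic; non-detects are counted, never dropped."""
    detected = [
        s["value"]
        for s in samples
        if s["detect"] != "non-detect" and s["value"] is not None
    ]
    years = [s["year"] for s in samples]
    return {
        "n": len(samples),
        "n_detected": len(detected),
        "n_non_detect": len(samples) - len(detected),
        "units": units,
        "detected_min": min(detected) if detected else None,
        "detected_median": median(detected) if detected else None,
        "detected_max": max(detected) if detected else None,
        "year_range": [min(years), max(years)] if years else None,
    }


# Made-up samples for illustration; not values from the Dungeness files.
example_samples = [
    {"value": 2.0, "detect": "detected", "year": 1999},
    {"value": None, "detect": "non-detect", "year": 2001},
    {"value": 9.0, "detect": "detected", "year": 2002},
    {"value": 4.0, "detect": "detected", "year": 2003},
]

summary = summarize_one_characteristic(example_samples, units="mg/L")
```

A station that is entirely non-detect still yields a full summary: n and n_non_detect are populated while the detected statistics stay null.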
Storage
Locationdex sibling storehouse data/location_storehouse/water_quality_station/<slot_id>.json,
file-per-slot (84 slots), via event_storehouse.write_dossier + rebuild_index_from_disk. The
build clears the kind dir first so removed/renamed slots never linger.
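The clear-then-write behavior can be illustrated with the standard library alone. This sketch deliberately does not call event_storehouse.write_dossier or rebuild_index_from_disk, since their signatures are not shown in this note; rebuild_kind_dir is a hypothetical stand-in that only demonstrates why clearing first prevents stale slots.

```python
import json
import shutil
import tempfile
from pathlib import Path


def rebuild_kind_dir(root: Path, kind: str, slots: dict) -> list:
    """Clear <root>/<kind>, write one JSON file per slot, return slot ids on disk."""
    kind_dir = root / kind
    if kind_dir.exists():
        shutil.rmtree(kind_dir)  # clear first so removed/renamed slots never linger
    kind_dir.mkdir(parents=True)
    for slot_id, record in slots.items():
        path = kind_dir / f"{slot_id}.json"
        path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    # Listing the directory stands in for rebuilding an index from disk.
    return sorted(p.stem for p in kind_dir.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # First build writes two slots; the second build no longer includes one of them.
    first = rebuild_kind_dir(
        root, "water_quality_station", {"dungeness-716": {}, "dungeness-old": {}}
    )
    remaining = rebuild_kind_dir(root, "water_quality_station", {"dungeness-716": {}})
```

Without the rmtree step, dungeness-old.json would survive the second build and reappear in any index rebuilt from disk.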
Expanding the kind
Add a sibling study by adding one entry to STUDIES (label, watershed, HUC, file URLs) and, if it
is not STORET-shaped, a small reader feeding the shared summarize_characteristics /
station_summary aggregators. Same-shaped studies (STORET WQX exports) need only the registry
entry. Slot ids are namespaced by study_id, so studies never collide.
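A registry entry might look roughly like the following. The field names (label, watershed, huc, file_urls, reader), the register_study helper, the placeholder URLs and the example_creek study are all hypothetical; only the four kinds of information (label, watershed, HUC, file URLs) come from this note.

```python
REQUIRED_FIELDS = ("label", "watershed", "huc", "file_urls")

STUDIES = {
    "dungeness": {
        "label": "Dungeness River, WA",
        "watershed": "Dungeness River",
        "huc": "17110020",
        # Placeholder; the real file URLs are not reproduced in this note.
        "file_urls": ["https://example.invalid/dungeness_storet.csv"],
        "reader": "storet",
    },
}


def register_study(studies: dict, study_id: str, entry: dict) -> None:
    """Add one study; refuse duplicates, hyphenated ids, and incomplete entries."""
    if study_id in studies:
        raise ValueError(f"study_id already registered: {study_id}")
    if "-" in study_id:
        # Assumption: hyphen-free study ids keep <study_id>-<station_id> unambiguous.
        raise ValueError("study_id must not contain '-'")
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Same-shaped (STORET) studies need no custom reader, so that is the default.
    studies[study_id] = {"reader": "storet", **entry}


# Hypothetical sibling study, shown only to exercise the helper.
register_study(
    STUDIES,
    "example_creek",
    {
        "label": "Example Creek (hypothetical)",
        "watershed": "Example Creek",
        "huc": "00000000",
        "file_urls": ["https://example.invalid/example_creek.csv"],
    },
)
```

A study that is not STORET-shaped would set its own reader value and supply the small reader mentioned above; everything downstream stays shared.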
Deferred
- Coordinates for the 3 COOP precip gauges (the rainfall file carries no per-COOP lat/lon).
- Raw hourly precipitation (only annual totals embedded in v1).
- Cross-reference of USGS-12048000 to the streamgauge Locationdex slot for the same gauge.
- Sibling EPA water-quality studies / a broader STORET/WQX national pull (the point of option A).
- Spatial sweep / cross-match (deferred for every Locationdex kind until one needs it).
- No live edge: the Dungeness study is a closed 1948-2014 archive.