
Scope freeze — nrsa_water_quality YearLocationdex (4th YearLocationdex kind)

Decided 2026-06-27 (Mike re-pasted the NARS data page as the go-ahead for the next sibling). Source = EPA National Aquatic Resource Surveys (NARS), the National Rivers and Streams Assessment (NRSA). The rivers-and-streams sibling of nla_water_quality (lakes) and ncca_water_quality (coastal). Same shape Mike approved for NARS: a site × survey-cycle grid (YearLocationdex). The 4th YearLocationdex kind (docs/yearlocationdex-framework.md). Categorization settled by the NLA/NCCA precedent, so this was built directly.

Slot

One (river/stream site, survey cycle) cell holding that visit's MEASURED water-quality indicators. Unlike NCCA (one cycle), NRSA gives a real multi-cycle year axis.

Source / scope

https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys. Pilot scope = the three modern wide-format cycles 2013-14, 2018-19, 2023-24 (the 2023-24 data was posted 2026-06). The older 2008-09 cycle uses a different schema and is a deferred backfill that would extend the axis to 2008 ("data is data"). year = the cycle's nominal start year (2013/2018/2023); a cycle label ("2013-14") is carried alongside.
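
The year rule above can be sketched as follows. This is a minimal illustration, not the build script's code: the helper name cycle_start_year and the PILOT_CYCLES constant are hypothetical, and the only assumption about the data is that cycle labels look like "2013-14".

```python
# Hypothetical helper: derive the year-axis value from a cycle label.
PILOT_CYCLES = ("2013-14", "2018-19", "2023-24")


def cycle_start_year(cycle_label: str) -> int:
    """Return the nominal start year of a label such as "2013-14"."""
    start, _, _ = cycle_label.partition("-")
    if len(start) != 4 or not start.isdigit():
        raise ValueError(f"unexpected cycle label: {cycle_label!r}")
    return int(start)


# The label is carried alongside the year, so keep both.
years = {label: cycle_start_year(label) for label in PILOT_CYCLES}
```

The same helper would place the deferred 2008-09 backfill at year 2008 without any special case.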

Measured reality — IN / OUT (bright line feedback_measured_reality_only)

  • IN — raw per-site lab measurements: total nitrogen, total phosphorus, chlorophyll-a, conductivity, pH, turbidity, dissolved organic carbon, acid-neutralizing capacity.
  • OUT — the survey's design-based condition estimates and the MMI / index scores. The *_allcond rollups carry population weights that extrapolate the sampled sites to "% of US river miles in good condition", which is a computed population estimate, not a measurement.
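
One way to hold the IN / OUT line in code is an allowlist: only named measured columns pass, and everything else is dropped by default. The sketch below assumes the <ANALYTE>_RESULT naming described in this note; the analyte codes NTL and PH and the example OUT column names (WGT_TP, MMI_BENT, PTL_COND) are hypothetical stand-ins, not confirmed column names.

```python
# Assumed analyte codes; only PTL, CHLA, COND, TURB, DOC and ANC are named
# in this note, the rest are guesses for illustration.
MEASURED_ANALYTES = ("NTL", "PTL", "CHLA", "COND", "PH", "TURB", "DOC", "ANC")


def measured_only(row: dict) -> dict:
    """Keep only <ANALYTE>_RESULT fields for allowlisted analytes."""
    keep = {f"{analyte}_RESULT" for analyte in MEASURED_ANALYTES}
    return {name: value for name, value in row.items() if name in keep}


row = {
    "PTL_RESULT": "59",
    "NTL_RESULT": "0.62",
    "WGT_TP": "123.4",    # hypothetical population weight (OUT)
    "MMI_BENT": "41.2",   # hypothetical index score (OUT)
    "PTL_COND": "Good",   # hypothetical condition class (OUT)
}
kept = measured_only(row)
```

An allowlist fails closed: a new derived column appearing in a later cycle is excluded unless someone adds it deliberately.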

Cell indicators (canonical units)

ptl_ugl (µg/L) · ntl_ugl (µg/L) · chla_ugl (µg/L) · cond_uscm (µS/cm) · ph · turb_ntu (NTU) · doc_mgl (mg/L) · anc_ueql (µeq/L). This is the full NLA vocabulary minus Secchi (rivers are not Secchi-sampled); the strongest cross-kind overlap of the three water kinds.

Decisions / gotchas frozen here

  • Unit harmonization (load-bearing, and asymmetric vs NLA/NCCA). NRSA total N is mg/L → ×1000 to µg/L (shares ntl_ugl). NRSA total P is ALREADY µg/L, so no scaling is applied, unlike NLA and NCCA where PTL was mg/L. Applying the ×1000 factor to it by mistake would inflate river TP 1000×. CHLA (µg/L), COND (µS/cm), TURB (NTU), DOC (mg/L), ANC (µeq/L) and pH already share units across all three cycles (verified from each cycle's *_UNITS columns). Sanity check after build: TP median 59 µg/L, TN median 620 µg/L (aligned with NLA freshwater ≈600).
  • Wide chem, one cell column per analyte. All three cycles store <ANALYTE>_RESULT columns (1314 uses _RESULT_UNITS, 1819/2324 use _UNITS). The mapper (map_wide_chem) is schema-uniform: an analyte absent from a cycle yields a null cell column.
  • Chlorophyll split. 1819/2324 carry CHLA_RESULT in the chem file; 1314 keeps it in a separate widewchl file, joined on UID.
  • latin-1 bytes. Some NRSA site files carry non-UTF-8 bytes (accented place names); the parser uses encoding="utf8-lossy" so a stray byte doesn't abort the whole file. CR-only / CRLF endings are normalized as in the other NARS kinds. Downloads use curl, not urllib/httpx (both hang in the sandbox).
  • One cell per (site, cycle). A handful of NRSA reference (RF) sites carry two index records within a cycle; the build dedups on (site_id, year), keeping the first record. This drops 40 of 5,944 rows and leaves 5,904 cells.
  • Site IDs differ across cycles. Fresh probability sample each cycle → the place axis is sparse across cycles; coords carried for later spatial re-linking.
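
The unit rule, the null-column rule and the first-record dedup above can be sketched together. This is an illustration under stated assumptions, not the actual map_wide_chem: the names CELL_MAP, harmonize_row and dedup_first are hypothetical, the analyte codes NTL and PH are guesses, and rows are plain dicts rather than whatever frame type the build script uses.

```python
# analyte code -> (canonical cell column, multiplier into the canonical unit)
CELL_MAP = {
    "NTL": ("ntl_ugl", 1000.0),   # NRSA total N is mg/L, so scale to µg/L
    "PTL": ("ptl_ugl", 1.0),      # NRSA total P is already µg/L: no scaling
    "CHLA": ("chla_ugl", 1.0),
    "COND": ("cond_uscm", 1.0),
    "PH": ("ph", 1.0),
    "TURB": ("turb_ntu", 1.0),
    "DOC": ("doc_mgl", 1.0),
    "ANC": ("anc_ueql", 1.0),
}


def to_float(raw):
    """Parse a raw CSV field; missing or blank values become None."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def harmonize_row(row: dict) -> dict:
    """Emit every cell column; an absent or blank analyte becomes None."""
    cell = {}
    for analyte, (column, factor) in CELL_MAP.items():
        value = to_float(row.get(f"{analyte}_RESULT"))
        cell[column] = None if value is None else value * factor
    return cell


def dedup_first(rows: list) -> list:
    """Keep the first row seen for each (site_id, year) key."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["site_id"], row["year"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Keeping the multiplier next to the column name puts the asymmetric phosphorus rule in one reviewable place, and a post-build median check like the one recorded above would catch a misplaced factor.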

Storage

YearLocationdex pattern: ONE place-sorted spine-parquet (data/yearlocation_storehouse/nrsa_water_quality/nrsa_water_quality_spine.parquet) sorted by site_id, year, plus a by-year index JSON. Built by scripts/build_nrsa_water_quality_yearlocationdex.py. 5,904 river-site-cycle cells (2013-14: 2,069; 2018-19: 1,919; 2023-24: 1,916).
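
A rough sketch of the spine-plus-index idea, using only the standard library. The real layout of the by-year index JSON is not specified in this note, so the structure here (year string mapped to row positions in the sorted spine) is an assumption, as is the function name build_spine_and_index; writing the parquet file itself is left out.

```python
import json


def build_spine_and_index(cells: list):
    """Sort cells by (site_id, year) and index row positions by year."""
    spine = sorted(cells, key=lambda c: (c["site_id"], c["year"]))
    by_year = {}
    for position, cell in enumerate(spine):
        # JSON object keys are strings, so the year is stored as text.
        by_year.setdefault(str(cell["year"]), []).append(position)
    return spine, by_year


cells = [
    {"site_id": "B", "year": 2018},
    {"site_id": "A", "year": 2023},
    {"site_id": "A", "year": 2013},
]
spine, by_year = build_spine_and_index(cells)
index_json = json.dumps(by_year, sort_keys=True)
```

Because the spine is place-sorted, a single site's cells sit together, while the index recovers a year slice without scanning every row.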

Deferred (not in v1)

  • NRSA 2008-09 (older schema; extends the year axis back to 2008).
  • NWCA (Wetlands) as the last NARS sibling kind. (NLA / NCCA / NRSA done.)
  • Revisits (VISIT_NO==2); benthic / fish biology, physical-habitat, enterococci, fish-tissue mercury as additional cell layers.
  • Cross-cycle resample crosswalk (same-site linkage across cycles).
  • Live edge: none — NARS is a periodic survey posted in cycle batches.