Scope freeze — nwca_wetland_chemistry YearLocationdex (5th YearLocationdex kind)
Decided 2026-06-27 (Mike pasted the NARS master data page; I surfaced the shape fork and he
ruled "soil + water in one wetland kind"). Source = EPA's National Wetland Condition Assessment
(NWCA), the wetlands member of the National Aquatic Resource Surveys. The slot is one
(wetland site, survey cycle) cell — the same YearLocationdex shape as the NARS lake/coastal/
river kinds (docs/yearlocationdex-framework.md). NWCA is the last NARS member of the family: the
wetlands sibling that completes the NARS freshwater/coastal/flowing/wetland set.
Why this kind breaks the water-only pattern (the categorization call)
The three earlier NARS YearLocationdex kinds (nla_water_quality lakes, ncca_water_quality
coastal, nrsa_water_quality rivers) are pure water-chemistry kinds. NWCA is not, and forcing it
into that mold would have lost data:
- A wetland's headline measurement is its SOIL, not its water.
- Only ~55% of NWCA sites have standing surface water to sample (631 of 1,129 in 2011, 675 in 2016, all 666 in 2021), but **92% have soil chemistry** (~1,035 per soil-bearing cycle).
- Naming it nwca_water_quality and carrying only water would silently drop the ~400 soil-only wetland sites each cycle, a direct "data is data" violation (feedback_data_is_data_partial_coverage).
So Mike chose ONE wetland kind carrying both blocks. A site with only one block (water-only in
2021, soil-only in the dry inland wetlands) is still a cell. Named nwca_wetland_chemistry, not
*_condition, because "condition" is the held-out design-based product.
Slot
One (wetland site, survey cycle) cell. site_id × year is the slot key; cycle (the human
"YYYY" label), lat/lon, state, and hyd_cls (INLAND / TIDAL) are slot fields. The index
visit (VISIT_NO == 1) is the cell; revisits deferred. Like the other NARS kinds, NWCA draws a
fresh probability sample each cycle, so site IDs do not align across cycles (sparse place axis);
coords are carried for later spatial re-linking.
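A minimal sketch of the slot rule above, assuming the raw site rows have already been read into plain dicts. The function name select_index_visits and the flat row layout are hypothetical; only the field names (site_id, year, cycle, lat, lon, state, hyd_cls, VISIT_NO) come from this note.

```python
from typing import Dict, Iterable, Tuple

SlotKey = Tuple[str, int]  # (site_id, year)

def select_index_visits(rows: Iterable[dict]) -> Dict[SlotKey, dict]:
    """Keep one cell per (site_id, year): the index visit, VISIT_NO == 1."""
    cells: Dict[SlotKey, dict] = {}
    for row in rows:
        if int(row["VISIT_NO"]) != 1:
            continue  # revisits are deferred in v1
        key = (row["site_id"], int(row["year"]))
        if key in cells:
            # Two index visits for one slot would mean the source is malformed.
            raise ValueError(f"duplicate index visit for slot {key}")
        cells[key] = {
            "cycle": str(row["year"]),  # the human "YYYY" label
            "lat": row["lat"],
            "lon": row["lon"],
            "state": row["state"],
            "hyd_cls": row["hyd_cls"],  # INLAND or TIDAL
        }
    return cells
```

Keying on the (site_id, year) pair rather than site_id alone reflects the sparse place axis: the same ID is not expected to recur across cycles, so the coordinates are the only handle for later re-linking.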
Year axis (real, 3 cycles)
2011, 2016, 2021. Unlike NCCA (one cycle), NWCA gives a genuine multi-cycle year axis, like NRSA. No 2021 soil chemistry had been released as of this build, so the 2021 soil block is null (carry the water, null the soil block, backfill later).
Indicators carried
Water block (shared NARS vocabulary): ptl_ugl, ntl_ugl, cond_uscm, ph, chla_ugl.
These are the lakes set trimmed to what wetland water sampling delivers (no Secchi/turb/doc/anc).
Soil block (new to the family — this kind's primary signal): soil_carbon_pct,
soil_nitrogen_pct, soil_sulfur_pct, soil_ph_h2o, soil_cec_cmolkg.
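One way to picture a cell that carries both blocks is sketched below. The class names (WaterBlock, SoilBlock, WetlandCell) and the blocks_present helper are hypothetical and not taken from the build script; the field names are the indicator names listed above. Either block may be absent and the cell still exists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class WaterBlock:
    ptl_ugl: Optional[float] = None
    ntl_ugl: Optional[float] = None
    cond_uscm: Optional[float] = None
    ph: Optional[float] = None
    chla_ugl: Optional[float] = None

@dataclass(frozen=True)
class SoilBlock:
    soil_carbon_pct: Optional[float] = None
    soil_nitrogen_pct: Optional[float] = None
    soil_sulfur_pct: Optional[float] = None
    soil_ph_h2o: Optional[float] = None
    soil_cec_cmolkg: Optional[float] = None

@dataclass(frozen=True)
class WetlandCell:
    site_id: str
    year: int
    water: Optional[WaterBlock] = None  # None when no surface water was sampled
    soil: Optional[SoilBlock] = None    # None for 2021 until soil data is released

    def blocks_present(self) -> Tuple[str, ...]:
        """Names of the blocks this cell actually carries."""
        names = []
        if self.water is not None:
            names.append("water")
        if self.soil is not None:
            names.append("soil")
        return tuple(names)
```

A 2021 cell would then be built as WetlandCell("some-site", 2021, water=WaterBlock(ph=6.8)), with the soil block left as None for a later backfill.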
Measured reality — IN / OUT (bright line feedback_measured_reality_only)
- IN: raw per-site lab measurements. Water: total N/P, chlorophyll-a, conductivity, pH. Soil: total carbon/nitrogen/sulfur (percent), soil-water pH, cation exchange capacity (cmol(+)/kg).
- OUT: the survey's design-based condition estimates (*_condition_estimates, *_data_for_population_estimates) and the vegetation MMI / index scores. The population-weighted "% of US wetland area in good condition" is a computed population estimate, not a measurement.
Unit harmonization (load-bearing, frozen here)
- Total P is already µg/L → ptl_ugl unscaled (matches NRSA, NOT NLA/NCCA where PTL was mg/L).
- Total N is mg N/L → ntl_ugl ×1000. Getting these backwards manufactures a fake 1000× offset. Verified harmonized: ptl medians 104–122 µg/L, ntl medians 1065–1210 µg/L across cycles.
- Conductivity µS/cm; CHLA µg/L; pH dimensionless.
- Soil total C/N/S are percent (peat soils run to ~46% C; values 0.16–46.5 confirm percent, not mg/kg); CEC cmol(+)/kg; soil pH dimensionless.
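The two water conversions above can be sketched as follows. The function names (tp_to_ugl, tn_to_ugl, median_of_present) are hypothetical; the scaling itself (TP unscaled, TN ×1000) is the rule frozen in this section. The median helper is only a convenience for repeating the per-cycle sanity check against the verified ranges.

```python
from statistics import median
from typing import Iterable, Optional

UG_PER_MG = 1000.0

def tp_to_ugl(tp_ugl: Optional[float]) -> Optional[float]:
    """NWCA total P is already in ug/L, so it passes through unscaled."""
    return tp_ugl

def tn_to_ugl(tn_mgl: Optional[float]) -> Optional[float]:
    """NWCA total N is in mg N/L; scale to ug/L. Missing values stay missing."""
    return None if tn_mgl is None else tn_mgl * UG_PER_MG

def median_of_present(values: Iterable[Optional[float]]) -> Optional[float]:
    """Median over non-missing values, or None if nothing was measured."""
    present = [v for v in values if v is not None]
    return median(present) if present else None
```

Comparing median_of_present over a cycle's harmonized ntl_ugl values with the 1065–1210 µg/L range quoted above is a cheap way to catch a swapped scale factor, since the error would show up as a median near 1 or near 1,000,000.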
Per-cycle schema drift (one adapter per cycle, frozen)
- Water 2011 = bare columns (TP/TN/COND/PH) + a separate nwca2011_chla.csv joined on UID. Water 2016/2021 = wide <ANALYTE>_RESULT columns.
- Soil 2011 = per-LAYER long file → take the surface horizon (shallowest DEPTH) per site, the comparable biologically-active layer (load-bearing: not a row aggregate). Soil 2016 = a standardized-depth core, already one HORIZON='STD' row per site (no pick). Soil 2021 = not released → null soil block.
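The 2011 soil pick above can be sketched like this, assuming the per-layer file has been read into dicts with a numeric DEPTH. The function name pick_surface_horizon and the site_id key are hypothetical; DEPTH is the column named in this note. The point of the sketch is that a whole row is selected per site and nothing is averaged across layers.

```python
from typing import Dict, Iterable

def pick_surface_horizon(layer_rows: Iterable[dict]) -> Dict[str, dict]:
    """Return the shallowest-DEPTH layer row for each site, unmodified."""
    surface: Dict[str, dict] = {}
    for row in layer_rows:
        site = row["site_id"]
        current = surface.get(site)
        # Strict '<' keeps the first row seen when two layers share a depth.
        if current is None or row["DEPTH"] < current["DEPTH"]:
            surface[site] = row
    return surface
```

Averaging carbon over all layers would mix deep mineral horizons into the surface value, which is why the pick is a row selection and not an aggregate.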
Storage
The YearLocationdex pattern: ONE place-sorted spine-parquet
(data/yearlocation_storehouse/nwca_wetland_chemistry/nwca_wetland_chemistry_spine.parquet,
sorted by site_id, year) + a by-year index JSON. Never file-per-slot (keeps the shared
storehouse_index lean). Built by scripts/build_nwca_wetland_chemistry_yearlocationdex.py.
2,849 cells (2011: 1,129 / 2016: 1,054 / 2021: 666).
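A rough sketch of the storage layout, under stated assumptions: the index shape (year → cell count plus row positions in the spine) is a guess for illustration and not the actual format written by the build script, and the parquet write itself is omitted so the example stays stdlib-only. The function name build_spine_and_index is hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable, List, Tuple

def build_spine_and_index(cells: Iterable[dict]) -> Tuple[List[dict], str]:
    """Sort cells by (site_id, year) and build a by-year index as JSON text."""
    spine = sorted(cells, key=lambda c: (c["site_id"], c["year"]))
    rows_by_year = defaultdict(list)
    for position, cell in enumerate(spine):
        rows_by_year[str(cell["year"])].append(position)
    # Assumed index shape: one entry per year, pointing back into the spine.
    index = {
        year: {"n_cells": len(positions), "rows": positions}
        for year, positions in sorted(rows_by_year.items())
    }
    return spine, json.dumps(index, indent=2)
```

With one spine and one small index, a by-year read does not need a file per slot, which is the property this section is protecting.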
Deferred (not in v1)
- NWCA 2021 soil chemistry (not released as of build — extends the soil block to all 3 cycles).
- Soil trace-metal, texture, and exchangeable-cation suites (the full soilchem column set).
- Vegetation / hydrology / buffer / USA-RAM condition layers (the other NWCA workbooks).
- Cross-cycle same-site resample crosswalk and revisits (VISIT_NO == 2).
- Spatial sweep / cross-match (deferred to the YearLocationdex v2 read-across, as with the other NARS kinds).