Scope freeze — reservoir_cyano_water_quality Locationdex (8th Locationdex kind)
Decided 2026-06-27/28 (Mike pasted the data.gov catalog URL; chose the slot shape on 06-28). Source = EPA ScienceHub dataset "1987-2018 Cyanobacteria and Water Quality Data for 20 Reservoirs" (DOI 10.23719/1503175), the data behind Smucker, Beaulieu, Nietch & Young, "Increasingly severe cyanobacterial blooms and deep water hypoxia coincide with warming water temperatures in reservoirs," Global Change Biology 27(11):2507-2519 (2021), https://doi.org/10.1111/gcb.15618.
Slot = a reservoir (Mike's call)
The shape was a genuine fork: 640 reservoir-year cells (YearLocationdex) or 20 reservoir slots each embedding its annual series (Locationdex). Mike chose 20 reservoir slots (2026-06-28). So this is the 8th Locationdex kind (slot = a fixed place), after neutron_monitor, tide_gauge, magnetic_observatory, streamgauge, radiation_monitor, coral_reef_station, radiosonde_station.
The slot is one of the 20 US Army Corps of Engineers reservoirs (Kentucky / Indiana / Ohio). It
carries the reservoir's identity + place + morphometry, a measured-series summary, and the
full annual_series array of per-year measured indicators (1987-2018, up to 32 entries). The
embedded-series pattern follows the neo CelestialObjectDex slot, which embeds its close-approach
array; here a reservoir embeds its water-quality year array.
Source (multi-source cited slot)
Two EPA ScienceHub tables, each cited on the slot (sources):
- Roster (Reservoir_information.xlsx). The place: abbreviation, name, type (Forest / Ag / Urban), stratification, latitude, longitude, year filled, watershed area, forest fraction, surface area, storage volume, max/mean depth, Zmean:Zmax.
- Master measurements (CyanoMaxCD_environmental_vars_FINAL.xlsx, sheet Data). The merged per-reservoir-year measured indicators that become the slot's annual_series.
Cyanobacteria cell densities derive from samples the US Army Corps of Engineers collected at each reservoir's deepest station (20001), counted and provided to EPA in October 2019.
What a slot holds
- Identity + place: event_id = the reservoir abbreviation (e.g. EFR), name, latitude, longitude, reservoir_type, stratification, year_filled, and the roster morphometry (watershed_km2, forest_frac, surface_km2, storage_10e6m3, max_depth_m, mean_depth_m, zmean_zmax).
- series_summary: n_years, year_range, n_years_with_cyano, the peak cyanobacteria maximum cell density (peak_cyano_max_cells_ml) and the year it occurred (peak_cyano_year), and n_present_by_indicator (per-indicator non-null year counts).
- annual_series: one entry per year, each with the 26 measured indicators below (null where not measured that year).
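A minimal sketch of the slot skeleton described above, for orientation only. The field names come from this note; the flat top-level placement of the morphometry fields, the empty defaults, and the empty_slot helper itself are assumptions, not the build script's actual code.

```python
from typing import Any

# Field names as listed in this note.
IDENTITY_FIELDS = (
    "event_id", "name", "latitude", "longitude",
    "reservoir_type", "stratification", "year_filled",
)
MORPHOMETRY_FIELDS = (
    "watershed_km2", "forest_frac", "surface_km2", "storage_10e6m3",
    "max_depth_m", "mean_depth_m", "zmean_zmax",
)
SUMMARY_FIELDS = (
    "n_years", "year_range", "n_years_with_cyano",
    "peak_cyano_max_cells_ml", "peak_cyano_year", "n_present_by_indicator",
)


def empty_slot(event_id: str) -> dict[str, Any]:
    """Return an unfilled slot skeleton (hypothetical helper).

    Assumption: identity and morphometry sit flat at the top level;
    the real slot may nest them differently.
    """
    slot: dict[str, Any] = {f: None for f in IDENTITY_FIELDS + MORPHOMETRY_FIELDS}
    slot["event_id"] = event_id
    slot["series_summary"] = {f: None for f in SUMMARY_FIELDS}
    slot["annual_series"] = []  # one dict per year, up to 32 entries
    slot["sources"] = []        # the two cited EPA ScienceHub tables
    return slot
```

Calling empty_slot("EFR") gives a dict whose event_id is set and whose other fields wait for the roster and master-table rows.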
Measured indicators (26, per year)
cyano_max_cells_ml · chla_ugl · secchi_cm · tp_ppb · p_dissolved_ppb · tkn_ppm ·
nh3_ppm · nox_ppm · toc_ppm · alkalinity_ppm · nh3_inflow_ppm · tkn_inflow_ppm ·
nox_inflow_ppm · tp_inflow_ppb · toc_inflow_ppm · summer_precip_inches ·
may/jun/jul/aug_surface_temp_c · may/jun_deep_do_mgl · may/jun/jul/aug_deep_temp_c.
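The month shorthand above can be expanded mechanically. A small sketch, assuming the per-month fields are spelled as month prefix plus the shared suffix (for example jun_deep_do_mgl); the exact spellings in the build script may differ.

```python
MONTHS = ("may", "jun", "jul", "aug")

# Assumption: each "may/jun/..._suffix" shorthand expands to one field per month.
INDICATORS = (
    "cyano_max_cells_ml", "chla_ugl", "secchi_cm", "tp_ppb",
    "p_dissolved_ppb", "tkn_ppm", "nh3_ppm", "nox_ppm", "toc_ppm",
    "alkalinity_ppm", "nh3_inflow_ppm", "tkn_inflow_ppm",
    "nox_inflow_ppm", "tp_inflow_ppb", "toc_inflow_ppm",
    "summer_precip_inches",
    *(f"{m}_surface_temp_c" for m in MONTHS),
    *(f"{m}_deep_do_mgl" for m in ("may", "jun")),
    *(f"{m}_deep_temp_c" for m in MONTHS),
)

# 16 single fields + 4 surface temps + 2 deep DO + 4 deep temps
assert len(INDICATORS) == 26
assert len(set(INDICATORS)) == 26  # no duplicates
```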
Measured reality — IN / OUT (bright line feedback_measured_reality_only)
- IN: the raw per-reservoir-per-year measurements above (cyanobacteria max cell density, chlorophyll, clarity, nutrients, summer precipitation, surface + deep water temperatures, deep dissolved oxygen).
- OUT: the study's GAM fits (GAM_for_stratifying_reservoirs.xlsx, GAM_for_nonstratifying_reservoirs.xlsx) are generalized additive model output (smoothed fitted trends), not measurements, so they are not ingested. The master table's logCyanoMax (a log transform) and Summer_precip_Z-score (a standardization) are skipped as redundant derivatives of carried measurements, not as a bright-line exclusion.
"data is data" (feedback_data_is_data_partial_coverage)
216 of the 640 reservoir-years have no cyanobacteria count, and many years miss individual
analytes. Every year stays in the slot's annual_series with the missing fields nulled, rather
than dropping sparse years. The series summary counts only the years that actually carry a value.
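A sketch of how the summary could count only the years that carry a value while every year stays in the series. It assumes each annual_series entry is a dict with a year key and that year_range is a [first, last] pair; both are assumptions, and summarize_series is a hypothetical name.

```python
def summarize_series(annual_series, indicators):
    """Build a series_summary dict from a slot's annual_series (sketch)."""
    years = [entry["year"] for entry in annual_series]
    # Per-indicator count of years with a non-null value.
    n_present = {
        name: sum(1 for entry in annual_series if entry.get(name) is not None)
        for name in indicators
    }
    # Sparse years are kept in the series; they are only excluded from the peak.
    with_cyano = [e for e in annual_series if e.get("cyano_max_cells_ml") is not None]
    peak = max(with_cyano, key=lambda e: e["cyano_max_cells_ml"], default=None)
    return {
        "n_years": len(years),
        "year_range": [min(years), max(years)] if years else None,
        "n_years_with_cyano": len(with_cyano),
        "peak_cyano_max_cells_ml": peak["cyano_max_cells_ml"] if peak else None,
        "peak_cyano_year": peak["year"] if peak else None,
        "n_present_by_indicator": n_present,
    }
```

A year with no cyanobacteria count still adds to n_years; it just does not add to n_years_with_cyano or compete for the peak.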
Decisions / gotchas frozen here
- Slot = reservoir (Locationdex), not reservoir-year (YearLocationdex) — Mike's 2026-06-28 call.
- 'na' = null. The master table uses the string na for missing; clean_num coerces na / blank / None to null and keeps real numbers (unit-tested).
- XLSX, not CSV (read via openpyxl read-only); curl, not urllib/httpx (the known sandbox hang).
- Headline sanity check. EFR (East Fork Lake) peaks at 12.8M cyano cells/mL in 2017, the same year its June deep dissolved oxygen falls to ~0.09 mg/L. This is the paper's warming-blooms-with-deep-hypoxia coincidence, visible directly in one slot's series.
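For reference, a sketch of what the 'na' = null coercion might look like. Only the na / blank / None handling and the keeping of real numbers come from this note; the case-insensitive match, the parsing of numeric strings, and letting unexpected tokens raise are assumptions, so the real clean_num may differ.

```python
def clean_num(value):
    """Coerce a cell value to a number or None (sketch of the rule above)."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return value  # real numbers pass through unchanged
    text = str(value).strip()
    # Assumption: 'na' is matched case-insensitively after trimming.
    if text == "" or text.lower() == "na":
        return None
    # Assumption: numeric strings are parsed; any other token raises
    # ValueError so an unexpected missing-value marker is not hidden.
    return float(text)
```

Cells read through openpyxl in read-only mode arrive as Python values (numbers, strings, or None), so a helper of this shape can be applied to each cell directly.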
Storage
Locationdex pattern: file-per-slot at
data/location_storehouse/reservoir_cyano_water_quality/<reservoir>.json, written via the shared
event_storehouse.write_dossier + rebuild_index_from_disk(base_dir=…) machinery. 20 reservoir
slots. Built by scripts/build_reservoir_cyano_water_quality_locationdex.py.
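To illustrate the file-per-slot layout only, a stand-in sketch using plain json and pathlib. It does not call event_storehouse.write_dossier or rebuild_index_from_disk, whose signatures are not given here; slot_path and write_slot are hypothetical names, and using the reservoir abbreviation as the file name is an assumption.

```python
import json
from pathlib import Path

KIND = "reservoir_cyano_water_quality"


def slot_path(base_dir, reservoir):
    """Path of one slot file: <base_dir>/<kind>/<reservoir>.json."""
    return Path(base_dir) / KIND / f"{reservoir}.json"


def write_slot(base_dir, slot):
    """Write one slot as its own JSON file (stand-in for the shared writer)."""
    path = slot_path(base_dir, slot["event_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(slot, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

With base_dir set to data/location_storehouse, write_slot would produce the 20 per-reservoir files described above, one call per slot.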
Deferred (not in v1)
- Sub-annual measured series: the per-sample cyanobacteria and cyanotoxin/taxa station tables (Cyanobacteria_data.xlsx, Cyanotoxin_taxa_data.xlsx) as a finer layer.
- Monthly / depth-profile measured tables: standardized surface/deep temperature and deep-DO depth profiles and the monthly nutrient-trend tables.
- NLCD watershed land-cover as a static per-reservoir covariate.
- Spatial sweep / cross-match (deferred to the Locationdex v2 read-across, as with every kind).
- Live edge: none — this is a closed 1987-2018 study dataset, not a live feed.