
Scope freeze — reservoir_cyano_water_quality Locationdex (8th Locationdex kind)

Decided 2026-06-27/28 (Mike pasted the data.gov catalog URL; chose the slot shape on 06-28). Source = EPA ScienceHub dataset "1987-2018 Cyanobacteria and Water Quality Data for 20 Reservoirs" (DOI 10.23719/1503175), the data behind Smucker, Beaulieu, Nietch & Young, "Increasingly severe cyanobacterial blooms and deep water hypoxia coincide with warming water temperatures in reservoirs," Global Change Biology 27(11):2507-2519 (2021), https://doi.org/10.1111/gcb.15618.

Slot = a reservoir (Mike's call)

The shape was a genuine fork: 640 reservoir-year cells (20 reservoirs × 32 years, a YearLocationdex) or 20 reservoir slots each embedding its annual series (a Locationdex). Mike chose 20 reservoir slots (2026-06-28). So this is the 8th Locationdex kind (slot = a fixed place), after neutron_monitor, tide_gauge, magnetic_observatory, streamgauge, radiation_monitor, coral_reef_station, radiosonde_station.

The slot is one of the 20 US Army Corps of Engineers reservoirs (Kentucky / Indiana / Ohio). It carries the reservoir's identity + place + morphometry, a measured-series summary, and the full annual_series array of per-year measured indicators (1987-2018, up to 32 entries). The embedded-series pattern follows the neo CelestialObjectDex slot, which embeds its close-approach array; here a reservoir embeds its water-quality year array.
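
A minimal sketch of the embedding, assuming the master table has already been read into flat reservoir-year rows. The function name, the row keys "reservoir" and "year", the abbreviation XYZ, and the toy values are all hypothetical and not taken from the build script or the dataset.

```python
from collections import defaultdict


def group_into_slots(rows):
    """Group flat reservoir-year rows into one slot per reservoir.

    `rows` is an iterable of dicts. The keys "reservoir" and "year" are
    assumed names for illustration, not the master table's real headers.
    """
    by_reservoir = defaultdict(list)
    for row in rows:
        # Everything except the reservoir key becomes one annual_series entry.
        entry = {k: v for k, v in row.items() if k != "reservoir"}
        by_reservoir[row["reservoir"]].append(entry)

    slots = {}
    for abbrev, series in by_reservoir.items():
        series.sort(key=lambda e: e["year"])  # keep the series in year order
        slots[abbrev] = {"event_id": abbrev, "annual_series": series}
    return slots


# Toy rows with made-up values; "XYZ" is a placeholder abbreviation.
rows = [
    {"reservoir": "EFR", "year": 1988, "chla_ugl": None},
    {"reservoir": "EFR", "year": 1987, "chla_ugl": 10.0},
    {"reservoir": "XYZ", "year": 1987, "chla_ugl": 5.0},
]
slots = group_into_slots(rows)
```

Sparse years stay in the series with their missing fields left as None, so the grouping step never drops a row.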

Source (multi-source cited slot)

Two EPA ScienceHub tables, each cited on the slot (sources):

  • Roster Reservoir_information.xlsx — the place: abbreviation, name, type (Forest / Ag / Urban), stratification, latitude, longitude, year filled, watershed area, forest fraction, surface area, storage volume, max/mean depth, Zmean:Zmax.
  • Master measurements CyanoMaxCD_environmental_vars_FINAL.xlsx (sheet Data) — the merged per-reservoir-year measured indicators that become the slot's annual_series.

Cyanobacteria cell densities derive from samples the US Army Corps of Engineers collected at each reservoir's deepest station (20001), counted and provided to EPA in October 2019.

What a slot holds

  • Identity + place: event_id = the reservoir abbreviation (e.g. EFR), name, latitude, longitude, reservoir_type, stratification, year_filled, and the roster morphometry (watershed_km2, forest_frac, surface_km2, storage_10e6m3, max_depth_m, mean_depth_m, zmean_zmax).
  • series_summary: n_years, year_range, n_years_with_cyano, the peak cyanobacteria maximum cell density (peak_cyano_max_cells_ml) and the year it occurred (peak_cyano_year), and n_present_by_indicator (per-indicator non-null year counts).
  • annual_series: one entry per year, each with the 26 measured indicators below (null where not measured that year).

Measured indicators (26, per year)

cyano_max_cells_ml · chla_ugl · secchi_cm · tp_ppb · p_dissolved_ppb · tkn_ppm · nh3_ppm · nox_ppm · toc_ppm · alkalinity_ppm · nh3_inflow_ppm · tkn_inflow_ppm · nox_inflow_ppm · tp_inflow_ppb · toc_inflow_ppm · summer_precip_inches · may/jun/jul/aug_surface_temp_c · may/jun_deep_do_mgl · may/jun/jul/aug_deep_temp_c.
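
The count of 26 can be checked by expanding the slash shorthand. The sketch below reads each slash form as one field per listed month; expanded names such as may_surface_temp_c are an assumed reading of the shorthand, not confirmed field names.

```python
# The 16 indicators listed individually above.
SINGLE_FIELDS = [
    "cyano_max_cells_ml", "chla_ugl", "secchi_cm", "tp_ppb", "p_dissolved_ppb",
    "tkn_ppm", "nh3_ppm", "nox_ppm", "toc_ppm", "alkalinity_ppm",
    "nh3_inflow_ppm", "tkn_inflow_ppm", "nox_inflow_ppm", "tp_inflow_ppb",
    "toc_inflow_ppm", "summer_precip_inches",
]

# The three slash forms: field suffix mapped to the months it covers.
MONTHLY_FIELDS = {
    "surface_temp_c": ["may", "jun", "jul", "aug"],
    "deep_do_mgl": ["may", "jun"],
    "deep_temp_c": ["may", "jun", "jul", "aug"],
}


def indicator_names():
    """Return the full indicator list with the monthly shorthand expanded."""
    names = list(SINGLE_FIELDS)
    for suffix, months in MONTHLY_FIELDS.items():
        # Assumed naming: <month>_<suffix>, e.g. jun_deep_do_mgl.
        names.extend(f"{month}_{suffix}" for month in months)
    return names


INDICATORS = indicator_names()
```

Under this reading the total is 16 + 4 + 2 + 4 = 26, matching the section heading.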

Measured reality — IN / OUT (bright line feedback_measured_reality_only)

  • IN: the raw per-reservoir-per-year measurements above (cyanobacteria max cell density, chlorophyll, clarity, nutrients, summer precipitation, surface + deep water temperatures, deep dissolved oxygen).
  • OUT: the study's GAM fits (GAM_for_stratifying_reservoirs.xlsx, GAM_for_nonstratifying_reservoirs.xlsx) are generalized additive model output (smoothed fitted trends), not measurements, so they are not ingested. The master table's logCyanoMax (a log transform) and Summer_precip_Z-score (a standardization) are also skipped, but as redundant derivatives of measurements already carried, not as a bright-line exclusion.

"data is data" (feedback_data_is_data_partial_coverage)

216 of the 640 reservoir-years have no cyanobacteria count, and many years miss individual analytes. Every year stays in the slot's annual_series with the missing fields nulled, rather than dropping sparse years. The series summary counts only the years that actually carry a value.
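
A minimal sketch of how a series summary could count only the years that carry a value while every year stays in annual_series. The function name, the two-element list shape of year_range, and the toy values are assumptions; the summary keys are the ones listed under "What a slot holds".

```python
def summarize_series(annual_series, indicators):
    """Build a series_summary that counts only non-null values."""
    years = [entry["year"] for entry in annual_series]
    # Per-indicator count of years where the field is present and not None.
    n_present = {
        name: sum(1 for entry in annual_series if entry.get(name) is not None)
        for name in indicators
    }
    with_cyano = [
        e for e in annual_series if e.get("cyano_max_cells_ml") is not None
    ]
    # The peak is taken over measured years only; None if no year has a count.
    peak = max(with_cyano, key=lambda e: e["cyano_max_cells_ml"], default=None)
    return {
        "n_years": len(annual_series),
        "year_range": [min(years), max(years)] if years else None,
        "n_years_with_cyano": len(with_cyano),
        "peak_cyano_max_cells_ml": peak["cyano_max_cells_ml"] if peak else None,
        "peak_cyano_year": peak["year"] if peak else None,
        "n_present_by_indicator": n_present,
    }


# Toy series with made-up values: 1987 has no cyanobacteria count.
series = [
    {"year": 1987, "cyano_max_cells_ml": None, "chla_ugl": 4.0},
    {"year": 1988, "cyano_max_cells_ml": 1500.0, "chla_ugl": None},
    {"year": 1989, "cyano_max_cells_ml": 900.0, "chla_ugl": 6.5},
]
summary = summarize_series(series, ["cyano_max_cells_ml", "chla_ugl"])
```

Here n_years is 3 because the sparse year is kept, while n_years_with_cyano is 2 because only measured years are counted.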

Decisions / gotchas frozen here

  • Slot = reservoir (Locationdex), not reservoir-year (YearLocationdex) — Mike's 2026-06-28 call.
  • 'na' = null. The master table uses the string na for missing; clean_num coerces na / blank / None to null and keeps real numbers (unit-tested).
  • XLSX, not CSV (read via openpyxl in read-only mode); downloads use curl, not urllib/httpx, because of the known sandbox hang.
  • Headline sanity check. EFR (East Fork Lake) peaks at 12.8M cyano cells/mL in 2017, the same year its June deep dissolved oxygen falls to ~0.09 mg/L: the paper's coincidence of severe blooms and deep-water hypoxia under warming, visible directly in one slot's series.
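
The 'na' = null rule can be sketched as follows. This is an illustrative version, not the project's actual clean_num: the case-insensitive match on "na", the conversion of numeric strings to float, and raising on any other string are all assumptions.

```python
def clean_num(value):
    """Coerce 'na' / blank / None to None and keep real numbers.

    Illustrative sketch only; the real clean_num may differ in detail.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        # Real numbers pass through unchanged, including a genuine 0.
        return value
    text = str(value).strip()
    if text == "" or text.lower() == "na":
        return None
    # Assumption: any other string should parse as a number. float() raises
    # ValueError otherwise, so unexpected cell content is not silently nulled.
    return float(text)
```

Testing for None explicitly, rather than for falsiness, matters here: a measured value of 0 must survive as 0 and not be turned into a missing value.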

Storage

Locationdex pattern: file-per-slot at data/location_storehouse/reservoir_cyano_water_quality/<reservoir>.json, written via the shared event_storehouse.write_dossier + rebuild_index_from_disk(base_dir=…) machinery. 20 reservoir slots. Built by scripts/build_reservoir_cyano_water_quality_locationdex.py.
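
A minimal sketch of the file-per-slot layout, using plain json as a hypothetical stand-in for event_storehouse.write_dossier, whose signature is not shown here. The function name write_slot and the write-then-rename step are illustrative choices, not the shared machinery's behavior.

```python
import json
import tempfile
from pathlib import Path


def write_slot(base_dir, slot):
    """Write one slot as <base_dir>/<event_id>.json and return the path.

    Hypothetical stand-in for the shared write_dossier helper.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{slot['event_id']}.json"
    # Write to a sibling temp file, then rename, so a reader never sees a
    # half-written slot file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(slot, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return path


# Usage against a throwaway directory instead of the real data/ tree.
with tempfile.TemporaryDirectory() as root:
    base_dir = Path(root) / "reservoir_cyano_water_quality"
    written = write_slot(base_dir, {"event_id": "EFR", "annual_series": []})
    reloaded = json.loads(written.read_text(encoding="utf-8"))
    written_name = written.name
```

With one file per reservoir, a rebuild touches 20 small JSON files, and an index can be regenerated from disk by scanning the directory.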

Deferred (not in v1)

  • Sub-annual measured series: the per-sample cyanobacteria and cyanotoxin/taxa station tables (Cyanobacteria_data.xlsx, Cyanotoxin_taxa_data.xlsx) as a finer layer.
  • Monthly / depth-profile measured tables: standardized surface/deep temperature and deep-DO depth profiles and the monthly nutrient-trend tables.
  • NLCD watershed land-cover as a static per-reservoir covariate.
  • Spatial sweep / cross-match (deferred to the Locationdex v2 read-across, as with every kind).
  • Live edge: none — this is a closed 1987-2018 study dataset, not a live feed.