Scope freeze — coral_reef_station Locationdex (6th Locationdex kind)
Decided 2026-06-27 (Mike pasted the data.gov record + ruled "build small standalone
Locationdex now"). Source = EPA's 2011 probabilistic coral-reef condition survey along the
southern coast of Puerto Rico. The 6th Locationdex kind (docs/locationdex-framework.md) after
neutron_monitor, tide_gauge, magnetic_observatory, streamgauge, radiation_monitor. The slot is a
PLACE: one reef survey station.
This one broke the session's NARS water-quality pattern (it's coral-reef ecology, not water chemistry; one region; one year), so the shape was an explicit categorization call — Mike chose a standalone Locationdex over parking it.
Slot
One (reef station) slot. Slot id is namespaced by survey: <survey_id>-<station> (e.g.
pr2011-1), so stations from different surveys never collide in the file-per-slot store.
Region + survey year are slot fields, not the slot key.
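A minimal sketch of the slot-key rule above, assuming station ids arrive from the sheets as ints, floats such as 1.0, or numeric strings. The helper names normalize_station and slot_id are hypothetical, not taken from the build script.

```python
def normalize_station(raw) -> str:
    """Turn a sheet station id into a stable string (hypothetical helper)."""
    # A numeric cell can come back as a float such as 1.0; collapse it to "1".
    if isinstance(raw, float) and raw.is_integer():
        return str(int(raw))
    # Ints and strings pass through as trimmed text.
    return str(raw).strip()


def slot_id(survey_id: str, station) -> str:
    """Namespace the station by survey: <survey_id>-<station>."""
    return f"{survey_id}-{normalize_station(station)}"


print(slot_id("pr2011", 1.0))  # pr2011-1
```

Because the survey id is part of the key, station 1 of a later survey gets a different file name from pr2011-1 without any extra bookkeeping.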
This is an EXPANDING kind (Mike, 2026-06-27: "this will definitely not be the last coral reef
survey"). The build is a multi-survey registry from the start (SURVEYS dict in the build
script): adding a same-format survey = one registry entry; a survey whose sheets differ in shape
gets its own per-sheet reader feeding the shared aggregators + dossier (written against the real
file, not a guessed schema). build() clears and fully rebuilds the kind dir from source each
run, so renamed/removed slots never linger. First survey = PR 2011 (64 stations).
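The registry and clear-and-rebuild behaviour could look roughly like the sketch below. The registry fields (region, year, reader), the .json file extension, and the rebuild_kind_dir helper are assumptions for illustration; the real SURVEYS dict and build() may be shaped differently.

```python
import shutil
from pathlib import Path

# Hypothetical registry shape: one entry per survey.
# Adding a same-format survey would mean adding one more entry here.
SURVEYS = {
    "pr2011": {"region": "Puerto Rico, southern coast", "year": 2011, "reader": "standard"},
}


def rebuild_kind_dir(kind_dir: Path, slots: dict) -> list:
    """Wipe the kind directory, then write one file per slot.

    `slots` maps slot id -> serialized dossier text. Clearing first means a
    slot that was renamed or dropped at the source cannot linger on disk.
    """
    if kind_dir.exists():
        shutil.rmtree(kind_dir)
    kind_dir.mkdir(parents=True)
    for slot_id, payload in slots.items():
        (kind_dir / f"{slot_id}.json").write_text(payload)
    return sorted(p.stem for p in kind_dir.iterdir())
```

Running it twice with a smaller second slot set leaves only the second set on disk, which is the property the full rebuild is there to guarantee.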
Source
Data.gov record: https://catalog.data.gov/dataset/2011-pr-survey-data. Seven per-taxon xlsx
workbooks on EPA's pasteur host (10.23719/1407509). v1 carries the three headline
reef-condition layers + station info; secondary taxa deferred.
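A sketch of a cache-first fetch for the workbooks, assuming curl is on PATH (the gotchas below note that urllib hangs in the sandbox). The function name fetch_cached and its arguments are hypothetical; no workbook URL is assumed here, the caller supplies it.

```python
import subprocess
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Return the cached file if present, otherwise download it with curl."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists() and target.stat().st_size > 0:
        return target  # cache hit: no network call at all
    # Download to a temporary name so a failed transfer never looks cached.
    tmp = target.with_name(target.name + ".part")
    subprocess.run(["curl", "-fsSL", "-o", str(tmp), url], check=True)
    tmp.replace(target)
    return target
```

The returned path can then be opened with openpyxl.load_workbook(path, read_only=True, data_only=True), matching the read-only, data_only mode this note specifies.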
Measured reality — IN (bright line feedback_measured_reality_only)
Every value carried is a direct field measurement, all IN:
- Stony coral (per colony): % live tissue, bleached / diseased tallies, height, max diameter, colony count, taxa richness, colony density.
- Fish (per species, belt transect): counts by size class → total individuals, species richness, density.
- Rugosity: draped-length / linear-distance transect ratio (structural complexity index).
The survey's design-based regional condition characterization (the probabilistic population estimate) is not in these raw files and is not carried — consistent with holding out the NARS condition estimates.
Per-station aggregation / gotchas frozen here
- Density uses the single transect-area value, not a row sum. Every colony in a station shares the one survey transect's area (25 m² for coral, 100 m² for fish); density = count / that area, not count / (rows × area). Getting this wrong would deflate density by a factor equal to the colony count. If a station ever carries more than one distinct transect area, the build sums the distinct values (one transect per area).
- Fish total = sum across every size-class bin, across all species rows (counts are spread over 25 size bins, <5 cm…90-95 cm).
- Bleached / Diseased / Clionid are "Yes" / blank flags → percent-of-colonies tallies.
- Station ids are integers in the sheets → normalized to a stable string slot key (1.0→"1").
- xlsx via openpyxl (read-only, data_only); curl cache-first (sandbox urllib hang).
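The aggregation rules above can be sketched as follows. This is an illustration under stated assumptions, not the build script's code: rows are taken to be plain dicts, and the column names ("Bleached") and size-bin labels ("<5", "5-10") are hypothetical stand-ins for the real sheet headers.

```python
def density(count: int, row_areas) -> float:
    """count / transect area, where every row repeats the same area value.

    Distinct area values are summed (one transect per area), so 50 rows
    that all say 25 m² still give a 25 m² denominator, not 1250 m².
    """
    total_area = sum({a for a in row_areas if a})
    if total_area <= 0:
        raise ValueError("no transect area recorded")
    return count / total_area


def fish_total(rows, size_bins) -> int:
    """Sum every size-class bin across all species rows; blanks count as 0."""
    return sum((row.get(b) or 0) for row in rows for b in size_bins)


def percent_flagged(rows, flag: str) -> float:
    """Percent of colonies whose flag cell is "Yes" (a blank cell means no)."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if str(r.get(flag) or "").strip().lower() == "yes")
    return 100.0 * hits / len(rows)


# 50 coral colonies on one 25 m² transect: 50 / 25, not 50 / (50 * 25).
print(density(50, [25] * 50))  # 2.0
```

Collapsing the per-row areas into a set is what keeps the denominator at one transect's area, and the same step handles the multi-area case by summing the distinct values.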
Storage
Locationdex sibling storehouse data/location_storehouse/coral_reef_station/, file-per-slot
(64 slots is negligible for the shared storehouse_index), via the event_storehouse
write-dossier + disk-rebuilt-index machinery (base_dir=location_storehouse). Built by
scripts/build_coral_reef_station_locationdex.py. 64 station slots, all with the coral layer
(coral % live ≈ 79–90%, bleaching 0–10%, fish richness 14–22 spp, rugosity index 1.07–1.63).
Deferred (not in v1)
- Secondary taxa layers: invertebrates, gorgonians/sponges (SpGorg), Palythoa — the other three workbooks.
- Sibling EPA regional reef surveys (other years / jurisdictions) that would turn this into a multi-region reef-station network instead of a single 2011 PR snapshot.
- Spatial sweep / cross-match (swept=false, cross_match="deferred-v2-locationdex"), as with the other Locationdex kinds.
- Exact per-transect replication (the v1 density assumes one transect per area value).