YearLocationdex framework (5th dex family)
Mike, 2026-06-26. Both the concept and the name are his: he described the shape as "a Yeardex of Locationdexes" and named the family YearLocationdex — the name says its two axes outright (year × location), so one broad family covers the whole space instead of splitting into week/month/year variants (Mike: "I want as few dex categories as possible, so the name describes what I am looking for"). The fifth dex family, after Eventdex (slot = event), Yeardex (slot = year), Locationdex (slot = place), and CelestialObjectDex (slot = object).
A YearLocationdex tracks each place across the years — a Yeardex crossed with a
Locationdex. The slot is a (place, period) cell. Year is the top time unit, and a
year holds its finer periods inside it (a year is ~52–54 weeks; drought is one cell per
county per USDM week, those weeks rolling up under each year). So the family name commits
to "year × location" while the source's native cadence (weekly here, monthly or annual
elsewhere) nests within the year.
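A minimal sketch of the slot, assuming a place key such as a county FIPS code and a date that identifies the native period. The `Cell` class, its field names, and the rule that a period belongs to the calendar year of its date are illustrative assumptions, not the project's schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class Cell:
    """One (place, period) slot; field names are illustrative only."""
    place: str     # assumed key, e.g. a county FIPS code
    period: date   # date identifying the native period (a week here)

    @property
    def year(self) -> int:
        # Assumption: a period belongs to the calendar year of its date.
        return self.period.year


cells = [
    Cell("01001", date(2001, 1, 2)),
    Cell("01001", date(2000, 12, 26)),
    Cell("01003", date(2000, 12, 26)),
]

# Nest each place's periods under their year: {place: {year: [period, ...]}}
nested = {}
for cell in sorted(cells):
    nested.setdefault(cell.place, {}).setdefault(cell.year, []).append(cell.period)
```

A monthly or annual source would reuse the same shape; only the spacing of the `period` dates changes.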
When a source is a YearLocationdex, not a single-axis dex
Ask: does the measurement exist for every (place, period) pair on a regular grid?
- A drought map redraws every US county every week → county × (year→week) grid.
- A station network's annual value per station → station × year grid.
- A county climate normal per month → county × (year→month) grid.
If instead each record is a one-off happening (slot = event), a fixed sensor's running series (slot = place, time is just a column), or a per-year subject with no place axis (slot = year), it is one of the four single-axis families. The test is regular coverage on both axes — a full place × time grid, not a sparse log.
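The coverage test above could be sketched as a strict check that every (place, period) pair occurs exactly once. The function name and the tuple shape of `records` are hypothetical, and a real source with a few missing cells might call for a tolerance rather than strict equality.

```python
from itertools import product


def is_full_grid(records):
    """Return True when every (place, period) pair occurs exactly once.

    `records` is an iterable of (place, period) tuples; this shape is
    assumed for illustration.
    """
    records = list(records)
    if not records:
        return False
    seen = set(records)
    if len(seen) != len(records):
        return False  # duplicate cells, so not a clean grid
    places = {place for place, _ in seen}
    periods = {period for _, period in seen}
    # A full grid equals the cross product of its two axes.
    return seen == set(product(places, periods))


grid = [("A", 2020), ("A", 2021), ("B", 2020), ("B", 2021)]
sparse_log = [("A", 2020), ("B", 2021)]
```

Here `is_full_grid(grid)` is true, while `sparse_log` fails because two of the four place × period cells are absent.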
Storage — one flat place-sorted spine, plus a by-year index
Decided for drought (Mike, 2026-06-26: "which one combines the most simplicity with
the fastest access to data"). Do not physically nest two families or keep two sorted
copies. Store the grid as ONE flat spine-parquet:
- Rows = place × period, one row per cell; columns = the cell's measured value(s).
- Sorted by place (then period) so one place's whole history reads contiguously, which serves the dominant read ("show me this county's drought history").
- A small by-year index JSON alongside gives the time cross-section directly (per-year counts + span) without a second 4M-row sorted copy. Year is the natural top key precisely because the finer periods nest under it.
- Spine-parquet, never file-per-slot: the grid is millions of cells (drought = 4.2M); file-per-slot would bloat the shared storehouse_index. Each YearLocationdex kind gets its own base dir (data/yearlocation_storehouse/<kind>/), kept out of the event index.
This is "both axes first-class" done cheaply: place axis is the physical sort, year axis is the index. A derived single-axis read (e.g. a drought-episode Eventdex, one slot per continuous spell) can be built on top later; it is not the primary store.
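As a rough illustration of the layout, the sketch below sorts a handful of made-up rows by place then period and derives a small by-year index from them. It uses only the standard library and stops short of writing parquet; the row shape and the index field names (`rows`, `first`, `last`) are assumptions standing in for "per-year counts + span".

```python
import json
from collections import defaultdict

# Hypothetical rows: (place, period, value). Periods are ISO date strings.
rows = [
    ("01003", "2000-01-11", 0),
    ("01001", "2001-01-02", 1),
    ("01001", "2000-01-04", 0),
    ("01003", "2001-01-02", 1),
    ("01001", "2000-01-11", 1),
]

# Place axis = physical sort: place first, then period.
spine = sorted(rows, key=lambda r: (r[0], r[1]))

# Year axis = small side index: per-year row count and period span.
by_year = defaultdict(lambda: {"rows": 0, "first": None, "last": None})
for place, period, _ in spine:
    entry = by_year[period[:4]]
    entry["rows"] += 1
    entry["first"] = period if entry["first"] is None else min(entry["first"], period)
    entry["last"] = period if entry["last"] is None else max(entry["last"], period)

# The index is small enough to serialize as JSON next to the spine.
index_json = json.dumps(by_year, indent=2, sort_keys=True)

# One place's history is a contiguous run of rows in the spine.
history = [r for r in spine if r[0] == "01001"]
```

The index is rebuilt from the spine in one pass, so the time cross-section never needs a second sorted copy of the rows.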
Multi-value cells ("data is data")
A cell may carry more than one measured value at different coverage spans. Carry them
all; null where a value's span does not reach (feedback_data_is_data_partial_coverage).
Do not drop a richer value because it covers fewer years than another.
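One way to picture this rule, using the two drought values and their coverage spans listed under Kinds: every cell keeps both columns, and a value is null outside its span. The `make_cell` helper, the dict shape, and the sample numbers are made up for illustration.

```python
# Coverage spans per value (inclusive years), as stated for the drought kind.
COVERAGE = {
    "in_drought_d1": (2000, 2024),
    "area_pct_d1": (2000, 2021),
}


def make_cell(year, measured):
    """Build one cell's values, keeping every column.

    A value outside its coverage span becomes None instead of being dropped.
    `measured` maps value name -> raw value (hypothetical input shape).
    """
    cell = {}
    for name, (start, end) in COVERAGE.items():
        cell[name] = measured.get(name) if start <= year <= end else None
    return cell


early = make_cell(2012, {"in_drought_d1": 1, "area_pct_d1": 63.5})  # made-up values
late = make_cell(2023, {"in_drought_d1": 0})                        # made-up values
```

The 2023 cell still has an `area_pct_d1` column, holding `None`, so a later backfill only fills values in and the schema stays the same.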
Bright line
Same as every family: measured reality only (feedback_measured_reality_only). A
grid of assessed/observed conditions is IN (USDM, on the FEMA administrative-record
precedent); a grid of modeled values (SPEI's modeled PET, any reanalysis grid) is OUT.
Kinds
| Kind | Slot | Source | Cells | Storage | Commit |
|---|---|---|---|---|---|
| drought | county × week | USDM (NCEI/NIDIS CDC archive) | 4,198,880 (3,220 counties × 2000–2024 weeks) | data/yearlocation_storehouse/drought/drought_spine.parquet + by-year index | <pending> |
Cell values for drought: in_drought_d1 (binary D1+ flag, 2000–2024) and
area_pct_d1 (percent of county area in D1+, 2000–2021; null 2022–2024 until backfill).
Scope freeze: docs/scope-drought-yearlocationdex.md.
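A quick arithmetic check on the cell count in the table, assuming the grid is complete (every county present in every week). It uses only the figures quoted above.

```python
cells = 4_198_880
counties = 3_220
years = 2024 - 2000 + 1  # 25 calendar years, inclusive

# If the grid is full, cells divides evenly by counties.
periods, remainder = divmod(cells, counties)
weeks_per_year = periods / years

print(periods, remainder)        # 1304 0
print(round(weeks_per_year, 2))  # 52.16
```

The even division is consistent with a full county × week grid of 1,304 weekly periods, averaging about 52 per year.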
Deferred
- Live edge for drought. No current-year county file exists (NCEI posts the annual file after the fact); the live usdm_drought PG feed is state-level (coarser). The live edge is a separate decision, likely a county-AOI pull from the USDM data services API.
- County centroids for mapping (a county is a polygon; v1 keys on FIPS + name + state).
- Per-week D0–D4 split (the historical archive carries only the D1+ threshold).