US Dairy Yeardex — Scope + Frozen Settings
Status: FROZEN 2026-06-16 (Mike chose dairy as the second Yeardex subject) · Owner:
Mike + Claude (engine room)
Parent: docs/yeardex-framework.md (the Yeardex extension; slot = calendar year) and
docs/event-spine-framework.md. This is the second Yeardex kind after landuse, and it
follows that kind's pattern exactly: a one-time parse of USDA ERS tables into a clean source, then
one year slot per calendar year, each value citing its ERS source table.
This document scopes the US dairy Yeardex: how much milk the United States produced and where it went, year by year, as USDA accounted for it. Each slot is one calendar year. For that year the slot accretes the national dairy figures (milk produced, cows milked, milk per cow, the supply-and-utilization balance, and the price indexes), each tagged with the ERS table it came from.
The theme
One national timeline of American milk: not a forecast but the recorded statistical accounting of how much milk the herd gave and how the country used it, covering every billion pounds of production and every category of stocks, exports, and domestic use. A consumer can ask "how much US milk in 1998 vs 2025" or "how has milk-per-cow changed" and read it straight off the year slots.
The spine (one provider, several annual tables)
| Source (ERS media id) | Population | Annual years | Status |
|---|---|---|---|
| 5503 — US Milk Production and Related Data | milk cows, milk per cow, milk production, feed value, replacement-cow price | 1998–2025 | NEW (one-time CSV parse) |
| 5507 — Supply and Utilization of Milk in All Products | supply/utilization balance × 4 product bases (milk-fat / skim-solids) | 2011–2025 | NEW |
| 5501 — US Dairy Situation at a Glance | price indexes + situation indicators across 10 categories | 2024–2025 | NEW |
Union of annual years = 28 slots, 1998–2025. 5503 is the backbone (28 years); 5507 enriches the
supply/utilization detail from 2011; 5501 is the current two-year snapshot. A year slot therefore
carries whatever tables cover it, each cited. This is the multi-source cited-slot rule: a 2024 slot
has all three tables; a 2000 slot has 5503 only. These are tidy long-format CSVs
(ers.usda.gov/media/<id>/…csv); the PG-staged AutoSense rows for the links are catalog_csv_row
stubs, so this is a fresh one-time parse into a clean usda_dairy source, the same move landuse and
FEMA made.
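The coverage arithmetic above can be sketched in a few lines. This is a minimal illustration, not the project's code: the year ranges are copied from the spine table, while `COVERAGE`, `tables_covering`, and `slot_years` are hypothetical names introduced here.

```python
# Year coverage per ERS media id, copied from the spine table.
# The names in this sketch are illustrative assumptions, not project code.
COVERAGE = {
    "5503": range(1998, 2026),  # 1998-2025, the backbone
    "5507": range(2011, 2026),  # 2011-2025
    "5501": range(2024, 2026),  # 2024-2025
}


def tables_covering(year: int) -> list[str]:
    """Return the media ids whose annual rows cover `year`, in spine order."""
    return [media_id for media_id, years in COVERAGE.items() if year in years]


def slot_years() -> list[int]:
    """Union of all covered years: one Yeardex slot per year."""
    return sorted(set().union(*COVERAGE.values()))
```

Under these ranges `slot_years()` yields the 28 years 1998 through 2025, a 2024 slot draws on all three tables, and a 2000 slot draws on 5503 only.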
Annual rows only. Each table also carries monthly/quarterly rows; the Yeardex slot is a year, so
only the Annual / ANNUAL rows are parsed. Monthly granularity is out of scope (a year slot is
not a month).
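A rough sketch of the annual-only filter follows. The column names (`Year`, `Period`, `Data_Item`, `Value`) and the sample rows are assumptions made up for illustration; the real ERS headers and values may differ. The only behavior taken from this document is that both the `Annual` and `ANNUAL` spellings count as annual.

```python
import csv
import io


def annual_rows(csv_text: str, period_col: str = "Period") -> list[dict]:
    """Keep only rows whose period label is 'Annual', in any letter case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get(period_col) or "").strip().lower() == "annual"
    ]


# Made-up sample rows for illustration; not real ERS headers or values.
SAMPLE = """Year,Period,Data_Item,Value
2020,Annual,Milk production,100
2020,January,Milk production,8
2021,ANNUAL,Milk production,101
2021,Q1,Milk production,25
"""

kept = annual_rows(SAMPLE)  # the monthly and quarterly rows are dropped
```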
What a year slot accretes (FROZEN)
For each year slot, for each table, for each annual figure, the value tagged with:
- source: the ERS media id (the cited block).
- table: the human table name.
- category: the figure's group where the table has one (Supply / Utilization / a 5501 category); null where it does not.
- data_item: the measured series (e.g. "Milk production", "Milk cow", "Milk per cow").
- product: the product basis where the table has one (5507: milk-fat / skim-solids basis); null otherwise.
- value + unit: the figure and its raw unit (Million pounds / 1,000 head / Pounds / index); never converted.
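One way to picture a tagged figure is as a small immutable record. The field names below follow the tags listed above; the class name `CitedFigure`, the `key` helper, and the placeholder value are assumptions for illustration, not the stored schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class CitedFigure:
    """One annual figure with its citation tags (illustrative shape only)."""
    source: str                     # ERS media id, e.g. "5503"
    table: str                      # human table name
    data_item: str                  # measured series
    value: float                    # the figure, in its raw unit
    unit: str                       # raw unit, never converted
    category: Optional[str] = None  # null where the table has no grouping
    product: Optional[str] = None   # null where the table has no product basis

    def key(self, year: int) -> tuple:
        """Identity of the figure within a year slot (hypothetical helper)."""
        return (year, self.source, self.category, self.data_item, self.product)


fig = CitedFigure(
    source="5503",
    table="US Milk Production and Related Data",
    data_item="Milk production",
    value=1.0,  # placeholder, not a real ERS figure
    unit="Million pounds",
)
```

Keeping `category` and `product` as explicit `None` values rather than omitting them means every figure has the same shape regardless of which table it came from.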
Geometry — national, no point, no sweep (FROZEN)
These are national figures. v1 stores no geography point: observation lat/lon are NULL,
exactly as landuse and FEMA did. There is no spatial sweep, because a year is not a place. Yeardex is
catalog-first, sweep-none by definition.
LOAD-BEARING — measured-reality: IN (inherits the landuse ruling)
These are USDA ERS recorded statistics: milk actually produced, cows actually milked, dairy products actually moved through supply and use, prices that actually obtained. This is the same class as landuse, which Mike ruled IN ("it's accounting") on 2026-06-16, and the same class as the FEMA administrative record. The bright line ([[feedback_measured_reality_only]]):
- IN: recorded national production / supply / utilization / price figures for past years.
- OUT: any USDA projection / forecast of future dairy (e.g. WASDE forward balances). None of the three v1 tables carries projection rows; if a future edition adds them, they are dropped.
The kind is subject-scoped (dairy), consistent with Mike's landuse ruling that each annual
subject gets its own kind rather than a us-annual grab-bag.
Excluded from v1 (noted, not lost)
- Historical xlsx 5524 (commercial disappearance 1995–2010), 5525 (milk supply & utilization 1970–2010), 5526 (bottling plants 1960–2007): extend history back to 1960/1970 but are capped/superseded and overlap the CSV years. Deferred as a history-extension brick (their cited figures would simply add to the pre-1998 slots under the same cited-slot rule).
- 5505 (supply & utilization of dairy product categories, butter/cheese/etc.): an easy future add, more product detail; not needed to prove the kind.
- Monthly/quarterly rows: out by definition (annual slot).
No live edge (historical, by Mike's license)
The ERS tables refresh a few times a year. Per the standing license (historical data with no
streaming feed is fine to Yeardex without a live fetcher), v1 is a one-time parse into a clean
usda_dairy source (active=False). A refresh is a re-run of the parse, not a scheduled fetcher.
Dossier / slot shape
Year-slot header (year, kind = dairy). Body: the year's accreted figures, grouped by source
table, each value carrying its source (ERS media id) per the cited-slot rule, plus a national
headline roll-up (milk production, milk cows, milk per cow for the year). Kind directory:
data/year_storehouse/dairy/<year>.json, index rebuilt from disk by globbing the kind dir.
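The storage layout can be sketched as follows, assuming one JSON file per year under a per-kind directory and an index derived purely from what is on disk. `write_slot` and `rebuild_index` are hypothetical names, and the slot body is left open; this is not the code in the repository.

```python
import json
from pathlib import Path


def write_slot(root: Path, kind: str, year: int, body: dict) -> Path:
    """Write one year slot to <root>/<kind>/<year>.json."""
    kind_dir = root / kind
    kind_dir.mkdir(parents=True, exist_ok=True)
    path = kind_dir / f"{year}.json"
    path.write_text(json.dumps({"year": year, "kind": kind, **body}, indent=2))
    return path


def rebuild_index(root: Path) -> dict[str, list[int]]:
    """Rebuild the index from disk by globbing each kind directory."""
    index: dict[str, list[int]] = {}
    for kind_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        years = sorted(
            int(f.stem) for f in kind_dir.glob("*.json") if f.stem.isdigit()
        )
        index[kind_dir.name] = years
    return index
```

Because the index is recomputed from the files, it cannot drift from the slots: deleting or re-running a slot build needs no separate bookkeeping.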
Build bricks
- Brick A — freeze. THIS DOC (2026-06-16): theme, 3-table USDA ERS dairy spine, 28 year slots 1998–2025, annual rows only, bare-year slot id, cited-figure accretion, national / no-point / no-sweep, one-time parse. Measured-reality IN (inherits landuse), kind subject-scoped.
- Brick B — spine parse. DONE 2026-06-16: scripts/reload_dairy_spine.py pulled the 3 CSVs (curl-seeded data/usda_dairy_cache/; httpx hangs against ERS here), kept the annual rows, and loaded 1,074 annual data points = 5503 (140) + 5507 (862) + 5501 (72) into a clean usda_dairy source (active=False), one observation per (year, table, category, data_item, product), geo NULL, metric dairy_annual, each row citing its ERS media id. Raw archived to DuckDB, roster to parquet. 28 year slots 1998–2025.
- Brick C/D — year slots. DONE 2026-06-16: src/terrapulse/monitor/dairy_yeardex.py groups the points into one record per year (figures grouped + cited by source table + national headline roll-up from the 5503 backbone), builds the year slot in the separate data/year_storehouse/, backfills all 28 slots, rebuilds the index (year_storehouse now holds 2 kinds: dairy 28 + landuse 16 = 44). 4 unit tests. Spot check passes: 2023 reads 226.3 billion lb milk, 9.38 M cows, 24,117 lb/cow, matching USDA. The US Dairy Yeardex is COMPLETE (A+B+C/D) — the second Yeardex kind.
- First report: deferred. Engine room, not paper mode.
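The grouping step in Brick C/D might look roughly like the sketch below. It is not the contents of dairy_yeardex.py: the point dictionaries, the `HEADLINE_ITEMS` mapping, and the function name are assumptions, and the data_item labels are taken from the examples given earlier in this document.

```python
# Hypothetical mapping from headline key to the 5503 data_item label.
HEADLINE_ITEMS = {
    "milk_production": "Milk production",
    "milk_cows": "Milk cow",
    "milk_per_cow": "Milk per cow",
}
BACKBONE = "5503"  # the headline roll-up reads only the backbone table


def build_year_records(points: list[dict]) -> dict[int, dict]:
    """Group cited points into one record per year, with a headline roll-up."""
    label_to_key = {label: key for key, label in HEADLINE_ITEMS.items()}
    records: dict[int, dict] = {}
    for p in points:
        rec = records.setdefault(
            p["year"],
            {"year": p["year"], "kind": "dairy", "by_source": {}, "headline": {}},
        )
        # Every figure stays grouped under the table it is cited from.
        rec["by_source"].setdefault(p["source"], []).append(p)
        key = label_to_key.get(p["data_item"])
        if p["source"] == BACKBONE and key is not None:
            rec["headline"][key] = {
                "value": p["value"],
                "unit": p["unit"],
                "source": p["source"],
            }
    return records
```

Restricting the headline to the backbone table keeps a same-named series in another table (for example a production line in 5507) from overwriting the 5503 figure.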