US Land Use Yeardex — Scope + Frozen Settings
Status: FROZEN 2026-06-16 (Mike confirmed both load-bearing calls: measured-reality IN, kind
subject-scoped) · Owner: Mike + Claude (engine room)
Parent: docs/yeardex-framework.md (the Yeardex extension) and docs/event-spine-framework.md
(the Eventdex framework). This is the first Yeardex kind: the slot is a calendar year, not a
discrete event.
This document scopes the US land use Yeardex: how the United States' land was used, survey year by survey year, as the federal government accounted for it. Each slot is one survey year. For that year, the slot accretes how many acres fell into each land-use category (cropland, pasture, forest, urban, special uses) in each state. The series runs from 1945 to 2017 on the Census-of-Agriculture cadence.
This is the first Yeardex kind, so it also serves as the pattern other annual series will follow. It deliberately exercises the multi-source cited-slot rule: one provider (USDA ERS) but many category tables, each contributing cited columns to a year slot.
The theme
One timeline of the American land itself: not a sensor feed, the authoritative statistical accounting of how the nation's surface was allocated, every cropland acre and forest acre and city acre, tracked across seven decades. A consumer can ask "how much US cropland in 1945 vs 2017" or "which states lost farmland fastest" and read it straight off the year slots.
The spine (one provider, many category tables)
| Source | Population | Slots | History | Status |
|---|---|---|---|---|
| USDA ERS, Major Uses of Land in the United States | every land-use category × state × survey year | 16 years | 1945–2017 | NEW (one-time xlsx parse) |
16 uniform category tables collapse onto 16 year slots. USDA ERS publishes the Major Uses of
Land product as ~21 separate by-state spreadsheets (ers.usda.gov/media/<id>/…). Of these, 16 are
uniform "1945–2017 by state" tables (years-across-columns, geographies-down-rows, values in
thousands of acres): 12 base categories + 4 totals. The other 5 are summary cross-tabs
(Summary Tables 1–5: the same figures re-laid-out with categories as columns for a single year) and
are excluded from v1 as redundant: they add no new measured data and break the uniform parse, and
the by-state tables already carry the region and national rows the summaries aggregate. One further
media id under the product (5615, an export/import trade zip) is not a land-use table and is also
excluded. The
year is the natural slot; each table contributes that category's column to every year slot. The
PG-staged AutoSense rows for these links are catalog_xlsx_row stubs (the crawler grabbed the file,
never parsed the figures), so the spine is a fresh one-time parse of the actual xlsx files into a
clean source, the same move FEMA's spine made.
The 12 base categories (which partition the land, never double-counting) are:
cropland-used-for-crops, cropland-used-for-pasture, cropland-idled, grassland-pasture-and-range,
forest-use-grazed, forest-use-not-grazed, rural-transportation, rural-parks-and-wildlife,
defense-and-industrial, miscellaneous-farmland, urban-area, all-other-land-uses. The 4 totals (kept
and flagged
category_kind=total so a consumer never sums them with the bases) are: total-cropland,
total-forest-use-land, total-special-uses, total-land. Brick D reconciles the totals against the sums
of their constituent bases as a parse-correctness check.
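The reconciliation check can be sketched as follows. This is a minimal illustration, not the Brick D code: the function name `reconcile`, the tolerance, and the mapping of each sub-total to its constituent bases are assumptions inferred from the category names (only the total-land entry, the sum of all 12 bases, follows directly from the partition rule above). The sample figures are made up for the example and are not USDA values.

```python
# The 12 base categories named in this document.
BASES = [
    "cropland-used-for-crops", "cropland-used-for-pasture", "cropland-idled",
    "grassland-pasture-and-range", "forest-use-grazed", "forest-use-not-grazed",
    "rural-transportation", "rural-parks-and-wildlife", "defense-and-industrial",
    "miscellaneous-farmland", "urban-area", "all-other-land-uses",
]

# ASSUMPTION: constituent lists for the three sub-totals are inferred from the
# category names, not taken from the ERS tables or the build code.
CONSTITUENTS = {
    "total-cropland": BASES[0:3],
    "total-forest-use-land": BASES[4:6],
    "total-special-uses": BASES[6:10],
    "total-land": BASES,  # the bases partition the land
}


def reconcile(values, constituents=CONSTITUENTS, tolerance_kacres=10.0):
    """Return {total: gap} for every total whose gap exceeds the tolerance.

    values maps a category slug to thousands of acres (None for N.A.).
    """
    failures = {}
    for total, parts in constituents.items():
        figures = [values.get(total)] + [values.get(p) for p in parts]
        if any(f is None for f in figures):
            continue  # cannot reconcile around N.A. cells
        gap = figures[0] - sum(figures[1:])
        if abs(gap) > tolerance_kacres:
            failures[total] = gap
    return failures


# Illustrative figures only (thousands of acres), chosen to sum cleanly.
sample = dict(zip(BASES, [100.0, 20.0, 10.0, 200.0, 50.0, 150.0,
                          5.0, 10.0, 3.0, 2.0, 15.0, 35.0]))
sample.update({"total-cropland": 130.0, "total-forest-use-land": 200.0,
               "total-special-uses": 20.0, "total-land": 600.0})
print(reconcile(sample))  # an empty dict means every total reconciles
```

A small non-zero tolerance is deliberate: the totals and the bases are rounded independently in the published tables, so exact equality is too strict a test.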
The slots — survey years (FROZEN)
The 16 slots are the survey years carried in the table headers:
1945, 1949, 1954, 1959, 1964, 1969, 1974, 1978, 1982, 1987, 1992, 1997, 2002, 2007, 2012, 2017.
These follow the Census of Agriculture's roughly 5-year cadence (the data's backbone; three of the
intervals are 4 years). Slot ID is the bare year (2017.json). No floor: every survey year is a slot.
What a year slot accretes (FROZEN)
For each year slot, for each of the 16 uniform category tables, for each geography row (US total, the 9 farm production regions, and the 50 states + DC), the acreage figure, tagged with:
- category: the land-use category (the table's subject).
- category_kind: base (a non-overlapping land use) or total/summary (a derived sum). The 12 base categories (cropland-for-crops, cropland-pasture, cropland-idled, grassland-pasture-and-range, forest-grazed, forest-not-grazed, rural-transportation, rural-parks-and-wildlife, defense-and-industrial, misc-farmland, urban, all-other) partition the land; the totals (total cropland, total forest, total special uses, total land) are sums, kept and flagged so a consumer never double-counts. The summary tables are also sums, but they are excluded from v1 (see the spine section), so in practice only base and total appear.
- geography: national / region / state, with the name.
- value_kacres: thousands of acres (the raw unit; never converted in v1).
- source: the ERS media id (the cited block, per the multi-source rule).
N.A. cells (not available) are stored as null, never zero.
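One way to picture a single accreted data point and the N.A. rule is the sketch below. The `LandUsePoint` dataclass and the `parse_cell` helper are hypothetical names for illustration, not the project's actual types. Treating an empty cell the same as N.A. is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LandUsePoint:
    """Hypothetical shape of one (year, category, geography) data point."""
    year: int
    category: str
    category_kind: str          # "base" or "total"
    geography_level: str        # "national", "region", or "state"
    geography_name: str
    value_kacres: Optional[float]  # thousands of acres; None means N.A.
    source: str                 # ERS media id


def parse_cell(raw):
    """Convert a raw spreadsheet cell to thousands of acres, or None for N.A."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip().replace(",", "")
    # ASSUMPTION: blank cells are treated like N.A. rather than as zero.
    if text == "" or text.upper().rstrip(".") in {"N.A", "NA"}:
        return None
    return float(text)


# "<id>" stands in for a real ERS media id, which is not filled in here.
point = LandUsePoint(2017, "urban-area", "base", "state", "Ohio",
                     parse_cell("N.A."), "<id>")
print(point.value_kacres)  # None, never 0.0
```

Keeping None distinct from 0.0 matters for any downstream sum: a genuine zero (no acres in that category) and a missing figure must not be confused.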
Geometry — area, not point (FROZEN)
Land use is a state/region/national area, not a point. v1 stores the geography name and level and
leaves the observation lat/lon NULL, exactly as FEMA did for multi-county disasters. State
centroids are a later refinement, not v1. There is no spatial sweep — a year is not a place, so
there is nothing to query within a radius. Yeardex is catalog-first, sweep-none by definition.
No live edge (historical, by Mike's license)
Major Uses of Land updates roughly every five years, on the Census cadence; the 2017 edition is current. Per Mike's standing license (historical data with no new feed is fine to Yeardex without a live fetcher), v1 is a one-time parse. A refresh is a re-run of the parse when ERS publishes the next edition, not a scheduled fetcher. (If we later want it on the scheduler, it is a trivial add; not v1.)
LOAD-BEARING CALL #1 — measured-reality ruling — FROZEN: IN (Mike, 2026-06-16)
These figures are USDA ERS estimates, "based on data from the Census of Agriculture" and related recorded sources. Does an estimate clear the measured-reality bright line ([[feedback_measured_reality_only]])? Mike's ruling (2026-06-16): IN — "it's accounting." The bright line is measurement of what physically happened vs a computer's estimate of what will, might, or would-have happened. These are a backward-looking statistical accounting of land that actually existed in that state in that year, anchored to the Census of Agriculture (an actual enumeration of real farms, conducted on the ground). "Estimate" here means census-based accounting of the past, not model projection of the future. It is the same class as FEMA (an administrative record of a real disaster) and the catalogs of real events we already ship. The bright line for this kind is drawn precisely:
- IN: recorded/accounted area of actual past land use, by state and survey year.
- OUT: any ERS or other projection / scenario of future or counterfactual land use (none are in this product; if a future edition adds projected columns, they are dropped, the way FEMA's National Risk Index loss models are dropped).
(If Mike had ruled an "estimate" too far from raw measurement, the fallback was to hold this kind out
the way open_meteo/era5_cloud are held out and prove Yeardex on a purely-enumerated series. He
ruled IN, so landuse stands as the first Yeardex.)
LOAD-BEARING CALL #2 — kind granularity — FROZEN: subject-scoped (Mike, 2026-06-16)
Per the Yeardex framework's open decision: subject-scoped kind vs one national timeline.
Mike's ruling (2026-06-16): subject-scoped. This kind is landuse; future annual series (milk
supply, emissions) each get their own subject-kind. The us-annual grab-bag is the rejected
alternative.
Dossier / slot shape
Year-slot header (year = the survey year, kind = landuse). Body: the accreted category × geography
matrix described above, each value carrying its source (ERS media id) per the cited-slot rule, plus
roll-up conveniences (national totals per category for the year). Kind directory:
data/year_storehouse/landuse/<year>.json. Index rebuilt from disk by globbing the kind dir, the same
mechanism as the event storehouse.
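The slot shape and the disk-rebuilt index can be sketched as below. This is an illustration under assumptions, not the storehouse code: the function names, the dict keys, and the record layout are hypothetical, and the sample points are made-up values.

```python
import json
from collections import defaultdict
from pathlib import Path

FIELDS = ("category", "geography", "level", "value_kacres", "source")


def build_year_records(points):
    """Group tidy data points into one record per survey year."""
    years = defaultdict(lambda: {"kind": "landuse", "matrix": [], "national": {}})
    for p in points:
        rec = years[p["year"]]
        rec["year"] = p["year"]
        rec["matrix"].append({k: p[k] for k in FIELDS})
        if p["level"] == "national":
            # Roll-up convenience: national figure per category for the year.
            rec["national"][p["category"]] = p["value_kacres"]
    return dict(years)


def store(records, base_dir):
    """Write one <year>.json per slot under <base_dir>/landuse/."""
    kind_dir = Path(base_dir) / "landuse"
    kind_dir.mkdir(parents=True, exist_ok=True)
    for year, rec in records.items():
        (kind_dir / f"{year}.json").write_text(json.dumps(rec, indent=2))
    return kind_dir


def rebuild_index(kind_dir):
    """Rebuild the slot index from disk by globbing the kind directory."""
    return sorted(int(p.stem) for p in Path(kind_dir).glob("*.json")
                  if p.stem.isdigit())
```

Usage, with illustrative points only:

```python
points = [
    {"year": 2017, "category": "urban-area", "geography": "U.S. total",
     "level": "national", "value_kacres": 2.0, "source": "<id>"},
    {"year": 1945, "category": "urban-area", "geography": "Ohio",
     "level": "state", "value_kacres": None, "source": "<id>"},
]
kind_dir = store(build_year_records(points), "data/year_storehouse")
print(rebuild_index(kind_dir))
```

Because the index is derived from the files on disk, there is no separate index file to drift out of sync with the slots.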
Data-source notes / gotchas (for the build bricks)
- xlsx shape: sheet 1, row 1 title, row 2 header (Regions and States + the 16 year columns), data rows from row 3 through "U.S. total", then notes/source rows. Year headers carry footnote markers ("2012 1/", "2017 " with a trailing space); strip to the bare 4-digit year. State rows are indented (leading spaces); region rows are flush-left. read_only inflates max_row; stop at the U.S. total row.
- Units: thousands of acres, uniform across all tables. Never converted in v1.
- Provider is clean + no-auth: https://www.ers.usda.gov/media/<id>/<slug>.xlsx. ~21 files, cache each to data/usda_cache/ like every other spine.
- Totals vs bases: flag category_kind so a consumer summing categories does not double-count the total tables.
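The xlsx-shape gotchas above can be sketched as a small row walker. It is a minimal illustration, not the code in scripts/reload_landuse_spine.py: it assumes the sheet has already been read into a list of row lists (the spreadsheet reader is not shown), and the names `clean_year` and `parse_rows` are hypothetical.

```python
import re

YEAR_RE = re.compile(r"^\s*(\d{4})")


def clean_year(header):
    """Strip footnote markers and stray whitespace from a year header."""
    match = YEAR_RE.match(str(header))
    if match is None:
        raise ValueError(f"not a year header: {header!r}")
    return int(match.group(1))


def parse_rows(rows):
    """Walk rows (row 1 title, row 2 header, data from row 3) into tuples.

    Yields (year, level, name, raw_cell) and stops at the U.S. total row,
    so trailing notes/source rows and inflated empty rows are never read.
    """
    years = [clean_year(h) for h in rows[1][1:]]
    out = []
    for row in rows[2:]:
        label = row[0]
        if label is None or not str(label).strip():
            continue  # skip blank spacer rows
        name = str(label).strip()
        is_national = name.lower().startswith("u.s. total")
        if is_national:
            level = "national"
        elif str(label)[0].isspace():
            level = "state"    # state rows are indented
        else:
            level = "region"   # region rows are flush-left
        for year, cell in zip(years, row[1:]):
            out.append((year, level, name, cell))
        if is_national:
            break
    return out


# Illustrative rows only; the figures are placeholders.
rows = [
    ["Title"],
    ["Regions and States", "2012 1/", "2017 "],
    ["Northeast", 1, 2],
    ["  Maine", 3, 4],
    ["U.S. total", 5, 6],
    ["Source: notes row", None, None],
]
for record in parse_rows(rows):
    print(record)
```

Stopping on the label rather than on a row count is the design choice that sidesteps the inflated max_row: the walker never needs to know how many rows the reader thinks the sheet has.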
Build bricks
- Brick A (freeze). THIS DOC (2026-06-16): theme, USDA ERS Major Uses of Land spine, 16 survey-year slots, full population (all 16 uniform tables × all geographies, no floor), bare-year slot ID, category × geography accretion with cited source blocks, area-not-point geometry, no sweep, one-time historical parse. Both load-bearing calls FROZEN by Mike 2026-06-16: measured-reality IN ("it's accounting"); kind subject-scoped (landuse).
- Brick B (spine parse). DONE 2026-06-16: scripts/reload_landuse_spine.py downloaded the 16 uniform xlsx tables (cached to data/usda_cache/) and parsed them to 16,128 tidy data points = 16 survey years (1945–2017) × 16 categories × 63 geographies (US total + 12 ERS regions + 50 states + DC). Raw archived to DuckDB (raw_landuse_spine); roster → data/usda_landuse_roster.parquet; loaded into a clean usda_landuse source (self-registered, active=False, no live fetcher) as one observation per (year, category, geography), geo NULL, value = acreage, unit = kacres, metric = landuse_kacres, each row citing its ERS media id.
- Brick C (kind registration). DONE 2026-06-16: src/terrapulse/monitor/landuse_yeardex.py registers the landuse Yeardex kind (LANDUSE_CONFIG: radius_km=None, sensor_slugs=(); slot id = bare year), get_years groups the data points into one year record (full category × geography matrix + cited sources + national roll-up), plus a year-slot build_dossier. Stored in a separate data/year_storehouse/ (year axis kept distinct from the event axis), reusing the event_storehouse write + disk-rebuilt-index machinery via base_dir. 3 unit tests.
- Brick D (slot backfill). DONE 2026-06-16: landuse_yeardex.backfill_and_store() built the 16 year-slot dossiers in one pass. Verified one-slot-per-year (16 files == 16 survey years) and totals reconcile against the base sums at national level across all 16 years (worst gap 7 of 390,000 thousand-acres = 0.002%, pure USDA independent-rounding noise; Total forest-use land matches exactly). Headline US 2017: cropland 390 M, forest-use 622 M, grassland/pasture 659 M, urban 74 M, total land 2.26 B acres, which matches USDA's published Major Uses figures. The US Land Use Yeardex is COMPLETE (A+B+C+D): the first Yeardex kind.
- First report: deferred. Engine room, not paper mode.
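The counts quoted in Bricks B and D can be re-derived with a few lines of arithmetic. This is only a sanity check on the numbers stated above; the variable names are illustrative.

```python
# The 16 frozen survey-year slots.
survey_years = [1945, 1949, 1954, 1959, 1964, 1969, 1974, 1978,
                1982, 1987, 1992, 1997, 2002, 2007, 2012, 2017]

n_categories = 16    # 12 base categories + 4 totals
n_geographies = 63   # geography rows per table, as reported by Brick B

# Brick B: tidy data points = years x categories x geographies.
n_points = len(survey_years) * n_categories * n_geographies

# Brick D: worst national reconciliation gap, as a percentage.
gap_pct = 7 / 390_000 * 100

print(n_points)            # 16128
print(round(gap_pct, 3))   # 0.002
```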