
US Land Use Yeardex — Scope + Frozen Settings

Status: FROZEN 2026-06-16 (Mike confirmed both load-bearing calls: measured-reality IN, kind subject-scoped) · Owner: Mike + Claude (engine room) · Parent: docs/yeardex-framework.md (the Yeardex extension) and docs/event-spine-framework.md (the Eventdex framework).

This is the first Yeardex kind: the slot is a calendar year, not a discrete event.

This document scopes the US land use Yeardex: how the United States' land was used, survey year by survey year, as the federal government accounted for it. Each slot is one survey year. For that year, the slot accretes how many acres fell into each land-use category (cropland, pasture, forest, urban, special uses) in each state. The series runs from 1945 to 2017 on the Census-of-Agriculture cadence.

This is the first Yeardex kind, so it also serves as the pattern other annual series will follow. It deliberately exercises the multi-source cited-slot rule: one provider (USDA ERS) but many category tables, each contributing cited columns to a year slot.

The theme

One timeline of the American land itself: not a sensor feed, the authoritative statistical accounting of how the nation's surface was allocated, every cropland acre and forest acre and city acre, tracked across seven decades. A consumer can ask "how much US cropland in 1945 vs 2017" or "which states lost farmland fastest" and read it straight off the year slots.

The spine (one provider, many category tables)

Source | Population | Slots | History | Status
USDA ERS, Major Uses of Land in the United States | every land-use category × state × survey year | 16 years | 1945–2017 | NEW (one-time xlsx parse)

16 uniform category tables collapse onto 16 year slots. USDA ERS publishes the Major Uses of Land product as ~21 separate by-state spreadsheets (ers.usda.gov/media/<id>/…). Of these, 16 are uniform "1945–2017 by state" tables (years across columns, geographies down rows, values in thousands of acres): 12 base categories + 4 totals. The year is the natural slot; each table contributes that category's column to every year slot.

The other 5 are summary cross-tabs (Summary Tables 1–5: the same figures re-laid-out with categories as columns for a single year). They are excluded from v1 as redundant: they add no new measured data and break the uniform parse, and the by-state tables already carry the region and national rows the summaries aggregate. One further media id under the product (5615, an export/import trade zip) is not a land-use table and is also excluded.

The PG-staged AutoSense rows for these links are catalog_xlsx_row stubs (the crawler grabbed the file but never parsed the figures), so the spine is a fresh one-time parse of the actual xlsx files into a clean source, the same move FEMA's spine made.

The 12 base categories (which partition the land, never double-counting) are: cropland-used-for-crops, cropland-used-for-pasture, cropland-idled, grassland-pasture-and-range, forest-use-grazed, forest-use-not-grazed, rural-transportation, rural-parks-and-wildlife, defense-and-industrial, miscellaneous-farmland, urban-area, all-other-land-uses. The 4 totals (kept and flagged category_kind=total so a consumer never sums them with the bases) are: total-cropland, total-forest-use-land, total-special-uses, total-land. Brick D reconciles the totals against the sums of their constituent bases as a parse-correctness check.
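The reconciliation check can be sketched as below. This is a minimal illustration, not the Brick D code: the function names are hypothetical, and the mapping of each total to its constituent bases is an assumption (this document does not spell it out, apart from the 12 bases partitioning total-land). Values are thousands of acres for one year and one geography, with None standing for N.A.

```python
from typing import Dict, List, Optional, Tuple

# The 12 base categories, as listed in this document.
BASES: Tuple[str, ...] = (
    "cropland-used-for-crops", "cropland-used-for-pasture", "cropland-idled",
    "grassland-pasture-and-range", "forest-use-grazed", "forest-use-not-grazed",
    "rural-transportation", "rural-parks-and-wildlife", "defense-and-industrial",
    "miscellaneous-farmland", "urban-area", "all-other-land-uses",
)

# ASSUMPTION: which bases roll up into which total. Only total-land
# (the sum of all 12 bases) follows directly from the text.
TOTAL_CONSTITUENTS: Dict[str, Tuple[str, ...]] = {
    "total-cropland": (
        "cropland-used-for-crops", "cropland-used-for-pasture", "cropland-idled",
    ),
    "total-forest-use-land": ("forest-use-grazed", "forest-use-not-grazed"),
    "total-special-uses": (
        "rural-transportation", "rural-parks-and-wildlife",
        "defense-and-industrial", "miscellaneous-farmland",
    ),
    "total-land": BASES,
}


def reconcile(values: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Return published total minus sum of bases, per total.

    A gap of None means the total or one of its bases was N.A. (or absent),
    so the check is skipped rather than treating the missing value as zero.
    """
    gaps: Dict[str, Optional[float]] = {}
    for total, bases in TOTAL_CONSTITUENTS.items():
        published = values.get(total)
        parts = [values.get(base) for base in bases]
        if published is None or any(part is None for part in parts):
            gaps[total] = None
        else:
            gaps[total] = published - sum(parts)
    return gaps


def failures(gaps: Dict[str, Optional[float]], tolerance_kacres: float) -> List[str]:
    """Names of totals whose absolute gap exceeds the rounding tolerance."""
    return [
        total for total, gap in gaps.items()
        if gap is not None and abs(gap) > tolerance_kacres
    ]
```

A small tolerance is needed because the published totals and bases are rounded independently, so a gap of a few thousand acres is rounding noise rather than a parse error.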

The slots — survey years (FROZEN)

The 16 slots are the survey years carried in the table headers: 1945, 1949, 1954, 1959, 1964, 1969, 1974, 1978, 1982, 1987, 1992, 1997, 2002, 2007, 2012, 2017. These cluster on the Census of Agriculture's 5-year cadence (the data's backbone). Slot ID is the bare year (2017.json). No floor: every survey year is a slot.

What a year slot accretes (FROZEN)

For each year slot, for each of the 16 uniform category tables, for each geography row (US total, the 9 farm production regions, and the 50 states + DC), the acreage figure, tagged with:

  • category: the land-use category (the table's subject).
  • category_kind: base (a non-overlapping land use) or total (a derived sum). The 12 base categories (cropland-for-crops, cropland-pasture, cropland-idled, grassland-pasture-and-range, forest-grazed, forest-not-grazed, rural-transportation, rural-parks-and-wildlife, defense-and-industrial, misc-farmland, urban, all-other) partition the land; the totals (total cropland, total forest, total special uses, total land) are sums, kept and flagged so a consumer never double-counts.
  • geography: national / region / state, with the name.
  • value_kacres: thousands of acres (the raw unit; never converted in v1).
  • source: the ERS media id (the cited block, per the multi-source rule).

N.A. cells (not available) are stored as null, never zero.
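One way to picture a single tagged figure, and the null-not-zero rule, is the sketch below. The class name, field names beyond those listed above (year, geography_level), and the exact spellings treated as "not available" are assumptions for illustration; the real parser may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LandUsePoint:
    """One acreage figure in a year slot (hypothetical record shape)."""
    year: int                      # survey year, also the slot id
    category: str                  # e.g. "urban-area"
    category_kind: str             # "base" or "total"
    geography: str                 # geography name
    geography_level: str           # "national", "region", or "state"
    value_kacres: Optional[float]  # thousands of acres; None when N.A.
    source: str                    # ERS media id of the cited table


# ASSUMPTION: spellings the spreadsheets might use for "not available".
_NA_MARKERS = {"", "N.A.", "NA", "N/A"}


def parse_cell(raw: object) -> Optional[float]:
    """Convert a raw spreadsheet cell to thousands of acres.

    N.A. cells become None so that a missing figure is never mistaken
    for a measured zero.
    """
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip().replace(",", "")
    if text.upper() in _NA_MARKERS:
        return None
    return float(text)
```

Keeping None distinct from 0.0 matters downstream: a sum over categories can then skip or flag missing cells instead of silently understating the total.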

Geometry — area, not point (FROZEN)

Land use is a state/region/national area, not a point. v1 stores the geography name and level and leaves the observation lat/lon NULL, exactly as FEMA did for multi-county disasters. State centroids are a later refinement, not v1. There is no spatial sweep — a year is not a place, so there is nothing to query within a radius. Yeardex is catalog-first, sweep-none by definition.

No live edge (historical, by Mike's license)

Major Uses of Land updates roughly every five years, on the Census cadence; the 2017 edition is current. Per Mike's standing license (historical data with no new feed is fine to Yeardex without a live fetcher), v1 is a one-time parse. A refresh is a re-run of the parse when ERS publishes the next edition, not a scheduled fetcher. (If we later want it on the scheduler, it is a trivial add; not v1.)

LOAD-BEARING CALL #1 — measured-reality ruling — FROZEN: IN (Mike, 2026-06-16)

These figures are USDA ERS estimates, "based on data from the Census of Agriculture" and related recorded sources. Does an estimate clear the measured-reality bright line ([[feedback_measured_reality_only]])? Mike's ruling (2026-06-16): IN — "it's accounting." The bright line is measurement of what physically happened vs a computer's estimate of what will, might, or would-have happened. These are a backward-looking statistical accounting of land that actually existed in that state in that year, anchored to the Census of Agriculture (an actual enumeration of real farms, conducted on the ground). "Estimate" here means census-based accounting of the past, not model projection of the future. It is the same class as FEMA (an administrative record of a real disaster) and the catalogs of real events we already ship. The bright line for this kind is drawn precisely:

  • IN: recorded/accounted area of actual past land use, by state and survey year.
  • OUT: any ERS or other projection / scenario of future or counterfactual land use (none are in this product; if a future edition adds projected columns, they are dropped, the way FEMA's National Risk Index loss models are dropped).

(If Mike had ruled an "estimate" too far from raw measurement, the fallback was to hold this kind out the way open_meteo/era5_cloud are held out and prove Yeardex on a purely-enumerated series. He ruled IN, so landuse stands as the first Yeardex.)

LOAD-BEARING CALL #2 — kind granularity — FROZEN: subject-scoped (Mike, 2026-06-16)

Per the Yeardex framework's open decision: subject-scoped kind vs one national timeline. Mike's ruling (2026-06-16): subject-scoped. This kind is landuse; future annual series (milk supply, emissions) each get their own subject-kind. The us-annual grab-bag is the rejected alternative.

Dossier / slot shape

Year-slot header (year = the survey year, kind = landuse). Body: the accreted category × geography matrix described above, each value carrying its source (ERS media id) per the cited-slot rule, plus roll-up conveniences (national totals per category for the year). Kind directory: data/year_storehouse/landuse/<year>.json. Index rebuilt from disk by globbing the kind dir, the same mechanism as the event storehouse.
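The slot layout and the disk-rebuilt index can be sketched as follows. This is a simplified stand-in, not the event_storehouse machinery: the function names and the dossier contents are hypothetical, and only the directory layout (one <year>.json per survey year under the kind directory) and the glob-based index come from this document.

```python
import json
from pathlib import Path
from typing import List

# The 16 frozen survey years; each one is a slot id.
SURVEY_YEARS = (
    1945, 1949, 1954, 1959, 1964, 1969, 1974, 1978,
    1982, 1987, 1992, 1997, 2002, 2007, 2012, 2017,
)


def slot_path(base_dir: Path, year: int, kind: str = "landuse") -> Path:
    """Path of a year slot: <base_dir>/<kind>/<year>.json."""
    if year not in SURVEY_YEARS:
        raise ValueError(f"{year} is not a survey year")
    return base_dir / kind / f"{year}.json"


def write_slot(base_dir: Path, year: int, dossier: dict) -> Path:
    """Write one year-slot dossier to disk and return its path."""
    path = slot_path(base_dir, year)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dossier, indent=2, sort_keys=True), encoding="utf-8")
    return path


def rebuild_index(base_dir: Path, kind: str = "landuse") -> List[int]:
    """Rebuild the slot index purely from the files present on disk."""
    kind_dir = base_dir / kind
    return sorted(int(p.stem) for p in kind_dir.glob("*.json") if p.stem.isdigit())
```

Because the index is derived from the files themselves, there is no separate index file to drift out of sync with the slots.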

Data-source notes / gotchas (for the build bricks)

  • xlsx shape: sheet 1, row 1 title, row 2 header (Regions and States + the 16 year columns), data rows 3→"U.S. total", then notes/source rows. Year headers carry footnote markers (2012 1/, 2017 trailing space) — strip to the bare 4-digit year. State rows are indented (leading spaces); region rows are flush-left. read_only inflates max_row; stop at the U.S. total row.
  • Units: thousands of acres, uniform across all tables. Never converted in v1.
  • Provider is clean + no-auth: https://www.ers.usda.gov/media/<id>/<slug>.xlsx. ~21 files, cache each to data/usda_cache/ like every other spine.
  • Totals vs bases: flag category_kind so a consumer summing categories does not double-count the total tables.
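The header and row gotchas above can be handled with a few small helpers, sketched here on plain Python values so they are independent of any spreadsheet library. The function names are hypothetical, and treating every flush-left row other than "U.S. total" as a region is an assumption drawn from the indentation note above.

```python
import re
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

_LEADING_YEAR = re.compile(r"\s*(\d{4})")


def normalize_year_header(cell: object) -> Optional[int]:
    """Strip footnote markers and spaces: '2012 1/' -> 2012, '2017 ' -> 2017.

    Returns None for non-year headers such as 'Regions and States'.
    """
    if cell is None:
        return None
    match = _LEADING_YEAR.match(str(cell))
    return int(match.group(1)) if match else None


def classify_row(label: str) -> str:
    """Geography level from the row label and its indentation."""
    if label.strip().lower().startswith("u.s. total"):
        return "national"
    # State rows are indented; region rows are flush-left.
    return "state" if label[:1].isspace() else "region"


def iter_data_rows(
    rows: Iterable[Sequence[object]],
) -> Iterator[Tuple[str, str, List[object]]]:
    """Yield (level, name, values) for data rows, stopping at U.S. total.

    `rows` starts at the first data row (row 3 of the sheet). Stopping at
    the U.S. total row skips the notes/source rows and any phantom rows
    implied by an inflated max_row.
    """
    for row in rows:
        label = row[0]
        if label is None or not str(label).strip():
            continue  # blank spacer row
        level = classify_row(str(label))
        yield level, str(label).strip(), list(row[1:])
        if level == "national":
            break
```

The rows passed in would come from whatever reader loads the xlsx; each is expected to be a sequence whose first element is the geography label.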

Build bricks

  • Brick A: freeze. THIS DOC (2026-06-16): theme, USDA ERS Major Uses of Land spine, 16 survey-year slots, full population (all 16 uniform tables × all geographies, no floor), bare-year slot ID, category × geography accretion with cited source blocks, area-not-point geometry, no sweep, one-time historical parse. Both load-bearing calls FROZEN by Mike 2026-06-16: measured-reality IN ("it's accounting"); kind subject-scoped (landuse).
  • Brick B — spine parse. DONE 2026-06-16: scripts/reload_landuse_spine.py downloaded the 16 uniform xlsx tables (cached to data/usda_cache/) and parsed them to 16,128 tidy data points = 16 survey years (1945–2017) × 16 categories × 63 geographies (US total + 12 ERS regions + 50 states + DC). Raw archived to DuckDB (raw_landuse_spine); roster → data/usda_landuse_roster.parquet; loaded into a clean usda_landuse source (self-registered, active=False, no live fetcher) as one observation per (year, category, geography), geo NULL, value = acreage, unit = kacres, metric = landuse_kacres, each row citing its ERS media id.
  • Brick C — kind registration. DONE 2026-06-16: src/terrapulse/monitor/landuse_yeardex.py registers the landuse Yeardex kind (LANDUSE_CONFIG: radius_km=None, sensor_slugs=(); slot id = bare year), get_years groups the data points into one year record (full category × geography matrix + cited sources + national roll-up), year-slot build_dossier. Stored in a separate data/year_storehouse/ (year axis kept distinct from the event axis), reusing the event_storehouse write + disk-rebuilt-index machinery via base_dir. 3 unit tests.
  • Brick D — slot backfill. DONE 2026-06-16: landuse_yeardex.backfill_and_store() built the 16 year-slot dossiers in one pass. Verified one-slot-per-year (16 files == 16 survey years) and totals reconcile against the base sums at national level across all 16 years (worst gap 7 of 390,000 thousand-acres = 0.002%, pure USDA independent-rounding noise; Total forest-use land matches exactly). Headline US 2017: cropland 390 M, forest-use 622 M, grassland/pasture 659 M, urban 74 M, total land 2.26 B acres — matches USDA's published Major Uses figures. The US Land Use Yeardex is COMPLETE (A+B+C+D) — the first Yeardex kind.
  • First report: deferred. Engine room, not paper mode.