Scope freeze — power_plant Locationdex (9th Locationdex kind)

Frozen 2026-06-28. Source pasted by Mike: EPA eGRID historical-data page (https://www.epa.gov/egrid/historical-egrid-data). Bright-line call reconfirmed by Mike the same day: "yes on locationdex, as always, no on projections and/or models."

What it is

EPA eGRID (Emissions & Generation Resource Integrated Database) is EPA's plant-level inventory of US electric power generation and the air emissions that go with it. For each power plant and each year, it records how much electricity the plant put on the grid and how much it emitted.

Slot = a power plant (Locationdex)

A power plant is a fixed place with a stable ID, so it is a Locationdex slot, exactly like the reservoir-as-place call (reservoir_cyano_water_quality) and the station kinds before it.

  • Slot ID: ORISPL, the DOE/EIA ORIS plant code (stable across years).
  • The place / permanent character: name, state, latitude, longitude, primary fuel + fuel category, nameplate capacity (MW), balancing authority, eGRID subregion, NERC region. Taken from the plant's most recent year present.
  • Embedded annual series: one row per year carrying the plant's MEASURED quantities, the way the reservoir slot embeds its 1987-2018 series and the neo CelestialObjectDex slot embeds its close-approach array.
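
A minimal sketch of the slot shape described above, assuming plain Python dataclasses. The class names, the `place` field keys and `build_slot` are hypothetical illustrations, not the project's actual schema; only the emission field names (co2_cems_tons, etc.) and n_units / n_units_cems come from this note.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keys for the "permanent character" of the place.
PLACE_FIELDS = (
    "name", "state", "latitude", "longitude", "primary_fuel", "fuel_category",
    "nameplate_capacity_mw", "balancing_authority", "egrid_subregion", "nerc_region",
)


@dataclass
class AnnualRow:
    """One embedded row per year, carrying measured quantities only."""
    year: int
    source: str                              # e.g. "epa_egrid_2022"
    net_generation_mwh: Optional[float]
    co2_cems_tons: Optional[float] = None    # short tons
    so2_cems_tons: Optional[float] = None    # short tons
    nox_cems_tons: Optional[float] = None    # short tons
    hg_cems_lbs: Optional[float] = None      # pounds
    n_units: int = 0
    n_units_cems: int = 0


@dataclass
class PowerPlantSlot:
    orispl: int                              # slot ID: ORIS plant code
    place: dict
    annual: list = field(default_factory=list)


def build_slot(orispl, plant_records, annual_rows):
    """plant_records maps year -> dict of plant attributes for that year.

    The place fields are taken from the most recent year present;
    attributes missing from that record are stored as None.
    """
    latest = plant_records[max(plant_records)]
    place = {key: latest.get(key) for key in PLACE_FIELDS}
    rows = sorted(annual_rows, key=lambda row: row.year)
    return PowerPlantSlot(orispl=orispl, place=place, annual=rows)
```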

Measured reality vs held out (bright line)

Governing rule: feedback_measured_reality_only. eGRID mixes measured and calculated numbers, but it stamps every emission value with its provenance, so the split is per-row and clean.

  • IN — net electricity generation (PLNGENAN, metered MWh reported to EIA).
  • IN — CEMS-measured emissions. Emissions whose eGRID source flag is EPA/CAMD: a Continuous Emissions Monitoring System is a physical instrument on the smokestack that directly measures CO2, SO2 and NOx (and Hg where monitored) in the flue gas. Carried as co2_cems_tons, so2_cems_tons, nox_cems_tons (short tons) and hg_cems_lbs (pounds).
  • OUT — emission-factor estimates. Emissions flagged "Estimated using emissions factor" (fuel burned x a textbook pollution-per-fuel number) are a calculation, not a measurement. Nulled, never carried.
  • OUT — eGRID's computed plant-level emission totals, which blend CEMS + estimated units. Not used; the slot RE-SUMS emissions from the unit sheet over only the EPA/CAMD-flagged units. The unit-level filter is the load-bearing logic (unit-tested).
  • OUT — eGRID's derived emission RATES (lb/MWh). A ratio of measured quantities is recomputable downstream from the raw masses; we store the raw masses, not the rate.

By tonnage this keeps most of the signal even though it drops many small units: across the fleet about 84% of CO2 MASS is CEMS-measured (every big baseload unit carries a monitor), and the "estimated" units are mostly small/intermittent. The slot carries n_units vs n_units_cems each year so the measured coverage is explicit. A wind/solar/landfill-gas plant with metered generation but no monitored stack keeps its generation and nulls the emissions ("data is data", feedback_data_is_data_partial_coverage).
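
The unit-level re-sum and the coverage counts above can be sketched roughly as follows. This is an illustration under assumptions, not the tested implementation: the unit dict keys (co2_tons, co2_source, ...) are hypothetical stand-ins for the eGRID unit-sheet columns, and counting a unit as CEMS when at least one of its pollutants carries the EPA/CAMD flag is an assumed definition of n_units_cems.

```python
CEMS_FLAG = "EPA/CAMD"  # provenance value named in this note

# (unit mass key, unit source-flag key, slot output key).
# The first two keys in each triple are hypothetical column names.
POLLUTANTS = (
    ("co2_tons", "co2_source", "co2_cems_tons"),
    ("so2_tons", "so2_source", "so2_cems_tons"),
    ("nox_tons", "nox_source", "nox_cems_tons"),
    ("hg_lbs", "hg_source", "hg_cems_lbs"),
)


def resum_cems(units):
    """Re-sum one plant-year of emissions over CEMS-flagged unit values only.

    A pollutant with no CEMS-flagged value stays None (nulled), so a plant
    with metered generation but no monitored stack keeps null emissions.
    """
    out = {"n_units": len(units), "n_units_cems": 0}
    for _, _, out_key in POLLUTANTS:
        out[out_key] = None

    for unit in units:
        measured_any = False
        for mass_key, source_key, out_key in POLLUTANTS:
            mass = unit.get(mass_key)
            if unit.get(source_key) == CEMS_FLAG and mass is not None:
                out[out_key] = (out[out_key] or 0.0) + mass
                measured_any = True
        if measured_any:
            out["n_units_cems"] += 1
    return out
```

Emission-factor values are skipped rather than zeroed, so a plant with only estimated units reports None, not 0.0, which keeps "not measured" distinct from "measured as zero".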

Coverage

  • 2018-2022 (5 modern eGRID editions, identical unit-level schema, per-row CEMS source flags verified consistent across all five). ~13,715 plant slots.
  • 2017: eGRID published no 2017 edition. Natural gap in the series, not dropped data.
  • 1996-2016 (deferred backfill): the historical bundle uses older split / boiler-level schemas, and the early years (1996-2012) lack the modern per-row CEMS flag, so emissions there cannot be cleanly separated from estimates. The backfill is a follow-up that needs per-era adapters; for any year without a per-row flag it keeps metered generation only and holds emissions out. Data is data: take the clean modern years now, backfill later.

Multi-source cited slot

Universal rule: each eGRID year is its own cited source (epa_egrid_2018 .. epa_egrid_2022); a plant accretes one row per year, each attributed.
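
As a rough illustration of per-year attribution, the sketch below tags each row with its own source ID. The function names and the dict row shape are hypothetical, and replacing an existing row when the same year is ingested again is an assumption made to keep one row per year.

```python
def source_id(year):
    """Source ID for one eGRID edition, e.g. 2018 -> "epa_egrid_2018"."""
    return f"epa_egrid_{year}"


def accrete(annual, year, values):
    """Return a new series with the row for `year` added or replaced.

    The year and source keys are set last so they cannot be overridden
    by anything in `values`.
    """
    row = {**values, "year": year, "source": source_id(year)}
    kept = [r for r in annual if r["year"] != year]
    return sorted(kept + [row], key=lambda r: r["year"])
```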

Storage

Locationdex sibling storehouse data/location_storehouse/power_plant/<orispl>.json, file-per-slot, via the event_storehouse write + disk-rebuilt index machinery (base_dir=location_storehouse). The build clears the kind dir before rebuild so renamed/removed slots never linger.
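
The clear-then-rebuild behaviour can be sketched with the standard library alone. This does not use the event_storehouse write or index machinery, which is not shown in this note; `rebuild_kind` and its arguments are hypothetical.

```python
import json
import shutil
from pathlib import Path


def rebuild_kind(base_dir, kind, slots):
    """Write one <orispl>.json per slot under base_dir/kind.

    The kind directory is removed first, so files for renamed or
    removed slots from an earlier build never linger.
    `slots` maps ORISPL -> JSON-serialisable slot dict.
    """
    kind_dir = Path(base_dir) / kind
    if kind_dir.exists():
        shutil.rmtree(kind_dir)
    kind_dir.mkdir(parents=True)

    for orispl, slot in slots.items():
        path = kind_dir / f"{orispl}.json"
        path.write_text(json.dumps(slot, indent=2, sort_keys=True), encoding="utf-8")

    # Return the file names actually on disk after the rebuild.
    return sorted(p.name for p in kind_dir.glob("*.json"))
```

For example, `rebuild_kind("data/location_storehouse", "power_plant", slots)` would leave exactly one file per key in `slots`.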

Sanity check

Top CEMS-measured CO2 emitters resolve to the real US heavyweights at the right magnitudes: James H Miller Jr (AL, ~24M short tons CO2 in 2022, the single largest US power-plant emitter), Scherer (GA), Monroe (MI), Gibson (IN), Labadie (MO), Martin Lake (TX), Gen J M Gavin (OH), W A Parish (TX). Miller's low SO2 (~850-1,260 tons against ~20M+ tons CO2) correctly reflects its scrubbers.

Deferred

  • 1996-2016 historical backfill (per-era adapters; generation-only where no per-row CEMS flag).
  • Heat input (HTIAN) is mostly EIA-reported rather than CAMD; left out of v1 to keep the bright line crisp around emissions. Could be carried later, clearly tagged by source.
  • A live edge: eGRID is an annual after-the-fact publication; the current-year edge (if wanted) would come from EPA CAMD hourly CEMS, a separate decision.
  • Spatial sweep / cross-match (deferred for every Locationdex kind until one needs it).