Scope freeze — power_plant Locationdex (9th Locationdex kind)
Frozen 2026-06-28. Source pasted by Mike: EPA eGRID historical-data page (https://www.epa.gov/egrid/historical-egrid-data). Bright-line call reconfirmed by Mike the same day: "yes on locationdex, as always, no on projections and/or models."
What it is
EPA eGRID (Emissions & Generation Resource Integrated Database) is EPA's plant-level inventory of US electric power generation and the air emissions that go with it. For every power plant, each year: how much electricity it put on the grid and how much it emitted.
Slot = a power plant (Locationdex)
A power plant is a fixed place with a stable ID, so it is a Locationdex slot, exactly like the
reservoir-as-place call (reservoir_cyano_water_quality) and the station kinds before it.
- Slot ID: ORISPL, the DOE/EIA ORIS plant code (stable across years).
- The place / permanent character: name, state, latitude, longitude, primary fuel + fuel category, nameplate capacity (MW), balancing authority, eGRID subregion, NERC region. Taken from the plant's most recent year present.
- Embedded annual series: one row per year carrying the plant's MEASURED quantities, the way
the reservoir slot embeds its 1987-2018 series and the
neoCelestialObjectDex slot embeds its close-approach array.
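As a rough sketch of the slot shape described above, the permanent character and the embedded annual series could be modelled as two dataclasses. This is illustrative only: the class names, most field names, and the example plant (ORISPL 99999, "Example Wind Farm") are assumptions made for the sketch; only the *_cems_* and n_units* names and the epa_egrid_<year> source ids come from this document.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AnnualRow:
    """One year of measured quantities for a plant."""
    year: int
    source: str                            # e.g. "epa_egrid_2022"
    net_generation_mwh: Optional[float]    # assumed name for the PLNGENAN value
    co2_cems_tons: Optional[float]         # short tons, CEMS-measured only
    so2_cems_tons: Optional[float]
    nox_cems_tons: Optional[float]
    hg_cems_lbs: Optional[float]           # pounds
    n_units: int
    n_units_cems: int


@dataclass
class PowerPlantSlot:
    """Permanent character of the place plus its embedded annual series."""
    orispl: int
    name: str
    state: str
    latitude: float
    longitude: float
    primary_fuel: str
    fuel_category: str
    nameplate_capacity_mw: float
    balancing_authority: str
    egrid_subregion: str
    nerc_region: str
    annual: list[AnnualRow] = field(default_factory=list)


# Placeholder plant, not real data: metered generation, no monitored stack.
example = PowerPlantSlot(
    orispl=99999, name="Example Wind Farm", state="TX",
    latitude=32.0, longitude=-101.0,
    primary_fuel="WND", fuel_category="WIND", nameplate_capacity_mw=150.0,
    balancing_authority="ERCO", egrid_subregion="ERCT", nerc_region="TRE",
    annual=[AnnualRow(2022, "epa_egrid_2022", 400000.0,
                      None, None, None, None, n_units=1, n_units_cems=0)],
)
payload = json.dumps(asdict(example), indent=2)
```

Keeping every emission field Optional lets a row carry metered generation while leaving unmeasured emissions as null rather than zero.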
Measured reality vs held out (bright line)
Governing rule: feedback_measured_reality_only. eGRID is a MIX of measured and calculated numbers, and eGRID
stamps every emission value with its provenance, so the split is per-row and clean.
- IN: net electricity generation (PLNGENAN, metered MWh reported to EIA).
- IN: CEMS-measured emissions. Emissions whose eGRID source flag is EPA/CAMD: a Continuous Emissions Monitoring System is a physical instrument on the smokestack that directly measures CO2, SO2 and NOx (and Hg where monitored) in the flue gas. Carried as co2_cems_tons, so2_cems_tons, nox_cems_tons (short tons) and hg_cems_lbs (pounds).
- OUT: emission-factor estimates. Emissions flagged "Estimated using emissions factor" (fuel burned x a textbook pollution-per-fuel number) are a calculation, not a measurement. Nulled, never carried.
- OUT: eGRID's computed plant-level emission totals, which blend CEMS + estimated units. Not used; the slot RE-SUMS emissions from the unit sheet over only the EPA/CAMD-flagged units. The unit-level filter is the load-bearing logic (unit-tested).
- OUT: eGRID's derived emission RATES (lb/MWh). A ratio of measured quantities is recomputable downstream from the raw masses; we store the raw masses, not the rate.
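The unit-level filter and re-sum could look roughly like the sketch below. It is not the project's actual implementation: the unit-record keys (co2_tons, co2_source and so on) are hypothetical stand-ins for the eGRID unit-sheet columns, and counting a unit toward n_units_cems when at least one of its pollutants is flagged EPA/CAMD is an assumption about how that count is defined.

```python
from typing import Optional

CEMS_FLAG = "EPA/CAMD"  # source-flag value this document treats as measured

# (unit mass key, unit source-flag key, slot field). Slot field names come from
# this document; the unit-record keys are hypothetical.
POLLUTANTS = (
    ("co2_tons", "co2_source", "co2_cems_tons"),
    ("so2_tons", "so2_source", "so2_cems_tons"),
    ("nox_tons", "nox_source", "nox_cems_tons"),
    ("hg_lbs", "hg_source", "hg_cems_lbs"),
)


def measured_mass(unit: dict, mass_key: str, flag_key: str) -> Optional[float]:
    """Return the unit's mass only when its source flag marks it as CEMS."""
    if unit.get(flag_key) != CEMS_FLAG:
        return None  # estimated or unflagged values are held out
    return unit.get(mass_key)


def resum_cems(units: list[dict]) -> dict:
    """Re-sum one plant-year from its unit rows, over CEMS-flagged values only."""
    row: dict = {"n_units": len(units)}
    # Assumption: a unit counts as CEMS if any pollutant carries the flag.
    row["n_units_cems"] = sum(
        1 for u in units
        if any(u.get(flag_key) == CEMS_FLAG for _, flag_key, _ in POLLUTANTS)
    )
    for mass_key, flag_key, slot_field in POLLUTANTS:
        masses = [measured_mass(u, mass_key, flag_key) for u in units]
        masses = [m for m in masses if m is not None]
        # No measured value at all means null, never zero.
        row[slot_field] = sum(masses) if masses else None
    return row


# Made-up unit rows for illustration.
units = [
    {"unit_id": "1", "co2_tons": 1000.0, "co2_source": "EPA/CAMD",
     "so2_tons": 2.0, "so2_source": "EPA/CAMD",
     "nox_tons": 3.0, "nox_source": "EPA/CAMD"},
    {"unit_id": "2", "co2_tons": 500.0, "co2_source": "EPA/CAMD",
     "so2_tons": 1.0, "so2_source": "Estimated using emissions factor",
     "nox_tons": 1.5, "nox_source": "EPA/CAMD"},
    {"unit_id": "GT1", "co2_tons": 40.0,
     "co2_source": "Estimated using emissions factor"},
]
row = resum_cems(units)
```

In this made-up example the estimated 40 tons from GT1 and the estimated SO2 on unit 2 never enter the sums, and mercury stays null because no unit reports a measured value.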
By tonnage this keeps most of the signal even though it drops many small units: across the
fleet about 84% of CO2 MASS is CEMS-measured (every big baseload unit carries a monitor), and
the "estimated" units are mostly small/intermittent. The slot carries n_units vs
n_units_cems each year so the measured coverage is explicit. A wind/solar/landfill-gas plant
with metered generation but no monitored stack keeps its generation and nulls the emissions
("data is data", feedback_data_is_data_partial_coverage).
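Because the slot stores raw masses rather than rates, a downstream consumer can recompute a rate when it needs one. The sketch below shows the arithmetic (one short ton is 2,000 lb) together with a unit-coverage fraction; the function names are hypothetical, and returning None for missing mass or non-positive generation is an assumed policy, not something this scope specifies.

```python
from typing import Optional

LB_PER_SHORT_TON = 2000.0


def rate_lb_per_mwh(mass_short_tons: Optional[float],
                    net_generation_mwh: Optional[float]) -> Optional[float]:
    """Emission rate in lb/MWh from a raw mass in short tons and metered MWh."""
    if mass_short_tons is None or net_generation_mwh is None:
        return None
    if net_generation_mwh <= 0:
        return None  # a rate is not meaningful without positive net generation
    return mass_short_tons * LB_PER_SHORT_TON / net_generation_mwh


def cems_unit_coverage(row: dict) -> Optional[float]:
    """Fraction of a plant's units with CEMS-measured values in a given year."""
    if not row.get("n_units"):
        return None
    return row["n_units_cems"] / row["n_units"]


example_rate = rate_lb_per_mwh(1000.0, 1_000_000.0)                  # 2.0 lb/MWh
example_coverage = cems_unit_coverage({"n_units": 4, "n_units_cems": 1})  # 0.25
```

When n_units_cems is below n_units, the numerator covers only the monitored units while generation is plant-wide, so a rate computed this way understates the plant's true rate and will not necessarily match eGRID's published figure.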
Coverage
- 2018-2022 (5 modern eGRID editions, identical unit-level schema, per-row CEMS source flags verified consistent across all five). ~13,715 plant slots.
- 2017: EPA published no eGRID edition for 2017. Natural gap in the series, not dropped data.
- 1996-2016 (deferred backfill): the historical bundle uses older split / boiler-level
schemas; the early years (1996-2012) lack the modern per-row CEMS flag, so emissions there
cannot be cleanly separated from estimates. The backfill is a follow-up requiring per-era
adapters and, for any year without a per-row flag, keeping metered generation only and
holding emissions out.
data is data: take the clean modern years now, backfill later.
Multi-source cited slot
Universal rule: each eGRID year is its own cited source (epa_egrid_2018 .. epa_egrid_2022);
a plant accretes one row per year, each attributed.
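One way to picture the per-year accretion is the sketch below. It assumes a slot is a plain dict with an "annual" list, and that re-ingesting a year replaces that year's existing row; both are assumptions for illustration, as are the function names. The source ids follow the epa_egrid_<year> pattern stated above.

```python
EDITION_YEARS = (2018, 2019, 2020, 2021, 2022)  # no 2017 edition exists


def source_id(year: int) -> str:
    """Cited-source id for one eGRID edition."""
    if year not in EDITION_YEARS:
        raise ValueError(f"no eGRID edition in scope for {year}")
    return f"epa_egrid_{year}"


def accrete(slot: dict, year: int, row: dict) -> dict:
    """Attach one attributed annual row to a slot, keeping the series sorted."""
    stamped = {**row, "year": year, "source": source_id(year)}
    # Assumption: a repeat ingest of the same year replaces the earlier row.
    kept = [r for r in slot.get("annual", []) if r["year"] != year]
    slot["annual"] = sorted(kept + [stamped], key=lambda r: r["year"])
    return slot


# Placeholder slot and values, not real data.
slot = {"orispl": 99999, "annual": []}
accrete(slot, 2019, {"co2_cems_tons": 10.0})
accrete(slot, 2018, {"co2_cems_tons": 12.0})
accrete(slot, 2019, {"co2_cems_tons": 11.0})  # replaces the first 2019 row
```

Each row ends up carrying its own source id, so attribution survives even if rows are later read in isolation.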
Storage
Locationdex sibling storehouse data/location_storehouse/power_plant/<orispl>.json,
file-per-slot, via the event_storehouse write + disk-rebuilt index machinery
(base_dir=location_storehouse). The build clears the kind dir before rebuild so
renamed/removed slots never linger.
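The clear-then-rebuild behaviour can be illustrated with a small standard-library stand-in. This is not the event_storehouse API, whose interface is not shown in this document; rebuild_kind_dir and its parameters are hypothetical, and only the <base_dir>/power_plant/<orispl>.json layout is taken from the text above.

```python
import json
import shutil
from pathlib import Path


def rebuild_kind_dir(base_dir: Path, kind: str, slots: dict[int, dict]) -> Path:
    """Write one JSON file per slot after clearing the kind directory."""
    kind_dir = base_dir / kind
    if kind_dir.exists():
        # Clear first so renamed or removed slots never linger on disk.
        shutil.rmtree(kind_dir)
    kind_dir.mkdir(parents=True)
    for orispl, slot in slots.items():
        path = kind_dir / f"{orispl}.json"
        path.write_text(json.dumps(slot, indent=2, sort_keys=True),
                        encoding="utf-8")
    return kind_dir
```

A call such as rebuild_kind_dir(Path("data/location_storehouse"), "power_plant", slots) would leave exactly one file per current slot; a slot absent from the new build disappears with the cleared directory.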
Sanity check
Top CEMS-measured CO2 emitters resolve to the real US heavyweights at the right magnitudes: James H Miller Jr (AL, ~24M short tons CO2 in 2022, the single largest US power-plant emitter), Scherer (GA), Monroe (MI), Gibson (IN), Labadie (MO), Martin Lake (TX), Gen J M Gavin (OH), W A Parish (TX). Miller's low SO2 (~850-1,260 tons against ~20M+ tons CO2) correctly reflects its scrubbers.
Deferred
- 1996-2016 historical backfill (per-era adapters; generation-only where no per-row CEMS flag).
- Heat input (HTIAN) is mostly EIA-reported rather than CAMD; left out of v1 to keep the bright line crisp around emissions. Could be carried later, clearly tagged by source.
- A live edge: eGRID is an annual after-the-fact publication; the current-year edge (if wanted) would come from EPA CAMD hourly CEMS, a separate decision.
- Spatial sweep / cross-match (deferred for every Locationdex kind until one needs it).