PUDL Data Dictionary

The following data tables have been cleaned and transformed by our ETL process.

out_censusdp1tract__counties

County-level summarized US Census demographic information, along with a geometry describing county boundaries.

Most-recent data: 2010
Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source: Census DP1 – Profile of General Demographic Characteristics
Primary key: county_id_fips
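
Because county_id_fips is declared as the primary key, each value should appear at most once. A minimal sketch of a check for that, written in plain Python so it can be applied to any list of key values; the helper name duplicated_keys and the sample IDs are illustrative assumptions, not part of PUDL.

```python
from collections import Counter


def duplicated_keys(keys):
    """Return the sorted key values that occur more than once."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample values, not read from the real table.
sample_ids = ["01001", "01003", "01005", "01003"]
print(duplicated_keys(sample_ids))
```

An empty result means the column behaves like a primary key. With a pandas DataFrame, the same question can be asked through the is_unique attribute of the key column.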

Usage Warnings

The original data is no longer being collected or reported in this way.

Additional Details

Based on the 2010 Census. Used as a convenient source of geospatial information within PUDL, but should not be treated as authoritative.

Access methods

pandas

import pandas as pd

# Load the table from the nightly Parquet file on S3.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
    dtype_backend="pyarrow",
)
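
The table is wide, so it can be convenient to read only the columns you need. The sketch below builds the Parquet path from a table name and passes a column list to pandas through the columns parameter of read_parquet. The helper names table_url and read_columns are hypothetical, and whether extra storage options or credentials are needed for S3 access depends on your environment.

```python
# Path prefix copied from the access snippets in this section.
BUCKET_PREFIX = "s3://pudl.catalyst.coop/nightly"


def table_url(table_name):
    """Build the Parquet path for a table in the nightly build."""
    return f"{BUCKET_PREFIX}/{table_name}.parquet"


def read_columns(table_name, columns):
    """Read only the requested columns of a table (needs network access)."""
    # Imported here so table_url can be used without pandas installed.
    import pandas as pd

    return pd.read_parquet(
        table_url(table_name),
        columns=columns,
        dtype_backend="pyarrow",
    )


print(table_url("out_censusdp1tract__counties"))
```

For example, read_columns("out_censusdp1tract__counties", ["county_id_fips", "county", "dp0010001"]) would request only the county ID, county name, and total population columns.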

polars

import polars as pl

# Load the table from the nightly Parquet file on S3 without signing requests.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet';

R

library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet"
)

Columns

Field Name	Type	Description
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
dp0010001	integer	DPSF1. Sex and age - Total population
dp0010002	integer	DPSF1. Sex and age - Total population under 5 years
dp0010003	integer	DPSF1. Sex and age - Total population 5 to 9 years
dp0010004	integer	DPSF1. Sex and age - Total population 10 to 14 years
dp0010005	integer	DPSF1. Sex and age - Total population 15 to 19 years
dp0010006	integer	DPSF1. Sex and age - Total population 20 to 24 years
dp0010007	integer	DPSF1. Sex and age - Total population 25 to 29 years
dp0010008	integer	DPSF1. Sex and age - Total population 30 to 34 years
dp0010009	integer	DPSF1. Sex and age - Total population 35 to 39 years
dp0010010	integer	DPSF1. Sex and age - Total population 40 to 44 years
dp0010011	integer	DPSF1. Sex and age - Total population 45 to 49 years
dp0010012	integer	DPSF1. Sex and age - Total population 50 to 54 years
dp0010013	integer	DPSF1. Sex and age - Total population 55 to 59 years
dp0010014	integer	DPSF1. Sex and age - Total population 60 to 64 years
dp0010015	integer	DPSF1. Sex and age - Total population 65 to 69 years
dp0010016	integer	DPSF1. Sex and age - Total population 70 to 74 years
dp0010017	integer	DPSF1. Sex and age - Total population 75 to 79 years
dp0010018	integer	DPSF1. Sex and age - Total population 80 to 84 years
dp0010019	integer	DPSF1. Sex and age - Total population 85 years and over
dp0010020	integer	DPSF1. Sex and age - Male population total
dp0010021	integer	DPSF1. Sex and age - Male population under 5 years
dp0010022	integer	DPSF1. Sex and age - Male population 5 to 9 years
dp0010023	integer	DPSF1. Sex and age - Male population 10 to 14 years
dp0010024	integer	DPSF1. Sex and age - Male population 15 to 19 years
dp0010025	integer	DPSF1. Sex and age - Male population 20 to 24 years
dp0010026	integer	DPSF1. Sex and age - Male population 25 to 29 years
dp0010027	integer	DPSF1. Sex and age - Male population 30 to 34 years
dp0010028	integer	DPSF1. Sex and age - Male population 35 to 39 years
dp0010029	integer	DPSF1. Sex and age - Male population 40 to 44 years
dp0010030	integer	DPSF1. Sex and age - Male population 45 to 49 years
dp0010031	integer	DPSF1. Sex and age - Male population 50 to 54 years
dp0010032	integer	DPSF1. Sex and age - Male population 55 to 59 years
dp0010033	integer	DPSF1. Sex and age - Male population 60 to 64 years
dp0010034	integer	DPSF1. Sex and age - Male population 65 to 69 years
dp0010035	integer	DPSF1. Sex and age - Male population 70 to 74 years
dp0010036	integer	DPSF1. Sex and age - Male population 75 to 79 years
dp0010037	integer	DPSF1. Sex and age - Male population 80 to 84 years
dp0010038	integer	DPSF1. Sex and age - Male population 85 years and over
dp0010039	integer	DPSF1. Sex and age - Female population total
dp0010040	integer	DPSF1. Sex and age - Female population under 5 years
dp0010041	integer	DPSF1. Sex and age - Female population 5 to 9 years
dp0010042	integer	DPSF1. Sex and age - Female population 10 to 14 years
dp0010043	integer	DPSF1. Sex and age - Female population 15 to 19 years
dp0010044	integer	DPSF1. Sex and age - Female population 20 to 24 years
dp0010045	integer	DPSF1. Sex and age - Female population 25 to 29 years
dp0010046	integer	DPSF1. Sex and age - Female population 30 to 34 years
dp0010047	integer	DPSF1. Sex and age - Female population 35 to 39 years
dp0010048	integer	DPSF1. Sex and age - Female population 40 to 44 years
dp0010049	integer	DPSF1. Sex and age - Female population 45 to 49 years
dp0010050	integer	DPSF1. Sex and age - Female population 50 to 54 years
dp0010051	integer	DPSF1. Sex and age - Female population 55 to 59 years
dp0010052	integer	DPSF1. Sex and age - Female population 60 to 64 years
dp0010053	integer	DPSF1. Sex and age - Female population 65 to 69 years
dp0010054	integer	DPSF1. Sex and age - Female population 70 to 74 years
dp0010055	integer	DPSF1. Sex and age - Female population 75 to 79 years
dp0010056	integer	DPSF1. Sex and age - Female population 80 to 84 years
dp0010057	integer	DPSF1. Sex and age - Female population 85 years and over
dp0020001	number	DPSF2. Median age by sex - Both sexes
dp0020002	number	DPSF2. Median age by sex - Male
dp0020003	number	DPSF2. Median age by sex - Female
dp0030001	integer	DPSF3. Sex for population 16 years and over - Total
dp0030002	integer	DPSF3. Sex for population 16 years and over - Male
dp0030003	integer	DPSF3. Sex for population 16 years and over - Female
dp0040001	integer	DPSF4. Sex for population 18 years and over - Total
dp0040002	integer	DPSF4. Sex for population 18 years and over - Male
dp0040003	integer	DPSF4. Sex for population 18 years and over - Female
dp0050001	integer	DPSF5. Sex for population 21 years and over - Total
dp0050002	integer	DPSF5. Sex for population 21 years and over - Male
dp0050003	integer	DPSF5. Sex for population 21 years and over - Female
dp0060001	integer	DPSF6. Sex for population 62 years and over - Total
dp0060002	integer	DPSF6. Sex for population 62 years and over - Male
dp0060003	integer	DPSF6. Sex for population 62 years and over - Female
dp0070001	integer	DPSF7. Sex for population 65 years and over - Total
dp0070002	integer	DPSF7. Sex for population 65 years and over - Male
dp0070003	integer	DPSF7. Sex for population 65 years and over - Female
dp0080001	integer	DPSF8. Race - Total population
dp0080002	integer	DPSF8. Race - Population of one race
dp0080003	integer	DPSF8. Race - White
dp0080004	integer	DPSF8. Race - Black or African American
dp0080005	integer	DPSF8. Race - American Indian and Alaska Native
dp0080006	integer	DPSF8. Race - Asian total
dp0080007	integer	DPSF8. Race - Asian Indian
dp0080008	integer	DPSF8. Race - Chinese
dp0080009	integer	DPSF8. Race - Filipino
dp0080010	integer	DPSF8. Race - Japanese
dp0080011	integer	DPSF8. Race - Korean
dp0080012	integer	DPSF8. Race - Vietnamese
dp0080013	integer	DPSF8. Race - Other Asian
dp0080014	integer	DPSF8. Race - Native Hawaiian and Other Pacific Islander total
dp0080015	integer	DPSF8. Race - Native Hawaiian
dp0080016	integer	DPSF8. Race - Guamanian or Chamorro
dp0080017	integer	DPSF8. Race - Samoan
dp0080018	integer	DPSF8. Race - Other Pacific Islander
dp0080019	integer	DPSF8. Race - Some Other Race
dp0080020	integer	DPSF8. Race - Population of Two or More Races
dp0080021	integer	DPSF8. Race - White; American Indian and Alaska Native
dp0080022	integer	DPSF8. Race - White; Asian
dp0080023	integer	DPSF8. Race - White; Black or African American
dp0080024	integer	DPSF8. Race - White; Some Other Race
dp0090001	integer	DPSF9. Race (total races tallied) - White alone or in combination with one or more other races
dp0090002	integer	DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races
dp0090003	integer	DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races
dp0090004	integer	DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races
dp0090005	integer	DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races
dp0090006	integer	DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races
dp0100001	integer	DPSF10. Hispanic or Latino by specific origin - Total population
dp0100002	integer	DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race)
dp0100003	integer	DPSF10. Hispanic or Latino by specific origin - Mexican
dp0100004	integer	DPSF10. Hispanic or Latino by specific origin - Puerto Rican
dp0100005	integer	DPSF10. Hispanic or Latino by specific origin - Cuban
dp0100006	integer	DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino
dp0100007	integer	DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino
dp0110001	integer	DPSF11. Hispanic or Latino and race - Total population
dp0110002	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino total
dp0110003	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone
dp0110004	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone
dp0110005	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone
dp0110006	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone
dp0110007	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110008	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone
dp0110009	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races
dp0110010	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total
dp0110011	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone
dp0110012	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone
dp0110013	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone
dp0110014	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone
dp0110015	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110016	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone
dp0110017	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races
dp0120001	integer	DPSF12. Relationship - Total population
dp0120002	integer	DPSF12. Relationship - In households
dp0120003	integer	DPSF12. Relationship - Householder
dp0120004	integer	DPSF12. Relationship - Spouse
dp0120005	integer	DPSF12. Relationship - Child
dp0120006	integer	DPSF12. Relationship - Own child under 18 years
dp0120007	integer	DPSF12. Relationship - Other relatives
dp0120008	integer	DPSF12. Relationship - Other relatives under 18 years
dp0120009	integer	DPSF12. Relationship - Other relatives 65 years and over
dp0120010	integer	DPSF12. Relationship - Nonrelatives
dp0120011	integer	DPSF12. Relationship - Nonrelatives under 18 years
dp0120012	integer	DPSF12. Relationship - Nonrelatives 65 years and over
dp0120013	integer	DPSF12. Relationship - Unmarried partner
dp0120014	integer	DPSF12. Relationship - In group quarters
dp0120015	integer	DPSF12. Relationship - Institutionalized population
dp0120016	integer	DPSF12. Relationship - Institutionalized population: Male
dp0120017	integer	DPSF12. Relationship - Institutionalized population: Female
dp0120018	integer	DPSF12. Relationship - Noninstitutionalized population
dp0120019	integer	DPSF12. Relationship - Noninstitutionalized population: Male
dp0120020	integer	DPSF12. Relationship - Noninstitutionalized population: Female
dp0130001	integer	DPSF13. Households by type - Total households
dp0130002	integer	DPSF13. Households by type - Family households (families)
dp0130003	integer	DPSF13. Households by type - Family households with own children under 18 years
dp0130004	integer	DPSF13. Households by type - Husband-wife family
dp0130005	integer	DPSF13. Households by type - Husband-wife family with own children under 18 years
dp0130006	integer	DPSF13. Households by type - Male householder, no wife present
dp0130007	integer	DPSF13. Households by type - Male householder, no wife present, with own children under 18 years
dp0130008	integer	DPSF13. Households by type - Female householder, no husband present
dp0130009	integer	DPSF13. Households by type - Female householder, no husband present, with own children under 18 years
dp0130010	integer	DPSF13. Households by type - Nonfamily households
dp0130011	integer	DPSF13. Households by type - Householder living alone
dp0130012	integer	DPSF13. Households by type - Householder living alone: Male
dp0130013	integer	DPSF13. Households by type - Householder living alone: Male 65 years and over
dp0130014	integer	DPSF13. Households by type - Householder living alone: Female
dp0130015	integer	DPSF13. Households by type - Householder living alone: Female 65 years and over
dp0140001	integer	DPSF14. Households with individuals under 18 years - Total
dp0150001	integer	DPSF15. Households with individuals 65 years and over - Total
dp0160001	number	DPSF16. Average household size - Average household size
dp0170001	number	DPSF17. Average family size - Average family size
dp0180001	integer	DPSF18. Housing occupancy - Total housing units
dp0180002	integer	DPSF18. Housing occupancy - Occupied housing units
dp0180003	integer	DPSF18. Housing occupancy - Vacant housing units
dp0180004	integer	DPSF18. Housing occupancy - Vacant housing units for rent
dp0180005	integer	DPSF18. Housing occupancy - Vacant housing units rented, not occupied
dp0180006	integer	DPSF18. Housing occupancy - Vacant housing units for sale only
dp0180007	integer	DPSF18. Housing occupancy - Vacant housing units sold, not occupied
dp0180008	integer	DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use
dp0180009	integer	DPSF18. Housing occupancy - All other vacant housing units
dp0190001	number	DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent)
dp0200001	number	DPSF20. Rental vacancy rate - Rental vacancy rate (percent)
dp0210001	integer	DPSF21. Housing tenure - Total occupied housing units
dp0210002	integer	DPSF21. Housing tenure - Owner-occupied housing units
dp0210003	integer	DPSF21. Housing tenure - Renter-occupied housing units
dp0220001	integer	DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units
dp0220002	integer	DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units
dp0230001	number	DPSF23. Average household size by tenure - Owner occupied
dp0230002	number	DPSF23. Average household size by tenure - Renter occupied
functional_status_code_census	string	The functional status (FUNCSTAT) code defines the current functional status of a geographic entity. These codes can be found in the TIGER/Line products, gazetteer files, and other products.
geometry	geometry	Geospatial representation of the feature.
internal_point_latitude	number	Internal point latitude in decimal degrees.
internal_point_longitude	number	Internal point longitude in decimal degrees.
land_area	number	Land area in square meters.
shape_area	number	Area of the feature in square degrees.
shape_length	number	Length of the feature’s perimeter in degrees.
water_area	number	Water area in square meters.
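
The columns above can be combined into derived measures. As a minimal sketch, population density follows from dp0010001 (total population) and land_area (square meters). The function name and the sample numbers are made up for illustration, and the function does not guard against a land area of zero.

```python
SQUARE_METERS_PER_SQUARE_KM = 1_000_000


def population_density_per_km2(total_population, land_area_m2):
    """People per square kilometer, given a land area in square meters."""
    return total_population / (land_area_m2 / SQUARE_METERS_PER_SQUARE_KM)


# Made-up county: 50,000 residents on 2,000,000,000 m² (2,000 km²) of land.
density = population_density_per_km2(50_000, 2_000_000_000)
print(density)  # 25.0
```

Since the function uses only arithmetic operators, it should also work element-wise when given two pandas columns such as df["dp0010001"] and df["land_area"].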

out_censusdp1tract__states

State-level summarized US Census demographic information, along with a geometry describing state boundaries.

Most-recent data: 2010
Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source: Census DP1 – Profile of General Demographic Characteristics
Primary key: state_id_fips

Usage Warnings

The original data is no longer being collected or reported in this way.

Additional Details

Based on the 2010 Census. Used as a convenient source of geospatial information within PUDL, but should not be treated as authoritative.

Access methods

pandas

import pandas as pd

# Load the table from the nightly Parquet file on S3.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
    dtype_backend="pyarrow",
)

polars

import polars as pl

# Load the table from the nightly Parquet file on S3 without signing requests.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet';

R

library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet"
)

Columns

Field Name	Type	Description
dp0010001	integer	DPSF1. Sex and age - Total population
dp0010002	integer	DPSF1. Sex and age - Total population under 5 years
dp0010003	integer	DPSF1. Sex and age - Total population 5 to 9 years
dp0010004	integer	DPSF1. Sex and age - Total population 10 to 14 years
dp0010005	integer	DPSF1. Sex and age - Total population 15 to 19 years
dp0010006	integer	DPSF1. Sex and age - Total population 20 to 24 years
dp0010007	integer	DPSF1. Sex and age - Total population 25 to 29 years
dp0010008	integer	DPSF1. Sex and age - Total population 30 to 34 years
dp0010009	integer	DPSF1. Sex and age - Total population 35 to 39 years
dp0010010	integer	DPSF1. Sex and age - Total population 40 to 44 years
dp0010011	integer	DPSF1. Sex and age - Total population 45 to 49 years
dp0010012	integer	DPSF1. Sex and age - Total population 50 to 54 years
dp0010013	integer	DPSF1. Sex and age - Total population 55 to 59 years
dp0010014	integer	DPSF1. Sex and age - Total population 60 to 64 years
dp0010015	integer	DPSF1. Sex and age - Total population 65 to 69 years
dp0010016	integer	DPSF1. Sex and age - Total population 70 to 74 years
dp0010017	integer	DPSF1. Sex and age - Total population 75 to 79 years
dp0010018	integer	DPSF1. Sex and age - Total population 80 to 84 years
dp0010019	integer	DPSF1. Sex and age - Total population 85 years and over
dp0010020	integer	DPSF1. Sex and age - Male population total
dp0010021	integer	DPSF1. Sex and age - Male population under 5 years
dp0010022	integer	DPSF1. Sex and age - Male population 5 to 9 years
dp0010023	integer	DPSF1. Sex and age - Male population 10 to 14 years
dp0010024	integer	DPSF1. Sex and age - Male population 15 to 19 years
dp0010025	integer	DPSF1. Sex and age - Male population 20 to 24 years
dp0010026	integer	DPSF1. Sex and age - Male population 25 to 29 years
dp0010027	integer	DPSF1. Sex and age - Male population 30 to 34 years
dp0010028	integer	DPSF1. Sex and age - Male population 35 to 39 years
dp0010029	integer	DPSF1. Sex and age - Male population 40 to 44 years
dp0010030	integer	DPSF1. Sex and age - Male population 45 to 49 years
dp0010031	integer	DPSF1. Sex and age - Male population 50 to 54 years
dp0010032	integer	DPSF1. Sex and age - Male population 55 to 59 years
dp0010033	integer	DPSF1. Sex and age - Male population 60 to 64 years
dp0010034	integer	DPSF1. Sex and age - Male population 65 to 69 years
dp0010035	integer	DPSF1. Sex and age - Male population 70 to 74 years
dp0010036	integer	DPSF1. Sex and age - Male population 75 to 79 years
dp0010037	integer	DPSF1. Sex and age - Male population 80 to 84 years
dp0010038	integer	DPSF1. Sex and age - Male population 85 years and over
dp0010039	integer	DPSF1. Sex and age - Female population total
dp0010040	integer	DPSF1. Sex and age - Female population under 5 years
dp0010041	integer	DPSF1. Sex and age - Female population 5 to 9 years
dp0010042	integer	DPSF1. Sex and age - Female population 10 to 14 years
dp0010043	integer	DPSF1. Sex and age - Female population 15 to 19 years
dp0010044	integer	DPSF1. Sex and age - Female population 20 to 24 years
dp0010045	integer	DPSF1. Sex and age - Female population 25 to 29 years
dp0010046	integer	DPSF1. Sex and age - Female population 30 to 34 years
dp0010047	integer	DPSF1. Sex and age - Female population 35 to 39 years
dp0010048	integer	DPSF1. Sex and age - Female population 40 to 44 years
dp0010049	integer	DPSF1. Sex and age - Female population 45 to 49 years
dp0010050	integer	DPSF1. Sex and age - Female population 50 to 54 years
dp0010051	integer	DPSF1. Sex and age - Female population 55 to 59 years
dp0010052	integer	DPSF1. Sex and age - Female population 60 to 64 years
dp0010053	integer	DPSF1. Sex and age - Female population 65 to 69 years
dp0010054	integer	DPSF1. Sex and age - Female population 70 to 74 years
dp0010055	integer	DPSF1. Sex and age - Female population 75 to 79 years
dp0010056	integer	DPSF1. Sex and age - Female population 80 to 84 years
dp0010057	integer	DPSF1. Sex and age - Female population 85 years and over
dp0020001	number	DPSF2. Median age by sex - Both sexes
dp0020002	number	DPSF2. Median age by sex - Male
dp0020003	number	DPSF2. Median age by sex - Female
dp0030001	integer	DPSF3. Sex for population 16 years and over - Total
dp0030002	integer	DPSF3. Sex for population 16 years and over - Male
dp0030003	integer	DPSF3. Sex for population 16 years and over - Female
dp0040001	integer	DPSF4. Sex for population 18 years and over - Total
dp0040002	integer	DPSF4. Sex for population 18 years and over - Male
dp0040003	integer	DPSF4. Sex for population 18 years and over - Female
dp0050001	integer	DPSF5. Sex for population 21 years and over - Total
dp0050002	integer	DPSF5. Sex for population 21 years and over - Male
dp0050003	integer	DPSF5. Sex for population 21 years and over - Female
dp0060001	integer	DPSF6. Sex for population 62 years and over - Total
dp0060002	integer	DPSF6. Sex for population 62 years and over - Male
dp0060003	integer	DPSF6. Sex for population 62 years and over - Female
dp0070001	integer	DPSF7. Sex for population 65 years and over - Total
dp0070002	integer	DPSF7. Sex for population 65 years and over - Male
dp0070003	integer	DPSF7. Sex for population 65 years and over - Female
dp0080001	integer	DPSF8. Race - Total population
dp0080002	integer	DPSF8. Race - Population of one race
dp0080003	integer	DPSF8. Race - White
dp0080004	integer	DPSF8. Race - Black or African American
dp0080005	integer	DPSF8. Race - American Indian and Alaska Native
dp0080006	integer	DPSF8. Race - Asian total
dp0080007	integer	DPSF8. Race - Asian Indian
dp0080008	integer	DPSF8. Race - Chinese
dp0080009	integer	DPSF8. Race - Filipino
dp0080010	integer	DPSF8. Race - Japanese
dp0080011	integer	DPSF8. Race - Korean
dp0080012	integer	DPSF8. Race - Vietnamese
dp0080013	integer	DPSF8. Race - Other Asian
dp0080014	integer	DPSF8. Race - Native Hawaiian and Other Pacific Islander total
dp0080015	integer	DPSF8. Race - Native Hawaiian
dp0080016	integer	DPSF8. Race - Guamanian or Chamorro
dp0080017	integer	DPSF8. Race - Samoan
dp0080018	integer	DPSF8. Race - Other Pacific Islander
dp0080019	integer	DPSF8. Race - Some Other Race
dp0080020	integer	DPSF8. Race - Population of Two or More Races
dp0080021	integer	DPSF8. Race - White; American Indian and Alaska Native
dp0080022	integer	DPSF8. Race - White; Asian
dp0080023	integer	DPSF8. Race - White; Black or African American
dp0080024	integer	DPSF8. Race - White; Some Other Race
dp0090001	integer	DPSF9. Race (total races tallied) - White alone or in combination with one or more other races
dp0090002	integer	DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races
dp0090003	integer	DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races
dp0090004	integer	DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races
dp0090005	integer	DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races
dp0090006	integer	DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races
dp0100001	integer	DPSF10. Hispanic or Latino by specific origin - Total population
dp0100002	integer	DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race)
dp0100003	integer	DPSF10. Hispanic or Latino by specific origin - Mexican
dp0100004	integer	DPSF10. Hispanic or Latino by specific origin - Puerto Rican
dp0100005	integer	DPSF10. Hispanic or Latino by specific origin - Cuban
dp0100006	integer	DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino
dp0100007	integer	DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino
dp0110001	integer	DPSF11. Hispanic or Latino and race - Total population
dp0110002	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino total
dp0110003	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone
dp0110004	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone
dp0110005	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone
dp0110006	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone
dp0110007	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110008	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone
dp0110009	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races
dp0110010	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total
dp0110011	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone
dp0110012	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone
dp0110013	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone
dp0110014	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone
dp0110015	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110016	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone
dp0110017	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races
dp0120001	integer	DPSF12. Relationship - Total population
dp0120002	integer	DPSF12. Relationship - In households
dp0120003	integer	DPSF12. Relationship - Householder
dp0120004	integer	DPSF12. Relationship - Spouse
dp0120005	integer	DPSF12. Relationship - Child
dp0120006	integer	DPSF12. Relationship - Own child under 18 years
dp0120007	integer	DPSF12. Relationship - Other relatives
dp0120008	integer	DPSF12. Relationship - Other relatives under 18 years
dp0120009	integer	DPSF12. Relationship - Other relatives 65 years and over
dp0120010	integer	DPSF12. Relationship - Nonrelatives
dp0120011	integer	DPSF12. Relationship - Nonrelatives under 18 years
dp0120012	integer	DPSF12. Relationship - Nonrelatives 65 years and over
dp0120013	integer	DPSF12. Relationship - Unmarried partner
dp0120014	integer	DPSF12. Relationship - In group quarters
dp0120015	integer	DPSF12. Relationship - Institutionalized population
dp0120016	integer	DPSF12. Relationship - Institutionalized population: Male
dp0120017	integer	DPSF12. Relationship - Institutionalized population: Female
dp0120018	integer	DPSF12. Relationship - Noninstitutionalized population
dp0120019	integer	DPSF12. Relationship - Noninstitutionalized population: Male
dp0120020	integer	DPSF12. Relationship - Noninstitutionalized population: Female
dp0130001	integer	DPSF13. Households by type - Total households
dp0130002	integer	DPSF13. Households by type - Family households (families)
dp0130003	integer	DPSF13. Households by type - Family households with own children under 18 years
dp0130004	integer	DPSF13. Households by type - Husband-wife family
dp0130005	integer	DPSF13. Households by type - Husband-wife family with own children under 18 years
dp0130006	integer	DPSF13. Households by type - Male householder, no wife present
dp0130007	integer	DPSF13. Households by type - Male householder, no wife present, with own children under 18 years
dp0130008	integer	DPSF13. Households by type - Female householder, no husband present
dp0130009	integer	DPSF13. Households by type - Female householder, no husband present, with own children under 18 years
dp0130010	integer	DPSF13. Households by type - Nonfamily households
dp0130011	integer	DPSF13. Households by type - Householder living alone
dp0130012	integer	DPSF13. Households by type - Householder living alone: Male
dp0130013	integer	DPSF13. Households by type - Householder living alone: Male 65 years and over
dp0130014	integer	DPSF13. Households by type - Householder living alone: Female
dp0130015	integer	DPSF13. Households by type - Householder living alone: Female 65 years and over
dp0140001	integer	DPSF14. Households with individuals under 18 years - Total
dp0150001	integer	DPSF15. Households with individuals 65 years and over - Total
dp0160001	number	DPSF16. Average household size - Average household size
dp0170001	number	DPSF17. Average family size - Average family size
dp0180001	integer	DPSF18. Housing occupancy - Total housing units
dp0180002	integer	DPSF18. Housing occupancy - Occupied housing units
dp0180003	integer	DPSF18. Housing occupancy - Vacant housing units
dp0180004	integer	DPSF18. Housing occupancy - Vacant housing units for rent
dp0180005	integer	DPSF18. Housing occupancy - Vacant housing units rented, not occupied
dp0180006	integer	DPSF18. Housing occupancy - Vacant housing units for sale only
dp0180007	integer	DPSF18. Housing occupancy - Vacant housing units sold, not occupied
dp0180008	integer	DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use
dp0180009	integer	DPSF18. Housing occupancy - All other vacant housing units
dp0190001	number	DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent)
dp0200001	number	DPSF20. Rental vacancy rate - Rental vacancy rate (percent)
dp0210001	integer	DPSF21. Housing tenure - Total occupied housing units
dp0210002	integer	DPSF21. Housing tenure - Owner-occupied housing units
dp0210003	integer	DPSF21. Housing tenure - Renter-occupied housing units
dp0220001	integer	DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units
dp0220002	integer	DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units
dp0230001	number	DPSF23. Average household size by tenure - Owner occupied
dp0230002	number	DPSF23. Average household size by tenure - Renter occupied
geometry	geometry	Geospatial representation of the feature.
internal_point_latitude	number	Internal point latitude in decimal degrees.
internal_point_longitude	number	Internal point longitude in decimal degrees.
land_area	number	Land area in square meters.
shape_area	number	Area of the feature in square degrees.
shape_length	number	Length of the feature’s perimeter in degrees.
state	string	Two-letter US state abbreviation.
state_id_fips	string	Two-digit state FIPS code.
state_name	string	Full name of the state.
water_area	number	Water area in square meters.
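
The DPSF1 columns include a total population (dp0010001) alongside male and female totals (dp0010020 and dp0010039), which allows a cheap sanity check on a loaded row. The sketch below assumes the male and female totals are expected to sum to the overall total; the function name and the sample row are hypothetical.

```python
def sex_totals_consistent(row):
    """Check that male + female population equals total population."""
    return row["dp0010020"] + row["dp0010039"] == row["dp0010001"]


# Made-up row, using the column names from the table above.
sample_row = {"dp0010001": 1000, "dp0010020": 490, "dp0010039": 510}
print(sex_totals_consistent(sample_row))  # True
```

The function only indexes by column name, so it accepts a plain dict or any row-like object that supports the same lookups.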

out_censusdp1tract__tracts

Census tract-level summarized US Census demographic information, along with a geometry describing tract boundaries.

Most-recent data:: 2010
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: Census DP1 – Profile of General Demographic Characteristics
Primary key:: tract_id_fips

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

Based on the 2010 Census. Used as a convenient source of geospatial information within PUDL, but should not be treated as authoritative.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet"
)

Columns#

Field Name	Type	Description
dp0010001	integer	DPSF1. Sex and age - Total population
dp0010002	integer	DPSF1. Sex and age - Total population under 5 years
dp0010003	integer	DPSF1. Sex and age - Total population 5 to 9 years
dp0010004	integer	DPSF1. Sex and age - Total population 10 to 14 years
dp0010005	integer	DPSF1. Sex and age - Total population 15 to 19 years
dp0010006	integer	DPSF1. Sex and age - Total population 20 to 24 years
dp0010007	integer	DPSF1. Sex and age - Total population 25 to 29 years
dp0010008	integer	DPSF1. Sex and age - Total population 30 to 34 years
dp0010009	integer	DPSF1. Sex and age - Total population 35 to 39 years
dp0010010	integer	DPSF1. Sex and age - Total population 40 to 44 years
dp0010011	integer	DPSF1. Sex and age - Total population 45 to 49 years
dp0010012	integer	DPSF1. Sex and age - Total population 50 to 54 years
dp0010013	integer	DPSF1. Sex and age - Total population 55 to 59 years
dp0010014	integer	DPSF1. Sex and age - Total population 60 to 64 years
dp0010015	integer	DPSF1. Sex and age - Total population 65 to 69 years
dp0010016	integer	DPSF1. Sex and age - Total population 70 to 74 years
dp0010017	integer	DPSF1. Sex and age - Total population 75 to 79 years
dp0010018	integer	DPSF1. Sex and age - Total population 80 to 84 years
dp0010019	integer	DPSF1. Sex and age - Total population 85 years and over
dp0010020	integer	DPSF1. Sex and age - Male population total
dp0010021	integer	DPSF1. Sex and age - Male population under 5 years
dp0010022	integer	DPSF1. Sex and age - Male population 5 to 9 years
dp0010023	integer	DPSF1. Sex and age - Male population 10 to 14 years
dp0010024	integer	DPSF1. Sex and age - Male population 15 to 19 years
dp0010025	integer	DPSF1. Sex and age - Male population 20 to 24 years
dp0010026	integer	DPSF1. Sex and age - Male population 25 to 29 years
dp0010027	integer	DPSF1. Sex and age - Male population 30 to 34 years
dp0010028	integer	DPSF1. Sex and age - Male population 35 to 39 years
dp0010029	integer	DPSF1. Sex and age - Male population 40 to 44 years
dp0010030	integer	DPSF1. Sex and age - Male population 45 to 49 years
dp0010031	integer	DPSF1. Sex and age - Male population 50 to 54 years
dp0010032	integer	DPSF1. Sex and age - Male population 55 to 59 years
dp0010033	integer	DPSF1. Sex and age - Male population 60 to 64 years
dp0010034	integer	DPSF1. Sex and age - Male population 65 to 69 years
dp0010035	integer	DPSF1. Sex and age - Male population 70 to 74 years
dp0010036	integer	DPSF1. Sex and age - Male population 75 to 79 years
dp0010037	integer	DPSF1. Sex and age - Male population 80 to 84 years
dp0010038	integer	DPSF1. Sex and age - Male population 85 years and over
dp0010039	integer	DPSF1. Sex and age - Female population total
dp0010040	integer	DPSF1. Sex and age - Female population under 5 years
dp0010041	integer	DPSF1. Sex and age - Female population 5 to 9 years
dp0010042	integer	DPSF1. Sex and age - Female population 10 to 14 years
dp0010043	integer	DPSF1. Sex and age - Female population 15 to 19 years
dp0010044	integer	DPSF1. Sex and age - Female population 20 to 24 years
dp0010045	integer	DPSF1. Sex and age - Female population 25 to 29 years
dp0010046	integer	DPSF1. Sex and age - Female population 30 to 34 years
dp0010047	integer	DPSF1. Sex and age - Female population 35 to 39 years
dp0010048	integer	DPSF1. Sex and age - Female population 40 to 44 years
dp0010049	integer	DPSF1. Sex and age - Female population 45 to 49 years
dp0010050	integer	DPSF1. Sex and age - Female population 50 to 54 years
dp0010051	integer	DPSF1. Sex and age - Female population 55 to 59 years
dp0010052	integer	DPSF1. Sex and age - Female population 60 to 64 years
dp0010053	integer	DPSF1. Sex and age - Female population 65 to 69 years
dp0010054	integer	DPSF1. Sex and age - Female population 70 to 74 years
dp0010055	integer	DPSF1. Sex and age - Female population 75 to 79 years
dp0010056	integer	DPSF1. Sex and age - Female population 80 to 84 years
dp0010057	integer	DPSF1. Sex and age - Female population 85 years and over
dp0020001	number	DPSF2. Median age by sex - Both sexes
dp0020002	number	DPSF2. Median age by sex - Male
dp0020003	number	DPSF2. Median age by sex - Female
dp0030001	integer	DPSF3. Sex for population 16 years and over - Total
dp0030002	integer	DPSF3. Sex for population 16 years and over - Male
dp0030003	integer	DPSF3. Sex for population 16 years and over - Female
dp0040001	integer	DPSF4. Sex for population 18 years and over - Total
dp0040002	integer	DPSF4. Sex for population 18 years and over - Male
dp0040003	integer	DPSF4. Sex for population 18 years and over - Female
dp0050001	integer	DPSF5. Sex for population 21 years and over - Total
dp0050002	integer	DPSF5. Sex for population 21 years and over - Male
dp0050003	integer	DPSF5. Sex for population 21 years and over - Female
dp0060001	integer	DPSF6. Sex for population 62 years and over - Total
dp0060002	integer	DPSF6. Sex for population 62 years and over - Male
dp0060003	integer	DPSF6. Sex for population 62 years and over - Female
dp0070001	integer	DPSF7. Sex for population 65 years and over - Total
dp0070002	integer	DPSF7. Sex for population 65 years and over - Male
dp0070003	integer	DPSF7. Sex for population 65 years and over - Female
dp0080001	integer	DPSF8. Race - Total population
dp0080002	integer	DPSF8. Race - Population of one race
dp0080003	integer	DPSF8. Race - White
dp0080004	integer	DPSF8. Race - Black or African American
dp0080005	integer	DPSF8. Race - American Indian and Alaska Native
dp0080006	integer	DPSF8. Race - Asian total
dp0080007	integer	DPSF8. Race - Asian Indian
dp0080008	integer	DPSF8. Race - Chinese
dp0080009	integer	DPSF8. Race - Filipino
dp0080010	integer	DPSF8. Race - Japanese
dp0080011	integer	DPSF8. Race - Korean
dp0080012	integer	DPSF8. Race - Vietnamese
dp0080013	integer	DPSF8. Race - Other Asian
dp0080014	integer	DPSF8. Race - Native Hawaiian and Other Pacific Islander total
dp0080015	integer	DPSF8. Race - Native Hawaiian
dp0080016	integer	DPSF8. Race - Guamanian or Chamorro
dp0080017	integer	DPSF8. Race - Samoan
dp0080018	integer	DPSF8. Race - Other Pacific Islander
dp0080019	integer	DPSF8. Race - Some Other Race
dp0080020	integer	DPSF8. Race - Population of Two or More Races
dp0080021	integer	DPSF8. Race - White; American Indian and Alaska Native
dp0080022	integer	DPSF8. Race - White; Asian
dp0080023	integer	DPSF8. Race - White; Black or African American
dp0080024	integer	DPSF8. Race - White; Some Other Race
dp0090001	integer	DPSF9. Race (total races tallied) - White alone or in combination with one or more other races
dp0090002	integer	DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races
dp0090003	integer	DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races
dp0090004	integer	DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races
dp0090005	integer	DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races
dp0090006	integer	DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races
dp0100001	integer	DPSF10. Hispanic or Latino by specific origin - Total population
dp0100002	integer	DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race)
dp0100003	integer	DPSF10. Hispanic or Latino by specific origin - Mexican
dp0100004	integer	DPSF10. Hispanic or Latino by specific origin - Puerto Rican
dp0100005	integer	DPSF10. Hispanic or Latino by specific origin - Cuban
dp0100006	integer	DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino
dp0100007	integer	DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino
dp0110001	integer	DPSF11. Hispanic or Latino and race - Total population
dp0110002	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino total
dp0110003	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone
dp0110004	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone
dp0110005	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone
dp0110006	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone
dp0110007	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110008	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone
dp0110009	integer	DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races
dp0110010	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total
dp0110011	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone
dp0110012	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone
dp0110013	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone
dp0110014	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone
dp0110015	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
dp0110016	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone
dp0110017	integer	DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races
dp0120001	integer	DPSF12. Relationship - Total population
dp0120002	integer	DPSF12. Relationship - In households
dp0120003	integer	DPSF12. Relationship - Householder
dp0120004	integer	DPSF12. Relationship - Spouse
dp0120005	integer	DPSF12. Relationship - Child
dp0120006	integer	DPSF12. Relationship - Own child under 18 years
dp0120007	integer	DPSF12. Relationship - Other relatives
dp0120008	integer	DPSF12. Relationship - Other relatives under 18 years
dp0120009	integer	DPSF12. Relationship - Other relatives 65 years and over
dp0120010	integer	DPSF12. Relationship - Nonrelatives
dp0120011	integer	DPSF12. Relationship - Nonrelatives under 18 years
dp0120012	integer	DPSF12. Relationship - Nonrelatives 65 years and over
dp0120013	integer	DPSF12. Relationship - Unmarried partner
dp0120014	integer	DPSF12. Relationship - In group quarters
dp0120015	integer	DPSF12. Relationship - Institutionalized population
dp0120016	integer	DPSF12. Relationship - Institutionalized population: Male
dp0120017	integer	DPSF12. Relationship - Institutionalized population: Female
dp0120018	integer	DPSF12. Relationship - Noninstitutionalized population
dp0120019	integer	DPSF12. Relationship - Noninstitutionalized population: Male
dp0120020	integer	DPSF12. Relationship - Noninstitutionalized population: Female
dp0130001	integer	DPSF13. Households by type - Total households
dp0130002	integer	DPSF13. Households by type - Family households (families)
dp0130003	integer	DPSF13. Households by type - Family households with own children under 18 years
dp0130004	integer	DPSF13. Households by type - Husband-wife family
dp0130005	integer	DPSF13. Households by type - Husband-wife family with own children under 18 years
dp0130006	integer	DPSF13. Households by type - Male householder, no wife present
dp0130007	integer	DPSF13. Households by type - Male householder, no wife present, with own children under 18 years
dp0130008	integer	DPSF13. Households by type - Female householder, no husband present
dp0130009	integer	DPSF13. Households by type - Female householder, no husband present, with own children under 18 years
dp0130010	integer	DPSF13. Households by type - Nonfamily households
dp0130011	integer	DPSF13. Households by type - Householder living alone
dp0130012	integer	DPSF13. Households by type - Householder living alone: Male
dp0130013	integer	DPSF13. Households by type - Householder living alone: Male 65 years and over
dp0130014	integer	DPSF13. Households by type - Householder living alone: Female
dp0130015	integer	DPSF13. Households by type - Householder living alone: Female 65 years and over
dp0140001	integer	DPSF14. Households with individuals under 18 years - Total
dp0150001	integer	DPSF15. Households with individuals 65 years and over - Total
dp0160001	number	DPSF16. Average household size - Average household size
dp0170001	number	DPSF17. Average family size - Average family size
dp0180001	integer	DPSF18. Housing occupancy - Total housing units
dp0180002	integer	DPSF18. Housing occupancy - Occupied housing units
dp0180003	integer	DPSF18. Housing occupancy - Vacant housing units
dp0180004	integer	DPSF18. Housing occupancy - Vacant housing units for rent
dp0180005	integer	DPSF18. Housing occupancy - Vacant housing units rented, not occupied
dp0180006	integer	DPSF18. Housing occupancy - Vacant housing units for sale only
dp0180007	integer	DPSF18. Housing occupancy - Vacant housing units sold, not occupied
dp0180008	integer	DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use
dp0180009	integer	DPSF18. Housing occupancy - All other vacant housing units
dp0190001	number	DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent)
dp0200001	number	DPSF20. Rental vacancy rate - Rental vacancy rate (percent)
dp0210001	integer	DPSF21. Housing tenure - Total occupied housing units
dp0210002	integer	DPSF21. Housing tenure - Owner-occupied housing units
dp0210003	integer	DPSF21. Housing tenure - Renter-occupied housing units
dp0220001	integer	DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units
dp0220002	integer	DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units
dp0230001	number	DPSF23. Average household size by tenure - Owner occupied
dp0230002	number	DPSF23. Average household size by tenure - Renter occupied
geometry	geometry	Geospatial representation of the feature.
internal_point_latitude	number	Internal point latitude in decimal degrees.
internal_point_longitude	number	Internal point longitude in decimal degrees.
land_area	number	Land area in square meters.
shape_area	number	Area of the feature in square degrees.
shape_length	number	Length of the feature’s perimeter in degrees.
tract_id_fips	string	Census tract 11-digit FIPS code
tract_name	string	Census tract legal/statistical area description
water_area	number	Water area in square meters.
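
Unlike the county table, this table carries no separate state or county ID columns, but they can be recovered from tract_id_fips. The sketch below assumes the standard Census GEOID layout for tracts (2 state digits, 3 county digits, 6 tract digits, 11 in total); the function name and the tract_code key are illustrative, not PUDL names.

```python
def split_tract_fips(tract_id_fips):
    """Split a tract GEOID into state, county, and tract components.

    Assumes the 2 + 3 + 6 digit layout; raises ValueError otherwise.
    """
    tract_id_fips = str(tract_id_fips)
    if len(tract_id_fips) != 11 or not tract_id_fips.isdigit():
        raise ValueError(f"expected an 11-digit string, got {tract_id_fips!r}")
    return {
        "state_id_fips": tract_id_fips[:2],
        # County FIPS codes include the state prefix, hence the first 5 digits.
        "county_id_fips": tract_id_fips[:5],
        "tract_code": tract_id_fips[5:],
    }
```

Keeping the ID as a string matters because leading zeros are significant.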

out_eia860__yearly_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Annual time series of the cost, type, operating status, retirement date, and install year of emissions control equipment reported to EIA.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: report_date, plant_id_eia, emission_control_id_pudl

Additional Details#

Includes control ids for sulfur dioxide (SO2), particulate matter, mercury, nitrogen oxide (NOX), and acid (HCl) gas monitoring.

This denormalized version includes plant name, utility id, pudl id, and utility name columns.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet"
)

Columns#

Field Name	Type	Description
acid_gas_control	boolean	Indicates whether the emissions control equipment controls acid (HCl) gas.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
emission_control_equipment_cost	number	The total cost to install a piece of emission control equipment.
emission_control_equipment_type_code	string	Short code indicating the type of emission control equipment installed.
emission_control_id_pudl	number	A PUDL-generated ID used to distinguish emission control units in the same report year and plant id. This ID should not be used to track units over time or between plants.
emission_control_operating_date	date	The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data.
emission_control_retirement_date	date	The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data.
mercury_control_id_eia	string	Mercury control identification number. This ID is not a unique identifier.
nox_control_id_eia	string	Nitrogen oxide control identification number. This ID is not a unique identifier.
operational_status	string	The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.
operational_status_code	string	The operating status of the asset.
particulate_control_id_eia	string	Particulate matter control identification number. This ID is not a unique identifier.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
so2_control_id_eia	string	Sulfur dioxide control identification number. This ID is not a unique identifier.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
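
Since emission_control_id_pudl is only meaningful in combination with report_date and plant_id_eia, it can be worth confirming that the stated primary key is unique in whatever extract is being used. A minimal sketch, assuming rows are plain dicts; the helper name is hypothetical.

```python
from collections import Counter

# Primary key as stated for this table.
PRIMARY_KEY = ("report_date", "plant_id_eia", "emission_control_id_pudl")


def find_duplicate_keys(rows, key_columns=PRIMARY_KEY):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

An empty result means the key columns identify each row uniquely.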

out_eia860__yearly_ownership#

Noticed any irregularities with this table? File a report.

Annual time series of generator ownership. Includes only jointly or third-party owned generators.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 4)
Primary key:: report_date, plant_id_eia, generator_id, owner_utility_id_eia

Additional Details#

Denormalized to include plant and utility names and other associated IDs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
operator_utility_id_eia	integer	The EIA utility Identification number for the operator utility.
owner_city	string	City of owner.
owner_country	string	Three letter ISO-3166 country code.
owner_state	string	Two letter ISO-3166 political subdivision code.
owner_street_address	string	Street address of owner.
owner_utility_id_eia	integer	The EIA utility Identification number for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility.
owner_utility_name_eia	string	The name of the EIA owner utility.
owner_zip_code	string	Zip code of owner.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
utility_id_pudl	integer	A manually assigned PUDL utility ID for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility. May not be stable over time.
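
One common sanity check on ownership data is whether the fraction_owned values for a generator in a given report year add up to roughly 1. The sketch below assumes that expectation holds for complete records, which the table description does not guarantee. Rows are assumed to be plain dicts and the function name is illustrative.

```python
import math
from collections import defaultdict


def incomplete_ownership(rows, tolerance=0.01):
    """Return {(report_date, plant_id_eia, generator_id): total} for
    generators whose ownership shares do not sum to 1 within tolerance."""
    shares = defaultdict(list)
    for row in rows:
        # generator_id stays a string so "01" and "1" remain distinct.
        key = (row["report_date"], row["plant_id_eia"], str(row["generator_id"]))
        if row["fraction_owned"] is not None:
            shares[key].append(row["fraction_owned"])
    totals = {key: math.fsum(values) for key, values in shares.items()}
    return {k: t for k, t in totals.items() if abs(t - 1.0) > tolerance}
```

Flagged generators may reflect reporting gaps rather than data errors.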

out_eia861__yearly_balancing_authority_service_territory#

Noticed any irregularities with this table? File a report.

Annual time series of counties in balancing authority service territories.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: balancing_authority_id_eia, report_date, county_id_fips, county

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet"
)

Columns#

Field Name	Type	Description
area_km2	number	County area in km2.
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
county_name_census	string	County name as specified in Census DP1 Data.
population	number	County population, sourced from Census DP1 data.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
state_id_fips	string	Two digit state FIPS code.
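
Because each row is one county in one balancing authority's territory, territory-level totals come from aggregating over counties. The sketch below sums population and area_km2 per balancing authority and report date. It counts each county_id_fips once per group, on the assumption that the same county could appear under more than one county name, since county is part of the primary key. Rows are assumed to be plain dicts and the function name is hypothetical.

```python
from collections import defaultdict


def summarize_ba_territories(rows):
    """Sum county population and area per (balancing authority, report date)."""
    seen = set()
    totals = defaultdict(lambda: {"counties": 0, "population": 0.0, "area_km2": 0.0})
    for row in rows:
        if row["county_id_fips"] is None:
            continue  # cannot deduplicate counties without a FIPS code
        group = (row["balancing_authority_id_eia"], row["report_date"])
        county_key = group + (row["county_id_fips"],)
        if county_key in seen:
            continue  # same county already counted for this group
        seen.add(county_key)
        summary = totals[group]
        summary["counties"] += 1
        # Missing population or area is treated as zero.
        summary["population"] += row.get("population") or 0.0
        summary["area_km2"] += row.get("area_km2") or 0.0
    return dict(totals)
```

Totals across balancing authorities should not be added together, since a county can fall within more than one territory.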

out_eia861__yearly_utility_service_territory#

Noticed any irregularities with this table? File a report.

Annual time series of counties in utility service territories.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: utility_id_eia, report_date, county_id_fips

Additional Details#

Contains additional information about counties.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet"
)

Columns#

Field Name	Type	Description
area_km2	number	County area in km2.
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
county_name_census	string	County name as specified in Census DP1 Data.
population	number	County population, sourced from Census DP1 data.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
state_id_fips	string	Two digit state FIPS code.
utility_id_eia	integer	The EIA Utility Identification number.
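
As an annual time series, this table can show how a utility's reported territory changes between two report dates. A minimal sketch using set differences on county_id_fips, assuming rows is a list of plain dicts; the function name is illustrative.

```python
def territory_changes(rows, utility_id_eia, earlier_date, later_date):
    """Return counties added to and removed from a utility's territory."""

    def counties_on(report_date):
        return {
            row["county_id_fips"]
            for row in rows
            if row["utility_id_eia"] == utility_id_eia
            and row["report_date"] == report_date
            and row["county_id_fips"] is not None
        }

    earlier, later = counties_on(earlier_date), counties_on(later_date)
    return {"added": sorted(later - earlier), "removed": sorted(earlier - later)}
```

A county that disappears may reflect a reporting change rather than a real change in service territory.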

out_eia923__boiler_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
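
The December warning above can be turned into a rough screen for annual-only reporters: boilers whose only month with nonzero fuel consumption in a year is December. This is a heuristic sketch, not how PUDL or EIA identifies such respondents, and it also flags any boiler that really did run only in December. Rows are assumed to be plain dicts with report_date as a datetime.date; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def december_only_reporters(rows):
    """Return (plant_id_eia, boiler_id, year) keys with fuel use only in December."""
    months_with_fuel = defaultdict(set)
    for row in rows:
        if row.get("fuel_consumed_units"):  # skips None and zero
            reported = row["report_date"]
            key = (row["plant_id_eia"], row["boiler_id"], reported.year)
            months_with_fuel[key].add(reported.month)
    return sorted(k for k, months in months_with_fuel.items() if months == {12})


# Small made-up example: boiler B1 reports only in December, B2 in two months.
example = [
    {"plant_id_eia": 1, "boiler_id": "B1", "report_date": date(2020, 6, 1), "fuel_consumed_units": 0},
    {"plant_id_eia": 1, "boiler_id": "B1", "report_date": date(2020, 12, 1), "fuel_consumed_units": 1200},
    {"plant_id_eia": 1, "boiler_id": "B2", "report_date": date(2020, 6, 1), "fuel_consumed_units": 80},
    {"plant_id_eia": 1, "boiler_id": "B2", "report_date": date(2020, 12, 1), "fuel_consumed_units": 90},
]
flagged = december_only_reporters(example)
```

Flagged boilers are better analyzed at annual resolution than month by month.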

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
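
The heat rate calculation mentioned above combines fuel quantity and heat content into a heat input, then divides by generation. The sketch below assumes heat input equals fuel_consumed_units times fuel_mmbtu_per_unit. The net_generation_mwh argument is a hypothetical input that would have to come from another table, since this table does not report generation. Function names are illustrative.

```python
def heat_input_mmbtu(fuel_consumed_units, fuel_mmbtu_per_unit):
    """Heat input in MMBtu from physical fuel units and heat content per unit."""
    if fuel_consumed_units is None or fuel_mmbtu_per_unit is None:
        return None
    return fuel_consumed_units * fuel_mmbtu_per_unit


def heat_rate_mmbtu_per_mwh(heat_input, net_generation_mwh):
    """Heat rate in MMBtu per MWh; None when generation is missing or zero."""
    if heat_input is None or not net_generation_mwh:
        return None
    return heat_input / net_generation_mwh
```

For example, 100 units of fuel at 20.5 MMBtu per unit give 2050 MMBtu of heat input.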

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
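
The 70% rule can be illustrated with a small function that picks the most common reported value and returns null when no value is consistent enough. This is a simplified sketch and not PUDL's actual harvesting code. In particular, excluding nulls from the denominator is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def choose_canonical(values, min_share=0.7):
    """Return the most common non-null value if it reaches min_share, else None."""
    reported = [value for value in values if value is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # The winner must account for at least min_share of reported instances.
    return value if count / len(reported) >= min_share else None
```

With three reports of "Acme" and one of "ACME Inc", the share is 75% and "Acme" is chosen; an even split yields a null.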

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Individual fuel deliveries to power plants, organized by fuel type and supplier.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
Primary key:: This table has no primary key. Each record describes an individual fuel delivery. There can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month, so the table has no natural primary key.

Usage Warnings#

Date column arbitrarily uses the first of the month.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains estimated values.

Additional Details#

There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.

Under some circumstances utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It’s still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices they publish. From 2009-2021 about 1/3 of all prices are redacted. The missing data is not randomly distributed. Deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.
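Because redaction is concentrated in particular fuels, it can be worth checking the share of missing prices per fuel type before averaging anything. The sketch below runs on a handful of made-up records rather than the real table. It assumes a redacted price shows up as a missing value (None here) in fuel_cost_per_mmbtu, which may not hold where prices have been filled in (see fuel_cost_per_mmbtu_source); the helper name redacted_share is hypothetical.

```python
from collections import defaultdict


def redacted_share(records):
    """Return {fuel_type_code_pudl: fraction of records with no price}.

    Assumption: a missing (None) fuel_cost_per_mmbtu means the price
    was redacted. This is an illustration, not PUDL's own logic.
    """
    total = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        fuel = rec["fuel_type_code_pudl"]
        total[fuel] += 1
        if rec["fuel_cost_per_mmbtu"] is None:
            missing[fuel] += 1
    return {fuel: missing[fuel] / total[fuel] for fuel in total}


# Made-up deliveries for illustration only; not real PUDL data.
deliveries = [
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": 3.1},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
    {"fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu": 2.0},
    {"fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu": 2.2},
]
shares = redacted_share(deliveries)
```

A large gap between fuels in this kind of summary suggests that a simple average of the reported prices would not be representative.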

Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
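The 70% rule described above can be illustrated with a small sketch. This is not PUDL's implementation: the function name canonical_value is hypothetical, and excluding null reports from the denominator is an assumption made here for simplicity.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Pick the most common non-null value if it meets the threshold.

    Assumption: null reports are ignored when computing the share.
    Returns None when no value is reported consistently enough.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None


# Made-up plant names as they might appear across several tables.
consistent = canonical_value(["Barry", "Barry", "Barry", "Barry (AL)"])
inconsistent = canonical_value(["Barry", "Barry (AL)"])
```

In the first call "Barry" accounts for three of four reports and is kept; in the second call neither name reaches the threshold, so the result is null.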

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
bulk_agg_fuel_cost_per_mmbtu	number	Fuel cost per MMBTU reported in the EIA bulk electricity data. This is an aggregate average fuel price for a whole state, region, month, sector, etc. Used to fill in missing fuel prices.
chlorine_content_ppm	number	For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.
coalmine_county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4. This is the county where the coal mine is located.
contract_expiration_date	date	Date contract expires. Format: MMYY.
contract_type_code	string	Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or reflects records with multiple sources that were aggregated together (mixed).
fuel_group_code	string	Fuel groups used in the Electric Power Monthly
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_received_units	number	Quantity of fuel received in tons, barrel, or Mcf.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
mercury_content_ppm	number	Mercury content in parts per million (ppm) to the nearest 0.001 ppm.
mine_id_msha	integer	MSHA issued mine identifier.
mine_name	string	Coal mine name.
mine_state	string	State where the coal mine is located. Two letter abbreviation.
mine_type_code	string	Type of coal mine.
moisture_content_pct	number	For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.
natural_gas_delivery_contract_type_code	string	Contract type for natural gas delivery service.
natural_gas_transport_code	string	Contract type for natural gas transportation service.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
primary_transportation_mode_code	string	Transportation mode for the longest distance transported.
report_date	date	Date reported.
secondary_transportation_mode_code	string	Transportation mode for the second longest distance transported.
state	string	Two letter US state abbreviation.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
supplier_name	string	Company that sold the fuel to the plant or, in the case of Natural Gas, pipeline owner.
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__generation#

Noticed any irregularities with this table? File a report.

Monthly time series of net electricity generated by power plant generators.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, generator_id, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption of all generation units.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level, so as to be compatible with fossil fuel generation data.

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator along with their capacities to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.
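One way to use the two fuel columns together is to compute the fraction of a record's fuel that went to electricity generation. The sketch below is illustrative only: the function name electricity_fuel_share is hypothetical, the input numbers are made up, and treating a zero or missing total as "no share" is an assumption.

```python
def electricity_fuel_share(fuel_for_electricity_mmbtu, fuel_total_mmbtu):
    """Fraction of total fuel heat input used for electricity generation.

    Arguments correspond to the fuel_consumed_for_electricity_mmbtu and
    fuel_consumed_mmbtu columns. Returns None when the share is undefined
    (missing inputs or zero total fuel); that choice is an assumption.
    """
    if fuel_for_electricity_mmbtu is None or not fuel_total_mmbtu:
        return None
    return fuel_for_electricity_mmbtu / fuel_total_mmbtu


# Made-up CHP record: 600 of 1,000 MMBtu went to electricity.
chp_share = electricity_fuel_share(600.0, 1000.0)
# Made-up electricity-only record: all fuel went to electricity.
plain_share = electricity_fuel_share(500.0, 500.0)
```

A share well below 1 would be consistent with a unit that also supplies process heat.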

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in physical unit, year to date.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical unit.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_agg	string	A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__monthly_boiler_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
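As a rough illustration of the arithmetic involved, the sketch below converts fuel quantity to heat input and divides by net generation to get a heat rate in MMBtu per MWh. It sidesteps the boiler-generator association problem entirely by assuming the fuel and generation figures already refer to the same equipment and period. The function names and numbers are hypothetical.

```python
def fuel_heat_mmbtu(fuel_consumed_units, fuel_mmbtu_per_unit):
    """Heat input in MMBtu: physical fuel units times heat content per unit."""
    return fuel_consumed_units * fuel_mmbtu_per_unit


def heat_rate_mmbtu_per_mwh(fuel_mmbtu, net_generation_mwh):
    """Heat rate as fuel heat input per MWh of net generation.

    Returns None when net generation is missing or not positive, since
    the ratio is not meaningful in that case (an assumption made here).
    """
    if net_generation_mwh is None or net_generation_mwh <= 0:
        return None
    return fuel_mmbtu / net_generation_mwh


# Made-up month: 1,000 tons of coal at 20 MMBtu/ton, 2,000 MWh generated.
heat = fuel_heat_mmbtu(1000.0, 20.0)
rate = heat_rate_mmbtu_per_mwh(heat, 2000.0)
```

In practice the net generation would have to come from a generation table and be matched to the boilers reported here, which is the difficult step.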

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__monthly_fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Monthly time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
Primary key:: plant_id_eia, fuel_type_code_pudl, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Contains estimated values.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy timeseries of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns – primarily numerical values and a handful of categorical variables, plus additional attributes that are constant within each plant-fuel-time period grouping and associated with the plant or utility. When aggregating numerical values any sum that contains an NA value is treated as NA.
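The NA handling described above can be sketched as follows. This is a simplified illustration on made-up records, not the actual PUDL aggregation code. The helper names sum_or_na and aggregate_receipts are hypothetical, and grouping on plant_id_eia, fuel_type_code_pudl, and report_date mirrors this table's primary key.

```python
from collections import defaultdict


def sum_or_na(values):
    """Sum values, returning None (NA) if any value is missing."""
    return None if any(v is None for v in values) else sum(values)


def aggregate_receipts(records, column):
    """Sum one numeric column per (plant, fuel type, month) group."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["plant_id_eia"], rec["fuel_type_code_pudl"], rec["report_date"])
        groups[key].append(rec[column])
    return {key: sum_or_na(vals) for key, vals in groups.items()}


# Made-up deliveries for illustration only; not real PUDL data.
receipts = [
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal",
     "report_date": "2020-01-01", "fuel_received_units": 100.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal",
     "report_date": "2020-01-01", "fuel_received_units": 50.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas",
     "report_date": "2020-01-01", "fuel_received_units": None},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas",
     "report_date": "2020-01-01", "fuel_received_units": 30.0},
]
monthly = aggregate_receipts(receipts, "fuel_received_units")
```

Here the coal group sums normally, while the gas group becomes NA because one of its deliveries has no reported quantity.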

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
chlorine_content_ppm	number	For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or reflects records with multiple sources that were aggregated together (mixed).
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_received_units	number	Quantity of fuel received in tons, barrel, or Mcf.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
mercury_content_ppm	number	Mercury content in parts per million (ppm) to the nearest 0.001 ppm.
moisture_content_pct	number	For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__monthly_generation#

Noticed any irregularities with this table? File a report.

Monthly time series of net electricity generated by power plant generators.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, generator_id, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__monthly_generation_fuel_by_generator#

Noticed any irregularities with this table? File a report.

Monthly time series of estimated net generation and fuel consumption by generator.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: report_date, plant_id_eia, generator_id

Usage Warnings#

Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.

In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__monthly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
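The roll-up from the generator/energy source/prime mover level to the generator level can be sketched as a grouped sum over this table's primary key. The records below are invented for illustration, and the helper name aggregate_to_generator is hypothetical. PUDL's own code may differ.

```python
from collections import defaultdict

VALUE_COLS = ("net_generation_mwh", "fuel_consumed_mmbtu")

# Hypothetical allocated records at the generator / energy source / prime mover level.
records = [
    {"report_date": "2024-01-01", "plant_id_eia": 1, "generator_id": "1A",
     "energy_source_code": "NG", "prime_mover_code": "CT",
     "net_generation_mwh": 100.0, "fuel_consumed_mmbtu": 900.0},
    {"report_date": "2024-01-01", "plant_id_eia": 1, "generator_id": "1A",
     "energy_source_code": "DFO", "prime_mover_code": "CT",
     "net_generation_mwh": 20.0, "fuel_consumed_mmbtu": 200.0},
    {"report_date": "2024-01-01", "plant_id_eia": 1, "generator_id": "2",
     "energy_source_code": "NG", "prime_mover_code": "CT",
     "net_generation_mwh": 50.0, "fuel_consumed_mmbtu": 500.0},
]


def aggregate_to_generator(rows):
    """Sum value columns over energy source and prime mover for each generator."""
    totals = defaultdict(lambda: dict.fromkeys(VALUE_COLS, 0.0))
    for row in rows:
        key = (row["report_date"], row["plant_id_eia"], row["generator_id"])
        for col in VALUE_COLS:
            totals[key][col] += row[col]
    return dict(totals)


by_generator = aggregate_to_generator(records)
```

Generator "1A" burns two energy sources in this example, so its two input records collapse into one output record keyed by report_date, plant_id_eia, and generator_id.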

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet"
)

Columns#

Field Name	Type	Description
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in physical unit, year to date.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__monthly_generation_fuel_by_generator_energy_source#

Noticed any irregularities with this table? File a report.

Monthly time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code

Usage Warnings#

Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.

In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.

The allocation process entails generating a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
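As a rough illustration of the fraction-based allocation, the sketch below splits one plant-level total across generators. It is a heavy simplification and not PUDL's algorithm: the rule "use reported net generation when every generator has it, otherwise fall back to capacity" is an assumption, as are the function names and the sample values.

```python
def allocation_fractions(generators):
    """Return each generator's share of a plant-level total.

    Assumption: weight by reported net generation when every generator has a
    value and the sum is positive; otherwise weight by capacity.
    """
    net_gen = [g["net_generation_mwh"] for g in generators]
    if all(v is not None for v in net_gen) and sum(net_gen) > 0:
        weights = {g["generator_id"]: g["net_generation_mwh"] for g in generators}
    else:
        weights = {g["generator_id"]: g["capacity_mw"] for g in generators}
    total = sum(weights.values())
    return {gid: w / total for gid, w in weights.items()}


def allocate(plant_total, fractions):
    """Split a plant-level quantity across generators using the fractions."""
    return {gid: plant_total * frac for gid, frac in fractions.items()}


# Hypothetical generators with reported generator-level net generation.
reported = [
    {"generator_id": "1", "capacity_mw": 50.0, "net_generation_mwh": 30.0},
    {"generator_id": "2", "capacity_mw": 150.0, "net_generation_mwh": 10.0},
]
# The same generators with no generator-level net generation reported.
unreported = [dict(g, net_generation_mwh=None) for g in reported]

by_net_gen = allocate(200.0, allocation_fractions(reported))
by_capacity = allocate(200.0, allocation_fractions(unreported))
```

With reported net generation the 200 MWh plant total is split 150/50. With only capacity available the split reverses to 50/150, which shows why the allocated values should be read as estimates.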

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet"
)

Columns#

Field Name	Type	Description
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
energy_source_code_num	string	Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in physical unit, year to date.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.

out_eia923__monthly_generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption of all generation units.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level so that it is compatible with the fossil fuel generation data.

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. When fuel consumption and generation aren’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, we attribute them using the primary fuels and prime movers reported for each generator, along with their capacities, in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.
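To make the distinction concrete, here is a small sketch that uses the two fuel columns together with net generation. The function names and numbers are hypothetical. The sketch also assumes that a heat rate for electricity should use only the fuel consumed for electricity, so that process heat at a CHP unit is not charged against net generation.

```python
def electricity_fuel_share(fuel_consumed_for_electricity_mmbtu, fuel_consumed_mmbtu):
    """Fraction of all fuel consumed that went to electricity generation."""
    if fuel_consumed_mmbtu is None or fuel_consumed_mmbtu <= 0:
        return None
    return fuel_consumed_for_electricity_mmbtu / fuel_consumed_mmbtu


def heat_rate_mmbtu_per_mwh(fuel_consumed_for_electricity_mmbtu, net_generation_mwh):
    """Fuel energy input per unit of net electricity output."""
    if net_generation_mwh is None or net_generation_mwh <= 0:
        return None
    return fuel_consumed_for_electricity_mmbtu / net_generation_mwh


# Hypothetical CHP record: 10,000 MMBtu burned in total, 8,000 of it for electricity.
chp_share = electricity_fuel_share(8000.0, 10000.0)
chp_heat_rate = heat_rate_mmbtu_per_mwh(8000.0, 1000.0)
```

For a unit that is not CHP the two fuel columns should match, giving a share of 1.0.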

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in physical unit, year to date.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical unit.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__yearly_boiler_fuel#

Noticed any irregularities with this table? File a report.

Annual time series of boiler fuel consumption and emissions.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
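The relationship between fuel quantity and heat content can be sketched as follows. The monthly figures are invented, and the quantity-weighted average used for the annual heat content is an assumption about one reasonable way to aggregate, not a statement of how PUDL computes fuel_mmbtu_per_unit.

```python
# Hypothetical monthly boiler fuel records for one boiler and one fuel.
monthly = [
    {"fuel_consumed_units": 100.0, "fuel_mmbtu_per_unit": 20.0},
    {"fuel_consumed_units": 300.0, "fuel_mmbtu_per_unit": 24.0},
]


def annual_fuel_summary(rows):
    """Total fuel units, total heat input, and weighted-average heat content."""
    total_units = sum(r["fuel_consumed_units"] for r in rows)
    # Heat input per record is physical units times heat content per unit.
    total_mmbtu = sum(
        r["fuel_consumed_units"] * r["fuel_mmbtu_per_unit"] for r in rows
    )
    avg_mmbtu_per_unit = total_mmbtu / total_units if total_units > 0 else None
    return {
        "fuel_consumed_units": total_units,
        "fuel_consumed_mmbtu": total_mmbtu,
        "fuel_mmbtu_per_unit": avg_mmbtu_per_unit,
    }


annual = annual_fuel_summary(monthly)
```

Weighting by quantity keeps the annual heat content consistent with the annual totals: dividing the summed heat input by the summed units recovers the same value.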

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__yearly_fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Annual time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
Primary key:: plant_id_eia, fuel_type_code_pudl, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Contains estimated values.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy time series of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns: primarily numerical values and a handful of categorical variables, plus additional attributes that are associated with the plant or utility and are constant within each plant-fuel-time period grouping. When aggregating numerical values, any sum that contains an NA value is treated as NA.
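The NA handling described above can be illustrated with a short sketch. The delivery records and helper names are hypothetical, None stands in for NA, and the grouping key mirrors this table's primary key with the year taken from report_date.

```python
from collections import defaultdict

VALUE_COLS = ("fuel_received_units", "total_fuel_cost")

# Hypothetical delivery records; None stands in for an NA value.
deliveries = [
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal", "report_date": "2024-01-01",
     "fuel_received_units": 100.0, "total_fuel_cost": 5000.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal", "report_date": "2024-02-01",
     "fuel_received_units": 150.0, "total_fuel_cost": None},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas", "report_date": "2024-01-01",
     "fuel_received_units": 10.0, "total_fuel_cost": 300.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas", "report_date": "2024-02-01",
     "fuel_received_units": 20.0, "total_fuel_cost": 700.0},
]


def sum_or_na(values):
    """Sum the values, but return None if any single value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def aggregate_yearly(rows):
    """Group by plant, fuel type, and year, then apply the NA-propagating sum."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["plant_id_eia"], row["fuel_type_code_pudl"], row["report_date"][:4])
        groups[key].append(row)
    return {
        key: {col: sum_or_na([r[col] for r in members]) for col in VALUE_COLS}
        for key, members in groups.items()
    }


yearly = aggregate_yearly(deliveries)
```

The coal group keeps its summed fuel_received_units, but its total_fuel_cost becomes NA because one month is missing. Silently skipping that month would understate the annual cost.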

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
chlorine_content_ppm	number	For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fuel_consumed_mmbtu	number	Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or combines multiple sources because the records that were aggregated together came from more than one of them (mixed).
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_received_units	number	Quantity of fuel received in tons, barrels, or Mcf.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
mercury_content_ppm	number	Mercury content in parts per million (ppm) to the nearest 0.001 ppm.
moisture_content_pct	number	For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__yearly_generation#

Noticed any irregularities with this table? File a report.

Annual time series of net electricity generated by power plant generators.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, generator_id, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__yearly_generation_fuel_by_generator#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption by generator.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: report_date, plant_id_eia, generator_id

Usage Warnings#

Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.

In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__yearly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
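As a rough illustration of the aggregation step, the sketch below sums records at the generator/energy source/prime mover level up to the generator level. The function name `aggregate_to_generator` is hypothetical, and treating null values as zero is an assumption of this sketch rather than documented PUDL behavior.

```python
from collections import defaultdict

# Columns that identify a generator-year, taken from this table's primary key.
KEY_COLUMNS = ("report_date", "plant_id_eia", "generator_id")
# Columns that are summed across energy sources and prime movers.
VALUE_COLUMNS = (
    "net_generation_mwh",
    "fuel_consumed_mmbtu",
    "fuel_consumed_for_electricity_mmbtu",
)


def aggregate_to_generator(records):
    """Sum allocated values over energy source and prime mover (sketch)."""
    totals = defaultdict(lambda: dict.fromkeys(VALUE_COLUMNS, 0.0))
    for rec in records:
        key = tuple(rec[col] for col in KEY_COLUMNS)
        for col in VALUE_COLUMNS:
            # Assumption: missing or null values contribute zero.
            totals[key][col] += rec.get(col) or 0.0
    return dict(totals)
```

Each key of the returned dictionary is one (report_date, plant_id_eia, generator_id) combination, matching the primary key listed for this table.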

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If no value meets this threshold because an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet"
)

Columns#

Field Name	Type	Description
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, year to date.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
report_date	date	Date reported.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia923__yearly_generation_fuel_by_generator_energy_source#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code

Usage Warnings#

Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.

In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.

The allocation process entails generating a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
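The idea of an allocation fraction can be sketched as follows. This is a deliberately simplified illustration, not PUDL's algorithm: the function name `allocate_to_generators`, the rule of falling back to capacity whenever any generator lacks reported net generation, and the equal split when all weights are zero are all assumptions.

```python
def allocate_to_generators(total, generators):
    """Split a plant-level total across generators (simplified sketch).

    `generators` is a list of dicts with keys generator_id,
    net_generation_mwh (may be None) and capacity_mw.
    """
    if all(g["net_generation_mwh"] is not None for g in generators):
        # Weight by reported generator-level net generation when available.
        weights = {g["generator_id"]: g["net_generation_mwh"] for g in generators}
    else:
        # Assumption: otherwise fall back to nameplate capacity.
        weights = {g["generator_id"]: g["capacity_mw"] for g in generators}
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        # Assumption: with no usable weights, split equally.
        return {gid: total / len(weights) for gid in weights}
    # Each generator's fraction is its weight divided by the group total.
    return {gid: total * w / weight_sum for gid, w in weights.items()}
```

The fractions always sum to one within a group, so the allocated values add back up to the total reported for that group.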

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet"
)

Columns#

Field Name	Type	Description
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
energy_source_code_num	string	Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, year to date.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.

out_eia923__yearly_generation_fuel_by_generator_energy_source_owner#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, prime mover, and owner.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code, utility_id_eia, ownership_record_type

Usage Warnings#

Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.

In this table, PUDL has scaled the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel by ownership from out_eia860__yearly_ownership. Note that the utility_id_eia in this table refers to the OWNER of the generator, not the operator.
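A minimal sketch of scaling by ownership, assuming each owner's share is simply the generator-level value multiplied by fraction_owned. The function name `scale_by_ownership` and the input layout are hypothetical, and the real process may handle details such as ownership_record_type differently.

```python
SCALED_COLUMNS = (
    "net_generation_mwh",
    "fuel_consumed_mmbtu",
    "fuel_consumed_for_electricity_mmbtu",
)


def scale_by_ownership(generator_record, owners):
    """Return one record per owner with values scaled by fraction_owned.

    `owners` is a list of dicts with utility_id_eia (the OWNER, not the
    operator) and fraction_owned. Illustrative sketch only.
    """
    scaled = []
    for owner in owners:
        rec = dict(generator_record)
        rec["utility_id_eia"] = owner["utility_id_eia"]
        rec["fraction_owned"] = owner["fraction_owned"]
        for col in SCALED_COLUMNS:
            if rec.get(col) is not None:
                rec[col] = rec[col] * owner["fraction_owned"]
        scaled.append(rec)
    return scaled
```

For a generator with 1,000 MWh of net generation and two owners holding 75% and 25%, the sketch yields records of 750 MWh and 250 MWh.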

This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.

This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If no value meets this threshold because an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet"
)

Columns#

Field Name	Type	Description
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
energy_source_code_num	string	Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, year to date.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
ownership_record_type	string	Whether each generator record is for one owner or represents a total of all ownerships.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
utility_id_eia	integer	The EIA Utility Identification number.

out_eia923__yearly_generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Annual time series of generation and fuel consumption of all generation units.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level, so as to be compatible with fossil fuel generation data.

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator, along with their capacities, to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables. This attribution lets us calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units, which also provide high temperature process heat at the expense of net electricity generation.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If no value meets this threshold because an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, year to date.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical unit.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia930__hourly_aggregated_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of aggregated balancing authority demand by EIA region, interconnect, and continental US.

Most-recent data:: 2026
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, aggregation_level, aggregation_group

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
This table is experimental and/or a work in progress and may change in the future.

Additional Details#

The spatial granularity of each record is indicated by aggregation_level.
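Because rows at different aggregation levels overlap (a region is part of an interconnect, for example), summing across levels would double count demand. The sketch below restricts rows to a single level before summing. The function name and the level labels used in the usage note are hypothetical; check the actual values of aggregation_level in the data before relying on them.

```python
def total_demand_at_level(rows, aggregation_level):
    """Sum imputed demand over rows at one aggregation level (sketch).

    `rows` is a list of dicts with the columns of this table. Null demand
    values are skipped, which is an assumption of this sketch.
    """
    return sum(
        row["demand_imputed_pudl_mwh"]
        for row in rows
        if row["aggregation_level"] == aggregation_level
        and row["demand_imputed_pudl_mwh"] is not None
    )
```

Calling `total_demand_at_level(rows, "region")` with a hypothetical "region" label would add up regional rows only, leaving out any interconnect-level or national rows that already contain those regions.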

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet"
)

Columns#

Field Name	Type	Description
aggregation_group	string	Label identifying a group of balancing authorities to be used in aggregating demand, e.g. a region of the US or a whole interconnect.
aggregation_level	string	Indicates the spatial granularity of the aggregated value.
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_imputed_pudl_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.

out_eia930__hourly_operations#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority net generation, interchange, and demand with imputed demand.

Most-recent data:: 2026
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia

Usage Warnings#

Contains rows where missing values were imputed.
This table is experimental and/or a work in progress and may change in the future.

Additional Details#

This table is based on core_eia930__hourly_operations, but adds imputed demand where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_adjusted_mwh	number	Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
demand_forecast_mwh	number	Day ahead demand forecast.
demand_imputed_eia_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA.
demand_imputed_pudl_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.
demand_imputed_pudl_mwh_imputation_code	string	Code describing why a demand value was flagged for imputation.
demand_reported_mwh	number	Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.
interchange_adjusted_mwh	number	Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
interchange_imputed_eia_mwh	number	Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA.
interchange_reported_mwh	number	Original reported energy interchange between adjacent balancing authorities.
net_generation_adjusted_mwh	number	Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
net_generation_imputed_eia_mwh	number	Reported net generation with outlying values removed and missing values imputed by EIA.
net_generation_reported_mwh	number	Unaltered originally reported net generation for the specified period.
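The column descriptions above define demand as net generation minus interchange. A minimal sketch of that relationship follows, useful as a consistency check on a single row. The function name `implied_demand` is hypothetical, and reported demand in the actual data is not guaranteed to satisfy this identity exactly.

```python
def implied_demand(net_generation_mwh, interchange_mwh):
    """Demand implied by net generation minus interchange (sketch).

    Returns None when either input is missing. Subtracting interchange
    implies that positive interchange represents a net outflow to
    adjacent balancing authorities.
    """
    if net_generation_mwh is None or interchange_mwh is None:
        return None
    return net_generation_mwh - interchange_mwh
```

For example, a balancing authority that generates 1,000 MWh in an hour and sends 150 MWh to its neighbors has an implied demand of 850 MWh.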

out_eia930__hourly_subregion_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority subregion demand with imputed demand.

Most-recent data:: 2026
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia

Usage Warnings#

Contains rows where missing values were imputed.
This table is experimental and/or a work in progress and may change in the future.

Additional Details#

This table is based on core_eia930__hourly_subregion_demand, but adds imputed demand where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_subregion_code_eia	string	Code identifying subregions of larger balancing authorities.
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_imputed_pudl_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.
demand_imputed_pudl_mwh_imputation_code	string	Code describing why a demand value was flagged for imputation.
demand_reported_mwh	number	Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.

out_eia__monthly_generators#

Noticed any irregularities with this table? File a report.

Monthly time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: report_date, plant_id_eia, generator_id

Usage Warnings#

Contains estimated values.
Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.
Not all columns are originally reported in or calculable from the input tables. Expect nulls.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This table includes all monthly attributes for all generators reported to EIA-860 and EIA-923.

To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed and calculated columns. We suggest using this table if you want to explore monthly attributes about generators and would rather use a more complete and denormalized table. If you are more interested in the originally reported values, we recommend searching for core_eia with the column you are most interested in.

The calculations of capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh are based on the allocation of net generation, reported on the basis of plant, prime mover and energy source, to individual generators. That allocation comes from the out_eia923__monthly_generation_fuel_by_generator table.
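As a rough illustration of the capacity factor definition used in this table (energy generated divided by the energy the generator would produce at nameplate capacity over the same period), the sketch below works through a single month. The function name `monthly_capacity_factor` is hypothetical and not part of PUDL; its arguments mirror the net_generation_mwh, capacity_mw and report_date columns, and PUDL's own calculation may differ in its details.

```python
import calendar
from datetime import date


def monthly_capacity_factor(net_generation_mwh, capacity_mw, report_date):
    """Return net generation as a fraction of potential generation for one month.

    Hypothetical helper for illustration only; not PUDL's implementation.
    """
    if net_generation_mwh is None or not capacity_mw:
        return None  # missing inputs or zero capacity: no meaningful ratio
    days_in_month = calendar.monthrange(report_date.year, report_date.month)[1]
    hours_in_month = days_in_month * 24
    return net_generation_mwh / (capacity_mw * hours_in_month)


# A 100 MW generator producing 37,200 MWh in January (31 days = 744 hours).
print(monthly_capacity_factor(37_200, 100, date(2020, 1, 1)))  # 0.5
```

Returning None for missing inputs matches the usage warning that nulls should be expected in calculated columns.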

Monthly heat rates for each generator are estimated using the allocated per-generator net generation and fuel consumption, as well as the core_eia923__monthly_boiler_fuel table, which reports fuel consumed by boiler. Heat rates are necessary to estimate the amount of fuel consumed by a generation unit, and thus the fuel cost per MWh generated.

Plant-specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which only has ~70% coverage, so some generators with heat rate estimates still lack fuel cost estimates.
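The relationship between heat rate, fuel price and fuel cost per MWh described above can be sketched as follows. Both function names are hypothetical and the numbers are made up; this is a minimal illustration of the arithmetic, not PUDL's implementation.

```python
def heat_rate_mmbtu_per_mwh(fuel_consumed_mmbtu, net_generation_mwh):
    """Fuel heat content consumed per unit of electricity generated."""
    if fuel_consumed_mmbtu is None or not net_generation_mwh:
        return None  # cannot estimate a heat rate without both inputs
    return fuel_consumed_mmbtu / net_generation_mwh


def fuel_cost_per_mwh(fuel_cost_per_mmbtu, heat_rate):
    """Fuel cost per MWh generated = fuel price (USD/MMBTU) * heat rate."""
    if fuel_cost_per_mmbtu is None or heat_rate is None:
        return None  # no fuel price available: heat rate only
    return fuel_cost_per_mmbtu * heat_rate


heat_rate = heat_rate_mmbtu_per_mwh(500_000, 50_000)  # 10.0 MMBTU per MWh
print(fuel_cost_per_mwh(3.0, heat_rate))  # 30.0 USD per MWh
print(fuel_cost_per_mwh(None, heat_rate))  # None: fuel price not covered
```

The second call shows the situation mentioned in the usage warnings: a generator can have a heat rate estimate while its fuel cost remains null because no fuel price was available.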

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
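The 70% rule described above can be sketched as a small function. The name `canonical_value` and the choice to ignore missing values when computing the share are assumptions made for illustration; PUDL's actual harvesting code may treat these cases differently.

```python
from collections import Counter


def canonical_value(reported_values, min_share=0.7):
    """Pick the most consistently reported value, or None if too inconsistent.

    Hypothetical sketch of the rule described above, not PUDL's implementation.
    Assumes missing values (None) are ignored when computing the share.
    """
    observed = [value for value in reported_values if value is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= min_share else None


print(canonical_value(["NG", "NG", "NG", "DFO"]))  # NG (3 of 4 = 75%)
print(canonical_value(["NG", "NG", "DFO", "DFO"]))  # None (no value reaches 70%)
```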

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet"
)

Columns#

Field Name	Type	Description
associated_combined_heat_power	boolean	Indicates whether the generator is associated with a combined heat and power system
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_name_eia	string	Name of the balancing authority.
bga_source	string	The source from which the unit_id_pudl is compiled. The unit_id_pudl comes directly from EIA 860, or from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is employed to find additional boiler-generator connections).
bypass_heat_recovery	boolean	Can this generator operate while bypassing the heat recovery steam generator?
can_burn_multiple_fuels	boolean	Whether the generator can burn multiple fuels.
can_cofire_fuels	boolean	Whether the generator can co-fire fuels.
can_switch_oil_gas	boolean	Whether the generator can switch between oil and natural gas.
can_switch_when_operating	boolean	Indicates whether a fuel switching generator can switch fuels while operating.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * duration of that time period).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
carbon_capture	boolean	Indicates whether the generator uses carbon capture technology.
city	string	Name of the city.
county	string	County name.
current_planned_generator_operating_date	date	The most recently updated effective date on which the generator is scheduled to start operation
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
deliver_power_transgrid	boolean	Indicates whether the generator can deliver power to the transmission grid.
distributed_generation	boolean	Whether the generator is considered distributed generation
duct_burners	boolean	Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas
energy_source_1_transport_1	string	Primary mode of transport for energy source 1.
energy_source_1_transport_2	string	Secondary mode of transport for energy source 1.
energy_source_1_transport_3	string	Tertiary mode of transport for energy source 1.
energy_source_2_transport_1	string	Primary mode of transport for energy source 2.
energy_source_2_transport_2	string	Secondary mode of transport for energy source 2.
energy_source_2_transport_3	string	Tertiary mode of transport for energy source 2.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_source_code_2	string	The code representing the second most predominant type of energy that fuels the generator
energy_source_code_3	string	The code representing the third most predominant type of energy that fuels the generator
energy_source_code_4	string	The code representing the fourth most predominant type of energy that fuels the generator
energy_source_code_5	string	The code representing the fifth most predominant type of energy that fuels the generator
energy_source_code_6	string	The code representing the sixth most predominant type of energy that fuels the generator
energy_storage_capacity_mwh	number	Energy storage capacity in MWh (e.g. for batteries).
ferc_qualifying_facility	boolean	Indicates whether or not a generator is a qualifying FERC cogeneration facility.
fluidized_bed_tech	boolean	Indicates whether the generator uses fluidized bed technology
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or reflects multiple sources because records were aggregated together (mixed).
fuel_cost_per_mwh	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
fuel_type_count	integer	A count of how many different simple energy sources there are associated with a generator.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_date	date	Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
latitude	number	Latitude of the plant’s location, in degrees.
longitude	number	Longitude of the plant’s location, in degrees.
minimum_load_mw	number	The minimum load at which the generator can operate continuously.
nameplate_power_factor	number	The nameplate power factor of the generator.
net_capacity_mwdc	number	Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
operational_status	string	The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.
operational_status_code	string	The operating status of the asset.
original_planned_generator_operating_date	date	The date the generator was originally scheduled to be operational
other_combustion_tech	boolean	Indicates whether the generator uses other combustion technologies
other_modifications_date	date	Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.
other_planned_modifications	boolean	Indicates whether there are other modifications planned for the generator.
owned_by_non_utility	boolean	Whether any part of the generator is owned by a non-utility.
ownership_code	string	Identifies the ownership for each generator.
planned_derate_date	date	Planned effective month that the generator is scheduled to enter operation after the derate modification.
planned_energy_source_code_1	string	New energy source code for the planned repowered generator.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
planned_modifications	boolean	Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.
planned_net_summer_capacity_derate_mw	number	Decrease in summer capacity expected to be realized from the derate modification to the equipment.
planned_net_summer_capacity_uprate_mw	number	Increase in summer capacity expected to be realized from the modification to the equipment.
planned_net_winter_capacity_derate_mw	number	Decrease in winter capacity expected to be realized from the derate modification to the equipment.
planned_net_winter_capacity_uprate_mw	number	Increase in winter capacity expected to be realized from the uprate modification to the equipment.
planned_new_capacity_mw	number	The expected new nameplate capacity for the generator.
planned_new_prime_mover_code	string	New prime mover for the planned repowered generator.
planned_repower_date	date	Planned effective date that the generator is scheduled to enter operation after the repowering is complete.
planned_uprate_date	date	Planned effective date that the generator is scheduled to enter operation after the uprate modification.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
previously_canceled	boolean	Indicates whether the generator was previously reported as indefinitely postponed or canceled
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
pulverized_coal_tech	boolean	Indicates whether the generator uses pulverized coal technology
reactive_power_output_mvar	number	Reactive Power Output (MVAr)
report_date	date	Date reported.
rto_iso_lmp_node_id	string	The designation used to identify the price node in RTO/ISO Locational Marginal Price reports
rto_iso_location_wholesale_reporting_id	string	The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report
solid_fuel_gasification	boolean	Indicates whether the generator is part of a solid fuel gasification system
startup_source_code_1	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_2	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_3	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_4	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
state	string	Two letter US state abbreviation.
stoker_tech	boolean	Indicates whether the generator uses stoker technology
street_address	string	Physical street address.
subcritical_tech	boolean	Indicates whether the generator uses subcritical technology
summer_capacity_estimate	boolean	Whether the summer capacity value was an estimate
summer_capacity_mw	number	The net summer capacity.
summer_estimated_capability_mw	number	EIA estimated summer capacity (in MW).
supercritical_tech	boolean	Indicates whether the generator uses supercritical technology
synchronized_transmission_grid	boolean	Indicates whether standby generators (SB status) can be synchronized to the grid.
technology_description	string	High level description of the technology used by the generator to produce electricity.
time_cold_shutdown_full_load_code	string	The minimum amount of time required to bring the unit to full load from shutdown.
timezone	string	IANA timezone name
topping_bottoming_code	string	If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_mmbtu	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
turbines_inverters_hydrokinetics	integer	Number of wind turbines, or hydrokinetic buoys.
turbines_num	integer	Number of wind turbines, or hydrokinetic buoys.
ultrasupercritical_tech	boolean	Indicates whether the generator uses ultra-supercritical technology
unit_heat_rate_mmbtu_per_mwh	number	Fuel content per unit of electricity generated. Coming from MCOE calculation.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
uprate_derate_completed_date	date	The date when the uprate or derate was completed.
uprate_derate_during_year	boolean	Was an uprate or derate completed on this generator during the reporting year?
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
winter_capacity_estimate	boolean	Whether the winter capacity value was an estimate
winter_capacity_mw	number	The net winter capacity.
winter_estimated_capability_mw	number	EIA estimated winter capacity (in MW).
zip_code	string	Five digit US Zip Code.
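The generator_id description above warns against treating the ID as a number. The sketch below shows why, using made-up IDs: converting to integers either fails outright or collapses distinct IDs into one. The helper `as_int_or_none` is hypothetical and exists only for this illustration.

```python
def as_int_or_none(generator_id):
    """Try to read a generator ID as an integer (illustrative only)."""
    try:
        return int(generator_id)
    except ValueError:
        return None  # IDs containing letters cannot be converted


# Made-up generator IDs; real IDs depend on what each plant reports.
generator_ids = ["1", "01", "GT1"]

print([as_int_or_none(g) for g in generator_ids])  # [1, 1, None]
print(len(set(generator_ids)))  # 3 distinct IDs when kept as strings
```

Keeping the column as a string preserves leading zeros and letters, so the primary key (report_date, plant_id_eia, generator_id) stays intact.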

out_eia__yearly_assn_plant_parts_plant_gen#

Noticed any irregularities with this table? File a report.

Annual association table providing connections between plant parts (identified as record_id_eia) and ‘plant_gen’ records (identified as record_id_eia_plant_gen) from the out_eia__yearly_plant_parts table.

Most-recent data:: 2026
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: record_id_eia, record_id_eia_plant_gen

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains information from multiple raw inputs.

Additional Details#

The purpose of this table is to help determine what generator records are associated with every plant part record. The plant part records have a one to many relationship to the associated ‘plant_gen’ records. The ‘plant_gen’ records are also included in the plant part records, so for these records there is a one to one relationship and the record_id_eia and the record_id_eia_plant_gen are the same.
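The one-to-many relationship described above can be sketched with a few made-up ID pairs. The record IDs and the function name `generators_by_plant_part` are hypothetical; real record_id_eia values have a different format.

```python
from collections import defaultdict

# Made-up (record_id_eia, record_id_eia_plant_gen) pairs for illustration only.
associations = [
    ("plant_A", "gen_A1"),  # a plant-level part maps to each of its generators
    ("plant_A", "gen_A2"),
    ("gen_A1", "gen_A1"),  # a plant_gen record maps one-to-one to itself
    ("gen_A2", "gen_A2"),
]


def generators_by_plant_part(pairs):
    """Group the associated plant_gen record IDs under each plant part record."""
    grouped = defaultdict(list)
    for record_id_eia, record_id_eia_plant_gen in pairs:
        grouped[record_id_eia].append(record_id_eia_plant_gen)
    return dict(grouped)


lookup = generators_by_plant_part(associations)
print(lookup["plant_A"])  # ['gen_A1', 'gen_A2']
print(lookup["gen_A1"])  # ['gen_A1']
```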

All of the columns in this table that have a suffix of ‘_plant_gen’ are attributes of the record_id_eia_plant_gen record.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet"
)

Columns#

Field Name	Type	Description
energy_source_code_1_plant_gen	string	Code representing the most predominant type of energy that fuels the record_id_eia_plant_gen’s generator.
ferc1_generator_agg_id_plant_gen	integer	ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. This ID is associated with the record_id_eia_plant_gen record. It depends on ferc1_generator_agg_id, which has not been updated since 2020, but only affects a couple hundred records total across all years.
ferc_acct_name_plant_gen	string	Name of FERC account, derived from technology description and prime mover code. This name is associated with the record_id_eia_plant_gen record.
generator_id_plant_gen	string	Generator ID of the record_id_eia_plant_gen record. This is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_year_plant_gen	integer	The year an associated plant_gen’s generator went into service.
generators_number	integer	The number of generators associated with each `record_id_eia`.
operational_status_pudl_plant_gen	string	The operating status of the asset, using PUDL categories, for the record_id_eia_plant_gen record.
ownership_record_type	string	Whether each generator record is for one owner or represents a total of all ownerships.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code_plant_gen	string	Code for the type of prime mover (e.g. CT, CG) associated with the record_id_eia_plant_gen.
record_id_eia	string	Identifier for EIA plant parts analysis records.
record_id_eia_plant_gen	string	Identifier for EIA plant parts analysis records which is at the plant_part level of plant_gen - meaning each record pertains to one generator.
report_date	date	Date reported.
technology_description_plant_gen	string	High level description of the technology used by the record_id_eia_plant_gen’s generator to produce electricity.
unit_id_pudl_plant_gen	integer	Dynamically assigned PUDL unit id of the record_id_eia_plant_gen. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.

out_eia__yearly_boilers#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA boiler attributes.

Most-recent data:: 2025
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: plant_id_eia, boiler_id, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for boilers, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet"
)

Columns#

Field Name	Type	Description
air_flow_100pct_load_cubic_feet_per_minute	number	Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e. 68 F and one atmosphere pressure).
boiler_fuel_code_1	string	The code representing the most predominant type of energy that fuels the boiler.
boiler_fuel_code_2	string	The code representing the second most predominant type of energy that fuels the boiler.
boiler_fuel_code_3	string	The code representing the third most predominant type of energy that fuels the boiler.
boiler_fuel_code_4	string	The code representing the fourth most predominant type of energy that fuels the boiler.
boiler_id	string	Alphanumeric boiler ID.
boiler_manufacturer	string	Name of boiler manufacturer.
boiler_manufacturer_code	string	EIA short code for boiler manufacturer.
boiler_operating_date	date	Date the boiler began or is planned to begin commercial operation.
boiler_retirement_date	date	Date of the scheduled or effected retirement of the boiler.
boiler_status	string	EIA short code identifying boiler operational status.
boiler_type	string	EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR.
city	string	Name of the city.
compliance_year_mercury	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions.
compliance_year_nox	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions.
compliance_year_particulate	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions.
compliance_year_so2	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions.
county	string	County name.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
efficiency_100pct_load	number	Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent.
efficiency_50pct_load	number	Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent.
firing_rate_using_coal_tons_per_hour	number	Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour.
firing_rate_using_gas_mcf_per_hour	number	Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet per hour.
firing_rate_using_oil_bbls_per_hour	number	Design firing rate at maximum continuous steam flow for petroleum to the nearest 0.1 barrels per hour.
firing_rate_using_other_fuels	number	Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas.
firing_type_1	string	EIA short code indicating the type of firing used by this boiler.
firing_type_2	string	EIA short code indicating the type of firing used by this boiler.
firing_type_3	string	EIA short code indicating the type of firing used by this boiler.
fly_ash_reinjection	boolean	Indicates whether the boiler is capable of re-injecting fly ash.
hrsg	boolean	Indicates whether the boiler is a heat recovery steam generator (HRSG).
latitude	number	Latitude of the plant’s location, in degrees.
longitude	number	Longitude of the plant’s location, in degrees.
max_steam_flow_lbs_per_hour	number	Maximum continuous steam flow at 100 percent load.
mercury_control_existing_strategy_1	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_2	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_3	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_4	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_5	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_6	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent mercury regulation.
new_source_review	boolean	Indicates whether the boiler is subject to New Source Review requirements.
new_source_review_date	date	Month of issued New Source Review permit.
new_source_review_permit	string	New Source Review permit number.
nox_control_existing_caaa_compliance_strategy_1	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_caaa_compliance_strategy_2	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_caaa_compliance_strategy_3	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_strategy_1	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_existing_strategy_2	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_existing_strategy_3	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_manufacturer	string	Name of nitrogen oxide control manufacturer.
nox_control_manufacturer_code	string	Code indicating the nitrogen oxide control burner manufacturer.
nox_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_planned_caaa_compliance_strategy_1	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_planned_caaa_compliance_strategy_2	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_planned_caaa_compliance_strategy_3	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_status_code	string	Nitrogen oxide control status code.
particulate_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
particulate_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
particulate_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
regulation_mercury	string	Most stringent type of statute or regulation code under which the boiler is operating for mercury control standards.
regulation_nox	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for nitrogen oxide control standards.
regulation_particulate	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for particulate matter control standards.
regulation_so2	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for sulfur dioxide control standards.
report_date	date	Date reported.
so2_control_existing_caaa_compliance_strategy_1	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_caaa_compliance_strategy_2	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_caaa_compliance_strategy_3	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_strategy_1	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_existing_strategy_2	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_existing_strategy_3	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_planned_caaa_compliance_strategy_1	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_planned_caaa_compliance_strategy_2	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_planned_caaa_compliance_strategy_3	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
standard_nox_rate	number	Numeric value for the unit of measurement specified for nitrogen oxide.
standard_particulate_rate	number	Numeric value for the unit of measurement specified for particulate matter.
standard_so2_percent_scrubbed	number	The percent of sulfur dioxide to be scrubbed specified by the most stringent sulfur dioxide regulation.
standard_so2_rate	number	Numeric value for the unit of measurement specified for sulfur dioxide.
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
timezone	string	IANA timezone name
turndown_ratio	number	The turndown ratio for the boiler.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
unit_nox	string	Unit of measurement specified for nitrogen oxide.
unit_particulate	string	Unit of measurement specified for particulate matter.
unit_so2	string	Unit of measurement specified for sulfur dioxide.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
waste_heat_input_mmbtu_per_hour	number	Design waste-heat input rate at maximum continuous steam flow, where a waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process (MMBTU per hour).
wet_dry_bottom	string	Wet or Dry Bottom where Wet Bottom is defined as slag tanks that are installed at furnace throat to contain and remove molten ash from the furnace, and Dry Bottom is defined as having no slag tanks at furnace throat area, throat area is clear, and bottom ash drops through throat to bottom ash water hoppers.
zip_code	string	Five digit US Zip Code.

out_eia__yearly_generators#

Noticed any irregularities with this table? File a report.

Annual time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: report_date, plant_id_eia, generator_id

Usage Warnings#

Contains estimated values.
Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.
Not all columns are originally reported in or calculable from the input tables. Expect nulls.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This table includes all yearly attributes for all generators reported to EIA-860 and EIA-923.

To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed and calculated columns. We suggest using this table if you want to explore yearly attributes about generators and would rather use a more complete and denormalized table. If you are more interested in the originally reported values, we recommend searching for core_eia with the column you are most interested in.

The calculations of capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh are based on the allocation of net generation reported on the basis of plant, prime mover and energy source to individual generators - which comes from this table: out_eia923__yearly_generation_fuel_by_generator.
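As a rough illustration of the capacity factor definition used in this table (net generation divided by the product of nameplate capacity and the hours in the period), here is a minimal sketch. The function name, the 8760-hour constant and the example numbers are hypothetical; PUDL's own calculation may treat leap years, partial-year operation and missing data differently.

```python
from typing import Optional

HOURS_PER_YEAR = 8760.0  # assumption: non-leap year, generator reported for the full year


def capacity_factor(
    net_generation_mwh: float,
    capacity_mw: float,
    hours: float = HOURS_PER_YEAR,
) -> Optional[float]:
    """Fraction of potential generation that was actually produced."""
    if capacity_mw <= 0 or hours <= 0:
        # No meaningful potential generation to compare against.
        return None
    return net_generation_mwh / (capacity_mw * hours)


# Hypothetical 100 MW generator that produced 438,000 MWh in one year.
example_cf = capacity_factor(438_000.0, 100.0)
print(example_cf)
```

With these made-up inputs the generator ran at half of its potential output, so the result is 0.5.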

Annual heat rates by generator are estimated using allocated per-generator net generation and fuel consumption, as well as the core_eia923__monthly_boiler_fuel table, which reports fuel consumed by boiler. Heat rates are necessary to estimate the amount of fuel consumed by a generation unit, and thus the fuel cost per MWh generated.

Plant specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which only has ~70% coverage, leading to some generators with heat rate estimates still lacking fuel cost estimates.
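The relationship between heat rate, fuel price and fuel cost per MWh described above can be sketched as follows. This is a simplified illustration, not PUDL's actual implementation: the function names are hypothetical, and the numbers are invented. It also shows why a generator can have a heat rate estimate but no fuel cost estimate when no fuel price is available.

```python
from typing import Optional


def heat_rate_mmbtu_per_mwh(
    total_mmbtu: Optional[float], net_generation_mwh: Optional[float]
) -> Optional[float]:
    """Fuel heat content consumed per MWh of net generation."""
    if total_mmbtu is None or not net_generation_mwh:
        # Missing inputs or zero generation: no heat rate can be estimated.
        return None
    return total_mmbtu / net_generation_mwh


def fuel_cost_per_mwh(
    heat_rate: Optional[float], fuel_cost_per_mmbtu: Optional[float]
) -> Optional[float]:
    """Fuel cost per MWh as heat rate times fuel price per MMBTU."""
    if heat_rate is None or fuel_cost_per_mmbtu is None:
        # A missing fuel price leaves the cost null even if the heat rate exists.
        return None
    return heat_rate * fuel_cost_per_mmbtu


# Hypothetical generator: 1,000,000 MMBTU burned to produce 100,000 MWh.
rate = heat_rate_mmbtu_per_mwh(1_000_000.0, 100_000.0)
cost_with_price = fuel_cost_per_mwh(rate, 2.5)      # fuel price is known
cost_without_price = fuel_cost_per_mwh(rate, None)  # no fuel receipts coverage
print(rate, cost_with_price, cost_without_price)
```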

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
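The 70% consistency rule can be pictured with a small sketch. It is only an illustration of the idea described above, not the PUDL harvesting code: the function name is hypothetical, and ignoring nulls when computing the share is an assumption.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def canonical_value(
    reported: Iterable[Optional[Hashable]], threshold: float = 0.7
) -> Optional[Hashable]:
    """Return the most common reported value if it is consistent enough."""
    # Assumption: null reports are ignored rather than counted as disagreement.
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    # Keep the value only if it makes up at least `threshold` of the reports.
    return value if count / len(values) >= threshold else None


# Hypothetical plant name reported ten times across EIA tables.
consistent = canonical_value(["Big Plant"] * 8 + ["Big Plnt"] * 2)
inconsistent = canonical_value(["Big Plant", "Big Plant", "Other", "Other"])
print(consistent, inconsistent)
```

In the first case the dominant spelling covers 80% of the reports and is kept; in the second no value reaches the threshold, so the attribute would be null.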

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet"
)

Columns#

Field Name	Type	Description
associated_combined_heat_power	boolean	Indicates whether the generator is associated with a combined heat and power system
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_name_eia	string	Name of the balancing authority.
bga_source	string	The source from which the unit_id_pudl is compiled. The unit_id_pudl comes directly from EIA 860, or from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is employed to find additional boiler generator connections).
bypass_heat_recovery	boolean	Can this generator operate while bypassing the heat recovery steam generator?
can_burn_multiple_fuels	boolean	Whether the generator can burn multiple fuels.
can_cofire_fuels	boolean	Whether the generator can co-fire fuels.
can_switch_oil_gas	boolean	Whether the generator can switch between oil and natural gas.
can_switch_when_operating	boolean	Indicates whether a fuel switching generator can switch fuels while operating.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period in hours).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
carbon_capture	boolean	Indicates whether the generator uses carbon capture technology.
city	string	Name of the city.
county	string	County name.
current_planned_generator_operating_date	date	The most recently updated effective date on which the generator is scheduled to start operation
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
deliver_power_transgrid	boolean	Indicate whether the generator can deliver power to the transmission grid.
distributed_generation	boolean	Whether the generator is considered distributed generation
duct_burners	boolean	Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas
energy_source_1_transport_1	string	Primary mode of transport for energy source 1.
energy_source_1_transport_2	string	Secondary mode of transport for energy source 1.
energy_source_1_transport_3	string	Tertiary mode of transport for energy source 1.
energy_source_2_transport_1	string	Primary mode of transport for energy source 2.
energy_source_2_transport_2	string	Secondary mode of transport for energy source 2.
energy_source_2_transport_3	string	Tertiary mode of transport for energy source 2.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_source_code_2	string	The code representing the second most predominant type of energy that fuels the generator
energy_source_code_3	string	The code representing the third most predominant type of energy that fuels the generator
energy_source_code_4	string	The code representing the fourth most predominant type of energy that fuels the generator
energy_source_code_5	string	The code representing the fifth most predominant type of energy that fuels the generator
energy_source_code_6	string	The code representing the sixth most predominant type of energy that fuels the generator
energy_storage_capacity_mwh	number	Energy storage capacity in MWh (e.g. for batteries).
ferc_qualifying_facility	boolean	Indicates whether or not a generator is a qualifying FERC cogeneration facility.
fluidized_bed_tech	boolean	Indicates whether the generator uses fluidized bed technology
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or, when records are aggregated together, combines multiple sources (mixed).
fuel_cost_per_mwh	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
fuel_type_count	integer	A count of how many different simple energy sources there are associated with a generator.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_date	date	Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
latitude	number	Latitude of the plant’s location, in degrees.
longitude	number	Longitude of the plant’s location, in degrees.
minimum_load_mw	number	The minimum load at which the generator can operate continuously.
nameplate_power_factor	number	The nameplate power factor of the generator.
net_capacity_mwdc	number	Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
operational_status	string	The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.
operational_status_code	string	The operating status of the asset.
original_planned_generator_operating_date	date	The date the generator was originally scheduled to be operational
other_combustion_tech	boolean	Indicates whether the generator uses other combustion technologies
other_modifications_date	date	Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.
other_planned_modifications	boolean	Indicates whether there are other modifications planned for the generator.
owned_by_non_utility	boolean	Whether any part of the generator is owned by a nonutility.
ownership_code	string	Identifies the ownership for each generator.
planned_derate_date	date	Planned effective month that the generator is scheduled to enter operation after the derate modification.
planned_energy_source_code_1	string	New energy source code for the planned repowered generator.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
planned_modifications	boolean	Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.
planned_net_summer_capacity_derate_mw	number	Decrease in summer capacity expected to be realized from the derate modification to the equipment.
planned_net_summer_capacity_uprate_mw	number	Increase in summer capacity expected to be realized from the modification to the equipment.
planned_net_winter_capacity_derate_mw	number	Decrease in winter capacity expected to be realized from the derate modification to the equipment.
planned_net_winter_capacity_uprate_mw	number	Increase in winter capacity expected to be realized from the uprate modification to the equipment.
planned_new_capacity_mw	number	The expected new nameplate capacity for the generator.
planned_new_prime_mover_code	string	New prime mover for the planned repowered generator.
planned_repower_date	date	Planned effective date that the generator is scheduled to enter operation after the repowering is complete.
planned_uprate_date	date	Planned effective date that the generator is scheduled to enter operation after the uprate modification.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
previously_canceled	boolean	Indicates whether the generator was previously reported as indefinitely postponed or canceled
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
pulverized_coal_tech	boolean	Indicates whether the generator uses pulverized coal technology
reactive_power_output_mvar	number	Reactive Power Output (MVAr)
report_date	date	Date reported.
rto_iso_lmp_node_id	string	The designation used to identify the price node in RTO/ISO Locational Marginal Price reports
rto_iso_location_wholesale_reporting_id	string	The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report
solid_fuel_gasification	boolean	Indicates whether the generator is part of a solid fuel gasification system
startup_source_code_1	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_2	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_3	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_4	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
state	string	Two letter US state abbreviation.
stoker_tech	boolean	Indicates whether the generator uses stoker technology
street_address	string	Physical street address.
subcritical_tech	boolean	Indicates whether the generator uses subcritical technology
summer_capacity_estimate	boolean	Whether the summer capacity value was an estimate
summer_capacity_mw	number	The net summer capacity.
summer_estimated_capability_mw	number	EIA estimated summer capacity (in MW).
supercritical_tech	boolean	Indicates whether the generator uses supercritical technology
synchronized_transmission_grid	boolean	Indicates whether standby generators (SB status) can be synchronized to the grid.
technology_description	string	High level description of the technology used by the generator to produce electricity.
time_cold_shutdown_full_load_code	string	The minimum amount of time required to bring the unit to full load from shutdown.
timezone	string	IANA timezone name
topping_bottoming_code	string	If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_mmbtu	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
turbines_inverters_hydrokinetics	integer	Number of wind turbines, or hydrokinetic buoys.
turbines_num	integer	Number of wind turbines, or hydrokinetic buoys.
ultrasupercritical_tech	boolean	Indicates whether the generator uses ultra-supercritical technology
unit_heat_rate_mmbtu_per_mwh	number	Fuel content per unit of electricity generated. Coming from MCOE calculation.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
uprate_derate_completed_date	date	The date when the uprate or derate was completed.
uprate_derate_during_year	boolean	Was an uprate or derate completed on this generator during the reporting year?
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
winter_capacity_estimate	boolean	Whether the winter capacity value was an estimate
winter_capacity_mw	number	The net winter capacity.
winter_estimated_capability_mw	number	EIA estimated winter capacity (in MW).
zip_code	string	Five digit US Zip Code.

out_eia__yearly_generators_by_ownership#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA generators with ownership integrated.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: This table has no primary key. This table would have a primary key of the following columns, except there are ~200 records without utility IDs: [‘plant_id_eia’, ‘generator_id’, ‘report_date’, ‘utility_id_eia’, ‘ownership_record_type’].

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

Generators that report to EIA can have multiple utility owners. EIA reports the ownership fraction for each of the owners of generators (see core_eia860__scd_ownership). This table includes two records for each generator’s owner: one record represents a total ownership stake and the other record represents a proportional ownership stake (the ownership_record_type column labeled as “total” and “owned” respectively). All of the values reported for these generator-owner records are scaled to the ownership stake (represented by fraction_owned).
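To make the "owned" and "total" records concrete, here is a minimal sketch of how one generator with two owners could expand into four records. The helper name, the choice of scaled columns and the convention that "total" records carry a fraction_owned of 1.0 are assumptions for illustration; the actual PUDL logic may differ.

```python
from typing import Dict, List

# Hypothetical subset of the columns that get scaled by the ownership stake.
SCALED_COLUMNS = ("capacity_mw", "net_generation_mwh")


def ownership_records(generator: Dict, owner_fractions: Dict[int, float]) -> List[Dict]:
    """Build one 'owned' and one 'total' record per owner of a generator."""
    records = []
    for utility_id, fraction in owner_fractions.items():
        # Assumption: the 'total' record represents the whole generator (1.0).
        for record_type, share in (("owned", fraction), ("total", 1.0)):
            record = dict(generator)
            record["utility_id_eia"] = utility_id
            record["ownership_record_type"] = record_type
            record["fraction_owned"] = share
            for column in SCALED_COLUMNS:
                record[column] = generator[column] * share
            records.append(record)
    return records


# Made-up generator with a 75% / 25% ownership split.
gen = {"plant_id_eia": 1, "generator_id": "1A", "capacity_mw": 200.0, "net_generation_mwh": 1000.0}
records = ownership_records(gen, {111: 0.75, 222: 0.25})
owned_capacity = sum(r["capacity_mw"] for r in records if r["ownership_record_type"] == "owned")
all_capacity = sum(r["capacity_mw"] for r in records)
print(len(records), owned_capacity, all_capacity)
```

Summing capacity over only the "owned" records recovers the generator's 200 MW, while summing over every record triple-counts it. This is why aggregations should filter on ownership_record_type first.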

This table was created to represent every possible version of ownership slices of the EIA generators. It is primarily used as an input to out_eia__yearly_plant_parts.

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet"
)

Columns#

Field Name	Type	Description
capacity_eoy_mw	number	Total end of year installed (nameplate) capacity for a plant part, in megawatts.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period in hours).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
ferc_acct_name	string	Name of FERC account, derived from technology description and prime mover code.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_source	string	Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or, when records are aggregated together, combines multiple sources (mixed).
fuel_cost_per_mwh	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_date	date	Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.
generator_operating_year	integer	Year a generator went into service.
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
operational_status	string	The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.
operational_status_pudl	string	The operating status of the asset using PUDL categories.
ownership_record_type	string	Whether each generator record is for one owner or represents a total of all ownerships.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
technology_description	string	High level description of the technology used by the generator to produce electricity.
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_mmbtu	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
unit_heat_rate_mmbtu_per_mwh	number	Fuel content per unit of electricity generated. Coming from MCOE calculation.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

out_eia__yearly_plant_parts#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA plant parts, for use in matching to FERC 1.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: record_id_eia

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

Practically speaking, a plant is a collection of generator(s). There are many attributes of generators (i.e. prime mover, primary fuel source, technology type). We can use these generator attributes to group generator records into larger aggregate records which we call “plant parts”. A plant part is a record which corresponds to a particular collection of generators that all share an identical attribute and utility owner. E.g. all of the generators with unit_id=2, or all of the generators with coal as their primary fuel source.
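The grouping idea can be sketched in a few lines. This is a simplified illustration with invented generator records and a hypothetical helper name; it only sums capacity, whereas the real plant parts table aggregates many more columns and also handles ownership.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_plant_part(generators: List[Dict], part_column: str) -> Dict[Tuple, float]:
    """Sum capacity over generators sharing a plant, owner and attribute value."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for gen in generators:
        key = (gen["plant_id_eia"], gen["utility_id_eia"], gen[part_column])
        totals[key] += gen["capacity_mw"]
    return dict(totals)


# Made-up plant with three generators, all owned by one utility.
generators = [
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "A",
     "unit_id_pudl": 1, "energy_source_code_1": "BIT", "capacity_mw": 100.0},
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "B",
     "unit_id_pudl": 1, "energy_source_code_1": "NG", "capacity_mw": 50.0},
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "C",
     "unit_id_pudl": 2, "energy_source_code_1": "NG", "capacity_mw": 75.0},
]

by_unit = aggregate_plant_part(generators, "unit_id_pudl")
by_fuel = aggregate_plant_part(generators, "energy_source_code_1")
print(by_unit)
print(by_fuel)
```

The same three generators yield different plant parts depending on the grouping attribute: two unit-level records (150 MW and 75 MW) and two fuel-level records (100 MW and 125 MW).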

The EIA data about power plants (from EIA 923 and 860) is reported in tables with records that correspond to mostly generators and plants. Other datasets (cough cough FERC1) are less well organized and include plants, generators and other plant parts all in the same table without any clear labels. This plant part table is an attempt to create records corresponding to many different plant parts in order to connect specific slices of EIA plants to other datasets.

Because generators are often owned by multiple utilities, another dimension of this plant part table involves generating two records for each owner: one for the portion of the plant part they own and one for the plant part as a whole. The portion records are labeled in the ownership_record_type column as owned and the total records are labeled as total.

This table includes A LOT of duplicative information about EIA plants. It is primarily meant for use as an input into the record linkage between FERC1 plants and EIA.

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet"
)

Columns#

Field Name	Type	Description
appro_part_label	string	Plant part of the associated true granularity record.
appro_record_id_eia	string	EIA record ID of the associated true granularity record.
capacity_eoy_mw	number	Total end of year installed (nameplate) capacity for a plant part, in megawatts.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period in hours).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
construction_year	integer	Year the plant’s oldest still operational unit was built.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
ferc1_generator_agg_id	integer	ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. The ID is manually assigned and has not been updated since 2020, but only affects a couple hundred records total across all years.
ferc_acct_name	string	Name of FERC account, derived from technology description and prime mover code.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mwh	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_year	integer	Year a generator went into service.
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
operational_status	string	The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.
operational_status_pudl	string	The operating status of the asset using PUDL categories.
ownership_dupe	boolean	Whether a plant part record has a duplicate record with different ownership status.
ownership_record_type	string	Whether each generator record is for one owner or represents a total of all ownerships.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_id_report_year	string	PUDL plant ID and report year of the record.
plant_name_eia	string	Plant name.
plant_name_ppe	string	Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part.
plant_part	string	The part of the plant a record corresponds to.
plant_part_id_eia	string	Contains EIA plant ID, plant part, ownership, and EIA utility id
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
record_count	integer	Number of distinct generator IDs that participated in the aggregation for a plant part list record.
record_id_eia	string	Identifier for EIA plant parts analysis records.
report_date	date	Date reported.
report_year	integer	Four-digit year in which the data was reported.
technology_description	string	High level description of the technology used by the generator to produce electricity.
total_fuel_cost	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_mmbtu	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
true_gran	boolean	Indicates whether a plant part list record is associated with the highest priority plant part for all identical records.
unit_heat_rate_mmbtu_per_mwh	number	Fuel content per unit of electricity generated. Coming from MCOE calculation.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
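
The warning on generator_id is worth taking literally. The sketch below uses made-up IDs (hypothetical values, not taken from EIA data) and a hypothetical helper, coerce_numeric, to show what can go wrong when IDs are cast to numbers: distinct strings collapse into one integer, and IDs containing letters are lost.

```python
# Hypothetical generator IDs for illustration only.
raw_ids = ["1", "01", "001", "GT1", "1A"]


def coerce_numeric(gen_id: str):
    """Mimic a numeric cast: digit-only IDs become ints, anything else is lost."""
    return int(gen_id) if gen_id.isdigit() else None


as_numbers = [coerce_numeric(g) for g in raw_ids]

# As strings, all five IDs are distinct.
distinct_as_strings = len(set(raw_ids))
# As numbers, "1", "01", and "001" collapse together; "GT1" and "1A" drop out.
distinct_as_numbers = len({n for n in as_numbers if n is not None})

print(distinct_as_strings, distinct_as_numbers)
```

Keeping the column as a string on load, as the access snippets in this page do by reading the Parquet schema directly, avoids the problem.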

out_eia__yearly_plants#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA plant attributes.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: plant_id_eia, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
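
The selection rule described above can be sketched in a few lines. This is an illustration of the stated 70% rule, not PUDL's actual implementation: the function name canonical_value is hypothetical, and excluding nulls from the denominator is an assumption made for this example.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Return the most common non-null value if it makes up at least
    `threshold` of the non-null reports; otherwise return None."""
    values = [v for v in reported if v is not None]  # assumption: nulls are ignored
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None


# 3 of 4 reports agree (75%), so the value is accepted.
consistent = canonical_value(["TX", "TX", "TX", "OK"])
# The top value covers only 2 of 5 reports (40%), so the result is null.
inconsistent = canonical_value(["TX", "TX", "OK", "OK", "NM"])
```

Under this rule a null in the output table can mean either that nothing was reported or that the reports disagreed too much to pick a winner.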

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet"
)

Columns#

Field Name	Type	Description
ash_impoundment	boolean	Is there an ash impoundment (e.g. pond, reservoir) at the plant?
ash_impoundment_lined	boolean	If there is an ash impoundment at the plant, is the impoundment lined?
ash_impoundment_status	string	If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_code_eia_consistent_rate	number	Percentage consistency of balancing authority code across entity records.
balancing_authority_name_eia	string	Name of the balancing authority.
city	string	Name of the city.
county	string	County name.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
datum	string	Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84).
energy_storage	boolean	Indicates if the facility has energy storage capabilities.
ferc_cogen_docket_no	string	The docket number relating to the FERC cogenerator status. See FERC Form 556.
ferc_cogen_status	boolean	Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556.
ferc_exempt_wholesale_generator	boolean	Indicates whether the plant has FERC qualifying facility exempt wholesale generator status
ferc_exempt_wholesale_generator_docket_no	string	The docket number relating to the FERC qualifying facility exempt wholesale generator status.
ferc_qualifying_facility_docket_no	string	The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556.
ferc_small_power_producer	boolean	Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556.
ferc_small_power_producer_docket_no	string	The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556.
grid_voltage_1_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
grid_voltage_2_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
grid_voltage_3_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
has_net_metering	boolean	Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source.) This field was only reported up until 2015.
iso_rto_code	string	The code of the plant’s ISO or RTO. NA if not reported in that year.
latitude	number	Latitude of the plant’s location, in degrees.
liquefied_natural_gas_storage	boolean	Indicates if the facility has the capability to store natural gas in the form of liquefied natural gas.
longitude	number	Longitude of the plant’s location, in degrees.
natural_gas_local_distribution_company	string	Names of Local Distribution Company (LDC), connected to natural gas burning power plants.
natural_gas_pipeline_name_1	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_pipeline_name_2	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_pipeline_name_3	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_storage	boolean	Indicates if the facility has on-site storage of natural gas.
nerc_region	string	NERC region in which the plant is located
pipeline_notes	string	Additional owner or operator of natural gas pipeline.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
primary_purpose_id_naics	integer	North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant
regulatory_status_code	string	Indicates whether the plant is regulated or non-regulated.
report_date	date	Date reported.
reporting_frequency_code	string	Code that specifies what time period data has to be reported (i.e. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details.
sector_id_eia	integer	EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
sector_name_eia	string	EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
service_area	string	Service area in which plant is located; for unregulated companies, it’s the electric utility with which plant is interconnected
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
timezone	string	IANA timezone name
transmission_distribution_owner_id	integer	EIA-assigned code for owner of transmission/distribution system to which the plant is interconnected.
transmission_distribution_owner_name	string	Name of the owner of the transmission or distribution system to which the plant is interconnected.
transmission_distribution_owner_state	string	State location for owner of transmission/distribution system to which the plant is interconnected.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
water_source	string	Name of water source associated with the plant.
zip_code	string	Five digit US Zip Code.

out_eia__yearly_utilities#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA utility attributes.

Most-recent data:: 2026
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: utility_id_eia, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet"
)

Columns#

Field Name	Type	Description
address_2	string	Second line of the address.
attention_line	string	Mail attention name of the operator/owner.
city	string	Name of the city.
contact_firstname	string	First name of utility contact 1.
contact_firstname_2	string	First name of utility contact 2.
contact_lastname	string	Last name of utility contact 1.
contact_lastname_2	string	Last name of utility contact 2.
contact_title	string	Title of utility contact 1.
contact_title_2	string	Title of utility contact 2.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type	string	Entity type of principal owner.
phone_extension	string	Phone extension for utility contact 1
phone_extension_2	string	Phone extension for utility contact 2
phone_number	string	Phone number for utility contact 1.
phone_number_2	string	Phone number for utility contact 2.
plants_reported_asset_manager	boolean	Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form?
plants_reported_operator	boolean	Is the reporting entity an operator of power plants reported on Schedule 2 of the form?
plants_reported_other_relationship	boolean	Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form?
plants_reported_owner	boolean	Is the reporting entity an owner of power plants reported on Schedule 2 of the form?
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.
zip_code	string	Five digit US Zip Code.
zip_code_4	string	Four digit US Zip Code suffix.

out_ferc1__yearly_all_plants#

Noticed any irregularities with this table? File a report.

Annual time series of plants reported in the steam, small generators, hydro, and pumped storage tables.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 402, 404, 406 and 408)
Primary key:: record_id

Usage Warnings#

Not all columns are originally reported in all of the input plant tables. Expect nulls.
FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table is a concatenation of the following plant tables:

core_ferc1__yearly_steam_plants_sched402
core_ferc1__yearly_hydroelectric_plants_sched406
core_ferc1__yearly_small_plants_sched410
core_ferc1__yearly_pumped_storage_plants_sched408
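
Because the input tables do not all report the same columns, concatenating them leaves gaps. The sketch below shows the effect with two toy tables. The rows and values are made up, the helper concat_rows is hypothetical, and the assumption that opex_boiler appears only in the steam table and opex_dams only in the hydro table is for illustration.

```python
# Toy stand-ins for two of the input tables; all values are invented.
steam = [{"record_id": "steam_1", "capacity_mw": 500.0, "opex_boiler": 1200.0}]
hydro = [{"record_id": "hydro_1", "capacity_mw": 80.0, "opex_dams": 300.0}]


def concat_rows(*tables):
    """Stack rows from several tables, filling columns a table lacks with None."""
    columns = sorted({col for table in tables for row in table for col in row})
    return [
        {col: row.get(col) for col in columns}
        for table in tables
        for row in table
    ]


all_plants = concat_rows(steam, hydro)
for row in all_plants:
    print(row)
```

Shared columns such as capacity_mw are populated for every row, while table-specific columns are null wherever the source table did not report them.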

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part: energy generated over the time period / (nameplate capacity * length of the time period in hours).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_annual_addition	number	Annual capital addition into capex_total.
capex_annual_addition_rolling	number	Year-to-date capital addition into capex_total.
capex_annual_per_kw	number	Annual capital addition into capex_total per kW.
capex_annual_per_mw	number	Annual capital addition into capex_total per MW.
capex_annual_per_mw_rolling	number	Year-to-date capital addition into capex_total per MW.
capex_annual_per_mwh	number	Annual capital addition into capex_total per MWh.
capex_annual_per_mwh_rolling	number	Year-to-date capital addition into capex_total per MWh.
capex_equipment	number	Cost of plant: equipment (USD).
capex_equipment_electric	number	Cost of plant: accessory electric equipment (USD).
capex_equipment_misc	number	Cost of plant: miscellaneous power plant equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
capex_wheels_turbines_generators	number	Cost of plant: water wheels, turbines, and generators (USD).
capex_wo_retirement_total	number	Total cost of plant (USD) without retirements.
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
energy_used_for_pumping_mwh	number	Energy used for pumping, in megawatt-hours.
ferc_license_id	string	The FERC license ID of a project.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mwh	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_type	string	Type of fuel.
installation_year	integer	Year the plant’s most recently built unit was installed.
license_id_ferc1	integer	FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.
net_capacity_adverse_conditions_mw	number	Net plant capability under the least favorable operating conditions, in megawatts.
net_capacity_favorable_conditions_mw	number	Net plant capability under the most favorable operating conditions, in megawatts.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_load_mwh	number	Net output for load (net generation - energy used for pumping) in megawatt-hours.
not_water_limited_capacity_mw	number	Plant capacity in MW when not limited by condenser water.
opex_allowances	number	Allowances.
opex_boiler	number	Maintenance of boiler (or reactor) plant.
opex_coolants	number	Cost of coolants and water (nuclear plants only)
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_fuel	number	Production expenses: fuel (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_hydraulic	number	Production expenses: hydraulic expenses (USD).
opex_maintenance	number	Production expenses: Maintenance (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_misc_power	number	Miscellaneous steam (or nuclear) expenses.
opex_misc_steam	number	Maintenance of miscellaneous steam (or nuclear) plant.
opex_nonfuel_per_mwh	number	Investments in non-fuel production expenses per MWh.
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_production_before_pumping	number	Total production expenses before pumping (USD).
opex_production_total	number	Total operating expenses.
opex_pumped_storage	number	Production expenses: pumped storage (USD).
opex_pumping	number	Production expenses: pumping (USD).
opex_rents	number	Production expenses: rents (USD).
opex_steam	number	Steam expenses.
opex_steam_other	number	Steam from other sources.
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
opex_transfer	number	Steam transferred (Credit).
opex_water_for_power	number	Production expenses: water for power (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_id_ferc1	integer	Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
water_limited_capacity_mw	number	Plant capacity in MW when limited by condenser water.
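
Two of the derived columns above, capacity_factor and net_load_mwh, follow directly from their descriptions. The sketch below works through the arithmetic for an annual record. It assumes a non-leap year of 8,760 hours and uses invented input values; the function names are hypothetical and this is not necessarily how PUDL computes the published columns.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def capacity_factor(net_generation_mwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Energy generated divided by the energy the plant could have
    generated running at nameplate capacity for the whole period."""
    if not capacity_mw or hours <= 0:
        return None  # undefined without a positive capacity and period
    return net_generation_mwh / (capacity_mw * hours)


def net_load_mwh(net_generation_mwh, energy_used_for_pumping_mwh):
    """Net output for load: net generation minus energy used for pumping."""
    return net_generation_mwh - energy_used_for_pumping_mwh


# A 500 MW plant generating 2,190,000 MWh in a year ran at half its potential.
cf = capacity_factor(net_generation_mwh=2_190_000, capacity_mw=500)
# A pumped storage plant can consume more energy than it generates.
load = net_load_mwh(net_generation_mwh=1000.0, energy_used_for_pumping_mwh=1300.0)
print(cf, load)
```

A negative net load is therefore not necessarily a data error for pumped storage records.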

out_ferc1__yearly_balance_sheet_assets_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility assets and other debits.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
Primary key:: utility_id_ferc1, report_year, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
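
The subtotal warning matters whenever balances are summed. The sketch below shows how a calculated row double counts its components, using the row_type_xbrl column as a filter. The asset_type values are invented, and the label strings "reported_value" and "calculated_value" are assumptions; check the distinct values actually present in row_type_xbrl before relying on them.

```python
# Invented rows: two reported components and one calculated subtotal.
rows = [
    {"asset_type": "asset_a", "row_type_xbrl": "reported_value", "ending_balance": 60.0},
    {"asset_type": "asset_b", "row_type_xbrl": "reported_value", "ending_balance": 40.0},
    {"asset_type": "total_of_a_and_b", "row_type_xbrl": "calculated_value", "ending_balance": 100.0},
]

# Summing everything counts the components twice.
naive_total = sum(r["ending_balance"] for r in rows)

# Keeping only rows labeled as reported avoids the double count here.
reported_total = sum(
    r["ending_balance"] for r in rows if r["row_type_xbrl"] == "reported_value"
)

print(naive_total, reported_total)
```

Filtering on the row type alone may not be sufficient for every table, so compare the result against a reported total where one exists.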

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
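
The record_id format documented above can be split back into its parts. The sketch below assumes an ID follows the documented pattern exactly and that only the table name may contain underscores, which is why it splits from the right. The function name parse_record_id and the example ID are hypothetical.

```python
FIELDS = [
    "table_name",
    "report_year",
    "report_prd",
    "respondent_id",
    "spplmnt_num",
    "row_number",
]


def parse_record_id(record_id: str) -> dict:
    """Split a record_id into its six documented components.

    The last five components are taken from the right, so any
    underscores inside the table name are preserved.
    """
    parts = record_id.rsplit("_", 5)
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected record_id layout: {record_id!r}")
    return dict(zip(FIELDS, parts))


# Made-up ID that follows the documented pattern.
parsed = parse_record_id("f1_example_table_2020_12_123_0_5")
print(parsed)
```

All components are kept as strings. IDs that deviate from the documented pattern would be split incorrectly rather than rejected, unless they contain fewer than five underscores.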

out_ferc1__yearly_balance_sheet_liabilities_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility liabilities and other credits.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
Primary key:: utility_id_ferc1, report_year, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
liability_type	string	Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
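
Since each record carries both starting_balance and ending_balance, the change over the year can be derived per row. The sketch below is a minimal illustration with invented values; the function name balance_change is hypothetical, and returning None when either balance is missing is a choice made for this example.

```python
def balance_change(starting_balance, ending_balance):
    """Change in an account balance over the report year, or None if
    either end of the year is unreported."""
    if starting_balance is None or ending_balance is None:
        return None
    return ending_balance - starting_balance


# Invented balances in USD.
increase = balance_change(1_000_000.0, 1_250_000.0)
unknown = balance_change(None, 1_250_000.0)
print(increase, unknown)
```

Treating a missing balance as zero instead would turn every unreported value into an apparent swing of the full balance.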

out_ferc1__yearly_cash_flows_sched120#

Noticed any irregularities with this table? File a report.

Annual time series of utility cash flow.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)
Primary key:: utility_id_ferc1, report_year, amount_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet"
)

Columns#

Field Name	Type	Description
amount	number	Reported amount of dollars. This could be a balance or a change in value.
amount_type	string	Label describing the type of amount being reported. This could be a balance or a change in value.
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_depreciation_by_function_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)
Primary key:: utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
depreciation_type	string	Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.
ending_balance	number	Account balance at end of year.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
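
The subtotal warning above can be handled by filtering on row_type_xbrl before aggregating. The following is a minimal sketch on invented rows, not a recommended workflow from the PUDL maintainers. The literal values "reported_value" and "calculated_value" are assumptions about how row_type_xbrl is encoded; inspect the distinct values in the real table before relying on them.

```python
import pandas as pd

# Hypothetical rows shaped like this table; every value is invented.
df = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1],
        "report_year": [2023, 2023, 2023],
        "plant_function": ["steam_production", "transmission", "total"],
        "row_type_xbrl": ["reported_value", "reported_value", "calculated_value"],
        "ending_balance": [100.0, 50.0, 150.0],
    }
)

# Drop calculated rows so the subtotal is not added on top of its components.
# ASSUMPTION: "calculated_value" is the label for calculated rows.
reported = df[df["row_type_xbrl"] != "calculated_value"]

# Sum the remaining rows per utility and year.
totals = reported.groupby(
    ["utility_id_ferc1", "report_year"], as_index=False
)["ending_balance"].sum()
```

Categorical columns such as plant_status also list a total category, so the same care applies when grouping on them.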

out_ferc1__yearly_depreciation_changes_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated provision for depreciation of electric utility plant.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)
Primary key:: utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
depreciation_type	string	Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_depreciation_factors_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of factors used in estimating depreciation charges.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)
Primary key:: This table has no primary key; it is too unstructured to have one.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains columns which may appear categorical, but are actually free text.
The rate and percentage (pct) columns are reported either as values between 0-1 or 0-100. Reporting is often consistent within a given utility-year, but use caution when comparing values across different utilities, or across filings by the same utility in different years.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains details at a variety of levels of granularity. It has many free-form text fields, so respondents fill it out very differently from each other and from year to year. We recommend using this table carefully, one utility-year at a time, and not attempting analysis across long time series or across utilities without considerable caution and cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
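
Because rates and percentages may be on a 0-1 or a 0-100 scale, one way to compare them is to rescale each utility-year separately. The sketch below is a heuristic of our own for illustration, not something PUDL applies: the function name to_fraction and the rule "any absolute value above 1 means the group is on a 0-100 scale" are both assumptions.

```python
def to_fraction(values):
    """Rescale one utility-year's rates to the 0-1 range.

    ASSUMPTION (heuristic, not PUDL's method): if any non-null value has an
    absolute value above 1, the whole group was reported on a 0-100 scale.
    """
    present = [v for v in values if v is not None]
    on_percent_scale = any(abs(v) > 1 for v in present)
    divisor = 100.0 if on_percent_scale else 1.0
    # Nulls are passed through unchanged.
    return [None if v is None else v / divisor for v in values]


# One hypothetical utility-year on each scale.
percent_scale = to_fraction([2.5, 3.0, None])
fraction_scale = to_fraction([0.025, 0.5])
```

The heuristic misreads a 0-100 group whose values are all at or below 1 (for example a 0.5 percent rate), so results still need a manual check. Since depreciation_rate is typed as a string in this table, values would need converting to floats first.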

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet"
)

Columns#

Field Name	Type	Description
account_num	string	Account number(s) in connection with factors used in estimating depreciation charges.
depreciable_plant_base	number	Depreciable plant balance (depreciable base) to which rates are applied.
depreciation_factors	string	Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers indicating the beginning of a new section, such as a plant or asset type header followed by sub-components like FERC account IDs or plant names, depending on the section.
depreciation_rate	string	Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.
mortality_curve_type	string	Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives.
net_salvage_pct	number	Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.
order_num	number	This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer - although there are many instances of floating point values which seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
remaining_life_avg	number	Estimated weighted average of remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.
report_year	integer	Four-digit year in which the data was reported.
service_life_avg	number	Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
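
The remaining_life_avg and service_life_avg descriptions mention converting PnYnMnDTnHnMnS duration strings into years. The sketch below illustrates one way such a conversion could work. It is not PUDL's implementation: the function name duration_to_years and the conversion factors (12 months and 365.25 days per year) are assumptions made for the example.

```python
import re

_NUM = r"\d+(?:\.\d+)?"
# Matches the PnYnMnDTnHnMnS layout; every component is optional.
_DURATION = re.compile(
    rf"^P(?:(?P<years>{_NUM})Y)?(?:(?P<months>{_NUM})M)?(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?:(?P<hours>{_NUM})H)?(?:(?P<minutes>{_NUM})M)?(?:(?P<seconds>{_NUM})S)?)?$"
)


def duration_to_years(text):
    """Convert a PnYnMnDTnHnMnS string into a number of years."""
    match = _DURATION.match(text)
    if match is None or all(g is None for g in match.groups()):
        raise ValueError(f"not a recognised duration: {text!r}")
    parts = {k: float(v) if v is not None else 0.0 for k, v in match.groupdict().items()}
    # Fold the time-of-day components into fractional days.
    days = (
        parts["days"]
        + parts["hours"] / 24
        + parts["minutes"] / 1440
        + parts["seconds"] / 86400
    )
    # ASSUMPTION: 12 months and 365.25 days per year.
    return parts["years"] + parts["months"] / 12 + days / 365.25


service_life = duration_to_years("P12Y6M")
```

The values already stored in this table are in years, so a conversion like this is only relevant when working from the raw XBRL strings.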

out_ferc1__yearly_depreciation_summary_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of depreciation and amortization of electric plant.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)
Primary key:: utility_id_ferc1, report_year, plant_function, ferc_account_label

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Plant refers to FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of depreciation and amortization charges.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet"
)

Columns#

Field Name	Type	Description
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
ferc_account_label	string	Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_detailed_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet assets.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 200, 204 and 219)
Primary key:: This table has no primary key.

Usage Warnings#

The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from four FERC Form 1 accounting tables with nested calculations:

core_ferc1__yearly_balance_sheet_assets_sched110
core_ferc1__yearly_utility_plant_summary_sched200
core_ferc1__yearly_plant_in_service_sched204
core_ferc1__yearly_depreciation_by_function_sched219

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications were applied to either provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet"
)

Columns#

Field Name	Type	Description
ending_balance	number	Account balance at end of year.
in_rate_base	boolean	Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
plant_status	string	Utility plant financial status (in service, future, leased, total).
rate_base_category	string	A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.
report_year	integer	Four-digit year in which the data was reported.
table_name	string	The name of the PUDL database table where a given record originated from.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
utility_type_other	string	Free-form description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).
xbrl_factoid	string	The name of the type of value, derived from the XBRL fact name.
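
The in_rate_base flag and rate_base_category column are meant to make this table easier to summarize. A minimal sketch, assuming the flag can be null and that null should be treated as "not in rate base" (our assumption, not a documented rule). All rows and the category labels cat_a, cat_b and cat_c are invented.

```python
import pandas as pd

# Hypothetical rows shaped like this table; labels and amounts are invented.
assets = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1, 1, 1],
        "report_year": [2023] * 5,
        "rate_base_category": ["cat_a", "cat_a", "cat_b", "cat_b", "cat_c"],
        "in_rate_base": pd.array([True, True, False, None, True], dtype="boolean"),
        "ending_balance": [10.0, 5.0, 7.0, 3.0, 2.0],
    }
)

# ASSUMPTION: a null flag is treated as excluded from the rate base.
allowed = assets[assets["in_rate_base"].fillna(False)]

# Total the allowed balances per utility, year and category.
rate_base = allowed.groupby(
    ["utility_id_ferc1", "report_year", "rate_base_category"], as_index=False
)["ending_balance"].sum()
```

Because the records are already de-duplicated, no separate subtotal filtering is shown here, though the usage warning above notes that errors in the reported calculations may remain.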

out_ferc1__yearly_detailed_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet liabilities.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110 and 118)
Primary key:: This table has no primary key.

Usage Warnings#

The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from two FERC Form 1 accounting tables with nested calculations:

core_ferc1__yearly_balance_sheet_liabilities_sched110
core_ferc1__yearly_retained_earnings_sched118

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications were applied to either provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet"
)

Columns#

Field Name	Type	Description
ending_balance	number	Account balance at end of year.
in_rate_base	boolean	Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function.
rate_base_category	string	A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.
report_year	integer	Four-digit year in which the data was reported.
table_name	string	The name of the PUDL database table where a given record originated from.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
xbrl_factoid	string	The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_detailed_income_statements#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ income statements.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 114, 300, 320 and 336)
Primary key:: This table has no primary key.

Usage Warnings#

The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from four FERC Form 1 accounting tables with nested calculations:

core_ferc1__yearly_income_statements_sched114
core_ferc1__yearly_depreciation_summary_sched336
core_ferc1__yearly_operating_expenses_sched320
core_ferc1__yearly_operating_revenues_sched300

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to two columns (utility_type & plant_function) as compared to the originally reported values in our core tables. The modifications were applied to either provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
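
The category condensing described above can be pictured as a simple relabel followed by a sum. This is only an illustration of the idea using the two hydraulic categories named in the text; it is not how pudl.output.ferc1.Exploder is implemented, and the mapping name and sample amounts are hypothetical.

```python
import pandas as pd

# Mapping built from the example in the text; other categories pass through.
CONDENSED_PLANT_FUNCTION = {
    "hydraulic_production_conventional": "hydraulic_production",
    "hydraulic_production_pumped_storage": "hydraulic_production",
}

# Hypothetical core-table rows; amounts are invented.
core = pd.DataFrame(
    {
        "plant_function": [
            "hydraulic_production_conventional",
            "hydraulic_production_pumped_storage",
            "transmission",
        ],
        "dollar_value": [4.0, 6.0, 1.0],
    }
)

# Relabel the detailed categories, then add up the rows that now share a label.
core["plant_function"] = core["plant_function"].replace(CONDENSED_PLANT_FUNCTION)
condensed = core.groupby("plant_function", as_index=False)["dollar_value"].sum()
```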

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet"
)

Columns#

Field Name	Type	Description
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
in_revenue_requirement	boolean	Whether or not a record from the detailed income statement data is typically included in a utility’s revenue requirement. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type or plant_function.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
report_year	integer	Four-digit year in which the data was reported.
revenue_requirement_category	string	The category of revenue requirement associated with each component of a utility’s income statements. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function. This column is intended to be used to aggregate this table.
revenue_requirement_technology	string	The technology type associated with components of a utility’s revenue requirement. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function as well. This column is intended to be used to aggregate this table.
table_name	string	The name of the PUDL database table where a given record originated from.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
xbrl_factoid	string	The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_energy_dispositions_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of dispositions of electric energy sold, exchanged, or stored.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
Primary key:: utility_id_ferc1, report_year, energy_disposition_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, dispositions only. Schedule 401a. Covers electricity that utilities delivered to end users, internal losses, etc.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet"
)

Columns#

Field Name	Type	Description
energy_disposition_type	string	Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401 table. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc.
energy_mwh	number	Sources and uses of energy in MWh.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
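
The record_id format described above can be split back into its parts. This is a minimal sketch that assumes an ID follows the documented pattern exactly and that the five trailing fields are integers; IDs from XBRL reporting years may not follow this pattern. The helper name and the example ID are hypothetical.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    table_name: str
    report_year: int
    report_prd: int
    respondent_id: int
    spplmnt_num: int
    row_number: int


def parse_record_id(record_id: str) -> RecordId:
    """Split a record_id into its six documented components."""
    # table_name may itself contain underscores, so split from the right.
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    table_name, *numbers = parts
    # int() raises ValueError if a trailing field is not numeric.
    return RecordId(table_name, *(int(n) for n in numbers))


# Invented ID used only to exercise the parser.
example = parse_record_id("f1_example_table_2019_12_42_0_7")
```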

out_ferc1__yearly_energy_sources_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
Primary key:: utility_id_ferc1, report_year, energy_source_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet"
)

Columns#

Field Name	Type	Description
energy_mwh	number	Sources and uses of energy in MWh.
energy_source_type	string	Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_hydroelectric_plants_sched406#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for large hydroelectric generating plants.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)
Primary key:: record_id. The best approximation of a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

The plant_type and construction_type are standardized into categorical values from free-form strings.
FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
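
Before aggregating, it can help to see which records collide on the approximate key (report_year, utility_id_ferc1, plant_name_ferc1). The sketch below counts such collisions. The rows are invented and the helper name is hypothetical; a collision is only a prompt to inspect the records, not proof of a duplicate.

```python
from collections import Counter

APPROX_KEY = ("report_year", "utility_id_ferc1", "plant_name_ferc1")

# Hypothetical rows standing in for df.to_dict("records").
rows = [
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "example dam", "capacity_mw": 20.0},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "example dam", "capacity_mw": 40.0},
    {"report_year": 2020, "utility_id_ferc1": 2, "plant_name_ferc1": "other dam", "capacity_mw": 15.0},
]


def duplicated_keys(rows, key_cols=APPROX_KEY):
    """Return {key: count} for every approximate key that appears more than once."""
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


dupes = duplicated_keys(rows)
```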

Additional Details#

Large plants have an installed nameplate capacity of more than 10 MW.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period (hours/years/etc.)).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_equipment	number	Cost of plant: equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_capacity_adverse_conditions_mw	number	Net plant capability under the least favorable operating conditions, in megawatts.
net_capacity_favorable_conditions_mw	number	Net plant capability under the most favorable operating conditions, in megawatts.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_hydraulic	number	Production expenses: hydraulic expenses (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_rents	number	Production expenses: rents (USD).
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
opex_water_for_power	number	Production expenses: water for power (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
project_num	integer	FERC Licensed Project Number.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
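
The capacity_factor definition above amounts to dividing generation by the energy a plant would produce running at nameplate capacity for the whole period. This sketch assumes a full calendar year as the period; it illustrates the formula and is not necessarily how PUDL computes the published column.

```python
import calendar


def capacity_factor(net_generation_mwh, capacity_mw, report_year):
    """Return generation / (capacity * hours in year), or None if undefined."""
    if net_generation_mwh is None or not capacity_mw:
        # Null inputs or zero capacity leave the ratio undefined.
        return None
    hours = (366 if calendar.isleap(report_year) else 365) * 24
    return net_generation_mwh / (capacity_mw * hours)


# Invented numbers: a 10 MW plant generating 43,800 MWh in a 365-day year.
cf = capacity_factor(net_generation_mwh=43_800.0, capacity_mw=10.0, report_year=2021)
```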

out_ferc1__yearly_income_statements_sched114#

Noticed any irregularities with this table? File a report.

Annual time series of utility income statements.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)
Primary key:: utility_id_ferc1, report_year, utility_type, income_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
income_type	string	Type of income reported in core_ferc1__yearly_income_statements_sched114 table.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_operating_expenses_sched320#

Noticed any irregularities with this table? File a report.

Annual time series of operating and maintenance costs associated with producing electricity.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)
Primary key:: utility_id_ferc1, report_year, expense_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet"
)

Columns#

Field Name	Type	Description
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
expense_type	string	The type of expense.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_operating_revenues_sched300#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric operating revenues.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)
Primary key:: utility_id_ferc1, report_year, revenue_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes only the structured part of Schedule 300. A number of revenue_type values do not have sales_mwh or avg_customers_per_month provided, in which case these columns will be NULL.
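
Because sales_mwh can be NULL, any ratio built on it needs a guard. The sketch below computes revenue per MWh and returns None when it cannot be computed. It assumes NULLs arrive as Python None (with a pandas pyarrow backend they may appear as pd.NA instead), and the revenue type names and values are invented.

```python
# Hypothetical rows; the second has no sales_mwh reported.
rows = [
    {"revenue_type": "example_sales_type", "dollar_value": 5000.0, "sales_mwh": 100.0},
    {"revenue_type": "example_other_type", "dollar_value": 750.0, "sales_mwh": None},
]


def revenue_per_mwh(row):
    """Return dollar_value / sales_mwh, or None when either is missing or sales are zero."""
    sales = row.get("sales_mwh")
    dollars = row.get("dollar_value")
    if sales is None or dollars is None or sales == 0:
        return None
    return dollars / sales


results = [revenue_per_mwh(row) for row in rows]
```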

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet"
)

Columns#

Field Name	Type	Description
avg_customers_per_month	number	Average number of customers per month.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
revenue_type	string	Label describing types of revenues.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
sales_mwh	number	Quantity of electricity sold in MWh.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_other_regulatory_assets_sched232#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory assets, including rate order docket number.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)
Primary key:: This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)

Columns#

Field Name	Type	Description
account_detail	string	Description of the account number credited from making debit adjustment to other regulatory liabilities.
additional_description	string	Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.
credits_written_off_recovered	number	Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies.
debits	number	The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies.
description	string	Long human-readable description of the meaning of a code/label.
ending_balance	number	Account balance at end of year.
report_year	integer	Four-digit year in which the data was reported.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_other_regulatory_liabilities_sched278#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory liabilities, including rate order docket number.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)
Primary key:: This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)

Columns#

Field Name	Type	Description
account_detail	string	Description of the account number credited from making debit adjustment to other regulatory liabilities.
decrease_in_other_regulatory_liabilities	number	The decrease during the reporting period of other regulatory liabilities.
description	string	Long human-readable description of the meaning of a code/label.
ending_balance	number	Account balance at end of year.
increase_in_other_regulatory_liabilities	number	The increase during the reporting period of other regulatory liabilities.
report_year	integer	Four-digit year in which the data was reported.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_plant_in_service_sched204#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)
Primary key:: utility_id_ferc1, report_year, ferc_account_label

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101. (See e.g. https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide, and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
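
The balance relationship described above can be used as a consistency check. This sketch takes that sentence literally: it adds all four change columns, so it assumes retirements are stored with the sign that makes a plain sum work (if your copy stores them as positive magnitudes, negate them first). It also assumes a missing change value counts as zero. The rows, account labels, and tolerance are invented.

```python
CHANGE_COLS = ("additions", "retirements", "adjustments", "transfers")

# Hypothetical rows; the second one does not balance.
rows = [
    {"ferc_account_label": "example_account_a", "starting_balance": 1000.0, "additions": 200.0,
     "retirements": -50.0, "adjustments": 10.0, "transfers": -5.0, "ending_balance": 1155.0},
    {"ferc_account_label": "example_account_b", "starting_balance": 1000.0, "additions": 200.0,
     "retirements": -50.0, "adjustments": 10.0, "transfers": None, "ending_balance": 1200.0},
]


def balance_residual(row):
    """Return ending_balance minus (starting_balance + sum of the change columns)."""
    # Assumption: a missing (None) change column is treated as zero.
    changes = sum(row.get(col) or 0.0 for col in CHANGE_COLS)
    return row["ending_balance"] - (row["starting_balance"] + changes)


def rows_out_of_balance(rows, tolerance=0.5):
    """Return rows whose residual exceeds the tolerance in absolute value."""
    return [row for row in rows if abs(balance_residual(row)) > tolerance]


flagged = [row["ferc_account_label"] for row in rows_out_of_balance(rows)]
```

A nonzero residual does not say which field is wrong; it only marks records worth inspecting.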

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments	number	Cost of adjustments to the account.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
ferc_account_label	string	Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
retirements	number	Cost of disposal of items classified within the account.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
transfers	number	Cost of transfers into (out of) the account.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_pumped_storage_plants_sched408#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)
Primary key:: record_id. The best approximation of a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported in Schedule 408 of FERC Form 1.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / nameplate capacity * time period (hours/years/etc.).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_equipment_electric	number	Cost of plant: accessory electric equipment (USD).
capex_equipment_misc	number	Cost of plant: miscellaneous power plant equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
capex_wheels_turbines_generators	number	Cost of plant: water wheels, turbines, and generators (USD).
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
energy_used_for_pumping_mwh	number	Energy used for pumping, in megawatt-hours.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_load_mwh	number	Net output for load (net generation - energy used for pumping) in megawatt-hours.
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_production_before_pumping	number	Total production expenses before pumping (USD).
opex_pumped_storage	number	Production expenses: pumped storage (USD).
opex_pumping	number	Production expenses: pumping (USD).
opex_rents	number	Production expenses: rents (USD).
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
opex_water_for_power	number	Production expenses: water for power (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
project_num	integer	FERC Licensed Project Number.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
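
The capacity_factor and net_load_mwh descriptions above amount to simple arithmetic, which the following sketch recomputes on invented values. Using 8,760 hours (8,784 in leap years) as the time period is an assumption; the source does not state which hour count PUDL uses. The *_check column names are hypothetical.

```python
import calendar

import pandas as pd

# Invented values for two report years of a single hypothetical plant.
df = pd.DataFrame(
    {
        "report_year": [2023, 2024],
        "capacity_mw": [100.0, 100.0],
        "net_generation_mwh": [87600.0, 87840.0],
        "energy_used_for_pumping_mwh": [100000.0, 120000.0],
    }
)

# Assumption: the "time period" is the full calendar year, in hours.
hours_in_year = df["report_year"].map(
    lambda year: 8784 if calendar.isleap(year) else 8760
)

# Energy generated / (nameplate capacity * hours in the period).
df["capacity_factor_check"] = df["net_generation_mwh"] / (
    df["capacity_mw"] * hours_in_year
)

# Net output for load = net generation - energy used for pumping.
df["net_load_check"] = (
    df["net_generation_mwh"] - df["energy_used_for_pumping_mwh"]
)

print(df[["report_year", "capacity_factor_check", "net_load_check"]])
```

In this toy data the net load is negative because more energy is used for pumping than is generated.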

out_ferc1__yearly_purchased_power_and_exchanges_sched326#

Noticed any irregularities with this table? File a report.

Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)
Primary key:: This table has no primary key. The primary key for this table would be record_id, however a small number of these values are duplicated in 2021, 2022, and 2024.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table has data about inter-utility power purchases. This includes how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately the field describing which other utility the power was being bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.

Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
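
Because seller_name is poorly standardized, some light normalization can help group obvious variants before attempting any matching. The sketch below is one possible approach, not something PUDL applies to this table. The helper normalize_seller_name, the suffix list, and the example names are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "co", "corp", "company", "llc"}


def normalize_seller_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    tokens = [token for token in cleaned.split() if token not in SUFFIXES]
    return " ".join(tokens)


# Invented seller names showing typical spelling variation.
sellers = pd.Series(
    ["Acme Power Co.", "ACME POWER COMPANY", "acme power, inc", "Beta Energy LLC"]
)
normalized = sellers.map(normalize_seller_name)

print(normalized.value_counts())
```

A step like this only collapses trivial variants. Abbreviations, misspellings, and renamed companies would still need fuzzy matching or manual review.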

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)

Columns#

Field Name	Type	Description
billing_demand_mw	number	Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts.
coincident_peak_demand_mw	number	Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts.
delivered_mwh	number	Gross megawatt-hours delivered in power exchanges and used as the basis for settlement.
demand_charges	number	Demand charges (USD).
energy_charges	number	Energy charges (USD).
non_coincident_peak_demand_mw	number	Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts.
other_charges	number	Other charges, including out-of-period adjustments (USD).
purchase_type_code	string	Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction.
purchased_mwh	number	Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021.
purchased_other_than_storage_mwh	number	Number of megawatt hours purchased during the period for other than energy storage.
purchased_storage_mwh	number	Number of megawatt hours purchased during the period for energy storage.
received_mwh	number	Gross megawatt-hours received in power exchanges and used as the basis for settlement.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
seller_name	string	Name of the seller, or the other party in an exchange transaction.
tariff	string	FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.)
total_settlement	number	Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
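
Since total_settlement is described as the sum of demand, energy, and other charges, a quick consistency check can surface rows worth a closer look. This is a sketch on invented values. The tolerance and the charges_sum and settlement_mismatch column names are assumptions. For power exchanges the settlement is defined differently, so a mismatch is not necessarily a reporting error.

```python
import pandas as pd

# Invented rows; None stands in for a charge that was not reported.
purchases = pd.DataFrame(
    {
        "demand_charges": [100.0, 0.0, 50.0],
        "energy_charges": [200.0, 300.0, 25.0],
        "other_charges": [10.0, None, 5.0],
        "total_settlement": [310.0, 300.0, 90.0],
    }
)

charge_cols = ["demand_charges", "energy_charges", "other_charges"]

# sum(axis=1) skips nulls by default; min_count=1 leaves a row null
# (rather than 0) when every charge column is null.
purchases["charges_sum"] = purchases[charge_cols].sum(axis=1, min_count=1)

tolerance_usd = 1.0  # hypothetical rounding tolerance
purchases["settlement_mismatch"] = (
    purchases["charges_sum"] - purchases["total_settlement"]
).abs() > tolerance_usd

print(purchases[["charges_sum", "total_settlement", "settlement_mismatch"]])
```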

out_ferc1__yearly_rate_base#

Noticed any irregularities with this table? File a report.

Annual time series of granular accounting data consisting of what utilities can typically include in their rate bases.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 118, 200, 204, 219, and 320)
Primary key:: This table has no primary key.

Usage Warnings#

The data from these xbrl_factoid tables contains nested totals and subtotals, making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect; we’ve corrected many of them, but errors could still exist.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from seven FERC Form 1 accounting tables with nested calculations. We reconciled these nested calculations and then identified the most granular data across the tables. Here are the three direct upstream inputs (the two detailed tables have several core_ferc1 inputs each):

out_ferc1__yearly_detailed_balance_sheet_assets
out_ferc1__yearly_detailed_balance_sheet_liabilities
core_ferc1__yearly_operating_expenses_sched320

We applied slight modifications to three columns (utility_type, plant_function, and plant_status) relative to the originally reported values in our core tables. The modifications either provide more specificity (e.g., we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g., creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
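
The category condensing described above can be pictured as a simple value mapping. The sketch below is illustrative only and is not the logic in pudl.output.ferc1.Exploder. The toy balances and the condense dictionary are assumptions; only the three plant_function names come from this page.

```python
import pandas as pd

# Invented balances for three plant_function values.
rate_base = pd.DataFrame(
    {
        "plant_function": [
            "hydraulic_production_conventional",
            "hydraulic_production_pumped_storage",
            "transmission",
        ],
        "ending_balance": [10.0, 5.0, 20.0],
    }
)

# Map the two detailed hydro categories onto one condensed category.
condense = {
    "hydraulic_production_conventional": "hydraulic_production",
    "hydraulic_production_pumped_storage": "hydraulic_production",
}
rate_base["plant_function"] = rate_base["plant_function"].replace(condense)

# Aggregating after condensing gives one row per simplified category.
by_function = rate_base.groupby("plant_function")["ending_balance"].sum()
print(by_function)
```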

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet"
)

Columns#

Field Name	Type	Description
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
is_disaggregated_in_rate_base	boolean	Indicates whether or not records with null values in the in_rate_base column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag
is_disaggregated_utility_type	boolean	Indicates whether or not records with null or total values in the utility_type column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
plant_status	string	Utility plant financial status (in service, future, leased, total).
rate_base_category	string	A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
table_name	string	The name of the PUDL database table where a given record originated from.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types.
utility_type_other	string	Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).
xbrl_factoid	string	The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_retained_earnings_sched118#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ statements of retained earnings.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)
Primary key:: utility_id_ferc1, report_year, earnings_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
earnings_type	string	Label describing types of earnings.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_sales_by_rate_schedules_sched304#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)
Primary key:: This table has no primary key.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Values in rate_schedule_description are free-form strings.
Data prior to 2021 does not include information in columns: rate_schedule_type and billing_status.
Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

The pre-2021 data in this table (extracted from FoxProDB rather than XBRL) is extremely unstructured. The post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total), whereas the pre-2021 data stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse the pre-2021 rate_schedule_description column en masse because it’s just too messy. The contents of rate_schedule_description often contain numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded in the rate_schedule_description column. To all who dare, beware.

This table is a combination of one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type plus totals), hence the increase in data clarity post-2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data, as they can be reliably parsed from each of the tables and incorporated into columns. The rate_schedule_description is supposed to contain sub-rate_schedule_type names for charges (e.g., Residential 1, Residential 2, etc.). However, the pre-2021 data contains a little bit of everything (or nothing) and the post-2020 data has some totals or wonky data thrown in. That is to say, even when working with post-2020 data, be wary of aggregating the data. That’s what the “total” rows are for.

The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for post-2020 data (otherwise you could try keyword searching for “total” in rate_schedule_description). The totals table contains two types of totals: the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”) and each utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).

The rate schedule based XBRL tables only report billed values whereas the total tables report billed, unbilled, and total values. (See the column description for more info on the difference between billed and unbilled). This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.
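
The total-row markers described above can be used to separate the two kinds of totals from the detail rows before aggregating. The following is a minimal sketch on invented post-2020-style rows. The values and the utility_totals, schedule_totals, and detail_rows names are assumptions.

```python
import pandas as pd

# Invented post-2020 style rows for one utility and year.
sales = pd.DataFrame(
    {
        "report_year": [2022, 2022, 2022, 2022],
        "rate_schedule_type": ["residential", "residential", "residential", "total"],
        "rate_schedule_description": ["Residential 1", "Residential 2", "total", "total"],
        "billing_status": ["billed", "billed", "total", "total"],
        "sales_mwh": [60.0, 40.0, 105.0, 105.0],
    }
)

is_total_row = sales["rate_schedule_description"] == "total"

# Utility-wide totals across all rate schedules.
utility_totals = sales[is_total_row & (sales["rate_schedule_type"] == "total")]

# Totals for an individual rate schedule type (e.g. residential).
schedule_totals = sales[is_total_row & (sales["rate_schedule_type"] != "total")]

# Everything else: the individual rate schedule records.
detail_rows = sales[~is_total_row]

print(detail_rows["sales_mwh"].sum(), schedule_totals["sales_mwh"].sum())
```

In this toy data the billed detail rows sum to less than the reported total, which is the kind of gap that unbilled amounts can produce.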

Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot guarantee which is which and have not put any cleaning mechanisms in place to account for this.
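
One way to see the two peaks is to count values by order of magnitude. The sketch below does this on invented numbers. It only inspects the distribution and does not attempt to decide which values are dollars and which are cents.

```python
import numpy as np
import pandas as pd

# Invented values: a cluster near 0.1 and a cluster near 10, plus a zero
# and a null that should be ignored.
revenue_per_kwh = pd.Series([0.08, 0.11, 0.14, 9.5, 12.0, 0.0, None])

# log10 is only defined for positive values; comparisons with nulls are False.
positive = revenue_per_kwh[revenue_per_kwh > 0]

# Integer decade each value falls in:
# -2 means [0.01, 0.1), -1 means [0.1, 1), 0 means [1, 10), 1 means [10, 100).
decades = np.floor(np.log10(positive)).astype(int)
decade_counts = decades.value_counts().sort_index()

print(decade_counts)
```

On real data, two well-separated groups of decades would be consistent with the mixed-unit suspicion. Any rescaling rule built on top of this would be an analyst's own assumption.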

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)

Columns#

Field Name	Type	Description
avg_customers_per_month	number	Average number of customers per month.
billing_status	string	Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
kwh_per_customer	number	kWh per customer.
rate_schedule_description	string	Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type.
rate_schedule_type	string	Categorization of rate schedule type.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
revenue_per_kwh	number	The amount of revenue per kWh by rate schedule acquired in the given report year.
sales_mwh	number	Quantity of electricity sold in MWh.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_small_plants_sched410#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for internal combustion plants, gas turbine-plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity and steam plants with less than 25 MW installed nameplate capacity.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)
Primary key:: record_id. The closest approximation to a natural primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported on FERC Form 1 Schedule 410 (pages 410-411) and extracted from the FERC Visual FoxPro and XBRL databases. See our pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.

The raw version of this table is more like a digitized PDF than an actual data table. The rows contain lots of information in addition to what the columns might suggest. For instance, a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the rest of the values data in the column. We employ a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.
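
To make the idea of header rows concrete, here is a deliberately simplified sketch of pulling header labels out of the name column and carrying them down into a separate column. It is not PUDL's transformation. The toy rows, the trailing-colon rule for spotting headers, and the resulting plant_type labels are all assumptions for illustration.

```python
import pandas as pd

# Invented raw-style rows: header rows carry a label but no numeric values.
raw = pd.DataFrame(
    {
        "plant_name_ferc1": [
            "hydro:",
            "mill pond",
            "rock falls",
            "internal combustion:",
            "north diesel",
        ],
        "capacity_mw": [None, 1.2, 3.4, None, 2.0],
        "net_generation_mwh": [None, 4000.0, 9000.0, None, 500.0],
    }
)

value_cols = ["capacity_mw", "net_generation_mwh"]

# Hypothetical rule: a row with no numeric values whose name ends in ":"
# is treated as a header row.
is_header = raw[value_cols].isna().all(axis=1) & raw[
    "plant_name_ferc1"
].str.endswith(":")

# Keep only header labels, strip the colon, and carry each label down
# to the plant rows beneath it.
raw["plant_type"] = (
    raw["plant_name_ferc1"].where(is_header).str.rstrip(":").ffill()
)

# Drop the header rows now that their information lives in plant_type.
plants = raw[~is_header].reset_index(drop=True)
print(plants)
```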

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet"
)

Columns#

Field Name	Type	Description
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_total	number	Total cost of plant (USD).
construction_year	integer	Year the plant’s oldest still operational unit was built.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_type	string	Type of fuel.
license_id_ferc1	integer	FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_fuel	number	Production expenses: fuel (USD).
opex_maintenance	number	Production expenses: Maintenance (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_steam_plants_fuel_by_plant_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of FERC fuel data by plant.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
Primary key:: report_year, utility_id_ferc1, plant_name_ferc1

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet"
)

Columns#

Field Name	Type	Description
coal_fraction_cost	number	Coal cost as a percentage of overall fuel cost.
coal_fraction_mmbtu	number	Coal heat content as a percentage of overall fuel heat content (MMBTU).
fuel_cost	number	Total fuel cost for plant (in $USD).
fuel_mmbtu	number	Total heat content for plant (in MMBTU).
gas_fraction_cost	number	Natural gas cost as a percentage of overall fuel cost.
gas_fraction_mmbtu	number	Natural gas heat content as a percentage of overall fuel heat content (MMBTU).
nuclear_fraction_cost	number	Nuclear cost as a percentage of overall fuel cost.
nuclear_fraction_mmbtu	number	Nuclear heat content as a percentage of overall fuel heat content (MMBTU).
oil_fraction_cost	number	Oil cost as a percentage of overall fuel cost.
oil_fraction_mmbtu	number	Oil heat content as a percentage of overall fuel heat content (MMBTU).
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
primary_fuel_by_cost	string	Primary fuel for the plant, determined by each fuel's share of total fuel cost.
primary_fuel_by_mmbtu	string	Primary fuel for the plant, determined by each fuel's share of total heat content.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
waste_fraction_cost	number	Waste-heat cost as a percentage of overall fuel cost.
waste_fraction_mmbtu	number	Waste-heat heat content as a percentage of overall fuel heat content (MMBTU).
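
The fraction columns and the primary_fuel_by_* columns are related: one plausible reading is that the primary fuel is simply the fuel with the largest share. The sketch below illustrates that reading for a single row held as a plain dict. The function name primary_fuel and the sample values are hypothetical, the values assume shares on a 0 to 1 scale, and the rule PUDL actually applies (for example, any minimum-share threshold) may differ.

```python
# Fuel prefixes taken from the *_fraction_cost / *_fraction_mmbtu column names.
FUELS = ("coal", "gas", "nuclear", "oil", "waste")


def primary_fuel(row, basis="mmbtu"):
    """Return the fuel with the largest reported share, or None if none is reported.

    `basis` selects the column family: "mmbtu" or "cost".
    This is an illustrative rule, not necessarily the one PUDL uses.
    """
    shares = {}
    for fuel in FUELS:
        value = row.get(f"{fuel}_fraction_{basis}")
        if value is not None:
            shares[fuel] = value
    if not shares:
        return None
    return max(shares, key=shares.get)


# Hypothetical row; the values are invented for illustration only.
row = {
    "coal_fraction_mmbtu": 0.82,
    "gas_fraction_mmbtu": 0.15,
    "oil_fraction_mmbtu": 0.03,
    "nuclear_fraction_mmbtu": 0.0,
    "waste_fraction_mmbtu": None,
}
print(primary_fuel(row))
```

Comparing the result of such a rule against primary_fuel_by_mmbtu is a quick way to find rows where the reported fractions and the assigned primary fuel disagree.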

out_ferc1__yearly_steam_plants_fuel_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
Primary key:: record_id

Usage Warnings#

The fuel_type_code_pudl is inferred from a free-form string field.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is a subset of the steam plant table reported on page 402 of FERC Form 1.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)

Columns#

Field Name	Type	Description
fuel_consumed_mmbtu	number	Total heat content of fuel consumed, in MMBTU. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_total_cost	number	Total cost of consumed fuel.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_unit_burned	number	Average cost of fuel consumed in the report year per reported fuel unit (USD).
fuel_cost_per_unit_delivered	number	Average cost of fuel delivered in the report year per reported fuel unit (USD).
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
fuel_units	string	Reported unit of measure for fuel.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
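
The record_id description above gives the pattern {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. A minimal sketch of splitting such an ID back into its parts follows. It assumes the ID matches that pattern exactly and that only table_name may contain underscores; IDs from some reporting years may not follow the pattern. The helper parse_record_id and the sample ID are hypothetical.

```python
# Trailing components of the documented record_id pattern, in order.
FIELDS = ("report_year", "report_prd", "respondent_id", "spplmnt_num", "row_number")


def parse_record_id(record_id):
    """Split a record_id into its documented components (all kept as strings)."""
    # table_name can itself contain underscores, so split from the right.
    parts = record_id.rsplit("_", len(FIELDS))
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f"record_id does not match the documented pattern: {record_id!r}")
    table_name, *values = parts
    return {"table_name": table_name, **dict(zip(FIELDS, values))}


# Hypothetical ID constructed from the documented pattern.
print(parse_record_id("f1_fuel_2019_12_45_0_3"))
```

Splitting from the right keeps a multi-part table name intact, which a plain split on every underscore would not.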

out_ferc1__yearly_steam_plants_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
Primary key:: record_id. The best approximation of a natural primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the plant total (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
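
Because report_year, utility_id_ferc1 and plant_name_ferc1 are only an approximate key, it can help to count how often each combination occurs before aggregating. The sketch below does this for rows held as plain dicts; the helper duplicated_keys and the sample rows are hypothetical, and the same count can be computed with a group-by in pandas, polars or DuckDB.

```python
from collections import Counter

# Approximate natural key named in the table description.
KEY = ("report_year", "utility_id_ferc1", "plant_name_ferc1")


def duplicated_keys(rows, key=KEY):
    """Return {key_tuple: count} for every key combination seen more than once."""
    counts = Counter(tuple(row[k] for k in key) for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical rows: one plant reported twice, plus a separate "total" record.
rows = [
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "example plant"},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "example plant"},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "example plant total"},
]
print(duplicated_keys(rows))
```

Note that an exact-match check like this does not catch unit-level records reported alongside a plant total under a different name; those still need to be reviewed by hand.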

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of the time period in hours).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_annual_addition	number	Annual capital addition into capex_total.
capex_annual_addition_rolling	number	Year-to-date capital addition into capex_total.
capex_annual_per_kw	number	Annual capital addition into capex_total per kW.
capex_annual_per_mw	number	Annual capital addition into capex_total per MW.
capex_annual_per_mw_rolling	number	Year-to-date capital addition into capex_total per MW.
capex_annual_per_mwh	number	Annual capital addition into capex_total per MWh.
capex_annual_per_mwh_rolling	number	Year-to-date capital addition into capex_total per MWh.
capex_equipment	number	Cost of plant: equipment (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
capex_wo_retirement_total	number	Total cost of plant (USD) without retirements.
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
not_water_limited_capacity_mw	number	Plant capacity in MW when not limited by condenser water.
opex_allowances	number	Allowances.
opex_boiler	number	Maintenance of boiler (or reactor) plant.
opex_coolants	number	Cost of coolants and water (nuclear plants only)
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_fuel	number	Production expenses: fuel (USD).
opex_fuel_per_mwh	number	Production expenses: fuel (USD) per megawatt-hour (MWh).
opex_misc_power	number	Miscellaneous steam (or nuclear) expenses.
opex_misc_steam	number	Maintenance of miscellaneous steam (or nuclear) plant.
opex_nonfuel_per_mwh	number	Investments in non-fuel production expenses per MWh.
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plants	number	Maintenance of electrical plant.
opex_production_total	number	Total operating expenses.
opex_rents	number	Production expenses: rents (USD).
opex_steam	number	Steam expenses.
opex_steam_other	number	Steam from other sources.
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
opex_transfer	number	Steam transferred (Credit).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_id_ferc1	integer	Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
water_limited_capacity_mw	number	Plant capacity in MW when limited by condenser water.
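
The capacity_factor column relates net_generation_mwh to capacity_mw over the report year. A minimal sketch of that calculation follows, assuming the time period is the full calendar year (8,760 hours, or 8,784 in a leap year). Whether PUDL accounts for leap years or partial-year operation in the same way is an assumption, and the function name is hypothetical.

```python
import calendar


def capacity_factor(net_generation_mwh, capacity_mw, report_year):
    """Net generation divided by the generation possible at full nameplate output."""
    hours = 8784 if calendar.isleap(report_year) else 8760
    if not capacity_mw:
        # Avoid dividing by zero or by a missing capacity value.
        return None
    return net_generation_mwh / (capacity_mw * hours)


# A hypothetical 100 MW plant generating 438,000 MWh in 2023 (not a leap year).
print(capacity_factor(438_000, 100, 2023))
```

Values well above 1 usually point to a reporting problem in either the generation or the capacity field rather than to a real plant characteristic.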

out_ferc1__yearly_transmission_lines_sched422#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about transmission lines.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)
Primary key:: This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines with a nominal voltage of 132 kilovolts or greater. Transmission lines below this voltage are required to be reported only as group totals for each voltage.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet"
)

Columns#

Field Name	Type	Description
capex_land	number	Cost of plant: land and land rights (USD).
capex_other	number	Other costs associated with the plant (USD).
capex_total	number	Total cost of plant (USD).
conductor_size_and_material	string	Size of transmission conductor and material of the transmission line.
designed_voltage_kv	number	Manufactured (designed) voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.
end_point	string	The end point of a transmission line.
num_transmission_circuits	integer	Number of circuits in a transmission line.
operating_voltage_kv	number	The operating voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.
opex_maintenance	number	Production expenses: Maintenance (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_rents	number	Production expenses: rents (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
start_point	string	The starting point of a transmission line.
supporting_structure_type	string	Supporting structure of the transmission line.
transmission_line_and_structures_length_miles	number	Length (in pole miles, or circuit miles if the transmission lines are underground) for lines that are aggregated with other lines or structures (whose costs are aggregated and combined with other structures).
transmission_line_length_miles	number	Length (in pole miles, or circuit miles if the transmission lines are underground) for lines that are stand-alone structures (whose costs are reported on a stand-alone basis).
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_utility_plant_summary_sched200#

Noticed any irregularities with this table? File a report.

Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)
Primary key:: utility_id_ferc1, report_year, utility_type, utility_plant_asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet"
)

Columns#

Field Name	Type	Description
ending_balance	number	Account balance at end of year.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
utility_plant_asset_type	string	Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
utility_type_other	string	Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

out_ferc714__georeferenced_respondents#

Noticed any irregularities with this table? File a report.

Annual summary of demand and other information about FERC-714 respondents.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
Primary key:: respondent_id_ferc714, report_date

Usage Warnings#

Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table differs from out_ferc714__summarized_demand in that it also includes a geometry column describing the respondent’s service territory in each year. These service territories are based on the counties that the corresponding EIA-861 respondent reported serving in that year. There is sometimes ambiguity as to whether a FERC-714 respondent should be interpreted as an individual utility or a balancing authority. The respondent_type column indicates which type of entity has been assumed in determining the service territory from EIA-861 data.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet"
)

Columns#

Field Name	Type	Description
area_km2	number	County area in km2.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
balancing_authority_name_eia	string	Name of the balancing authority.
demand_annual_mwh	number	Annual electricity demand in a given report year.
eia_code	integer	EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.
geometry	geometry	Geospatial representation of the feature.
population	number	County population, sourced from Census DP1 data.
report_date	date	Date reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_name_ferc714	string	Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.
respondent_type	string	Whether a respondent to the FERC form 714 is a utility or a balancing authority.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

out_ferc714__hourly_estimated_state_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of estimated electricity demand for each state, scaled such that it matches the total electricity sales by state reported in EIA 861.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
Primary key:: state_id_fips, datetime_utc

Usage Warnings#

Contains estimated values.
Contains information from multiple raw inputs.
Large table; do not attempt to open with Excel.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table uses hourly electricity demand and information about service territories to estimate the total hourly electricity demand for each US state. Hourly demand is drawn from the FERC 714, which reports at the balancing authority and utility level. Service territories for utilities and balancing authorities are inferred from information in the EIA 861, including the counties served by each utility and the utilities that make up each balancing authority.

This table uses the total electricity sales by state reported in the EIA 861 as a scaling factor to ensure that the magnitude of electricity sales is roughly correct, and obtains the shape of the demand curve from the hourly planning area demand reported in the FERC 714.

This table includes the state FIPS codes - see out_censusdp1tract__states for additional information about FIPS state codes.
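
The scaling step described above can be sketched as follows: keep the hourly shape, then multiply every hour by a single factor so the series sums to the annual sales total. This only illustrates the final rescaling for one state and year. It does not reproduce how PUDL allocates planning-area demand to states, and the function name and numbers are hypothetical.

```python
def scale_to_annual_total(hourly_demand_mwh, annual_sales_mwh):
    """Rescale an hourly series so that it sums to the given annual total."""
    total = sum(hourly_demand_mwh)
    if total == 0:
        raise ValueError("cannot scale a series that sums to zero")
    factor = annual_sales_mwh / total
    # Every hour is multiplied by the same factor, so the shape is preserved.
    return [demand * factor for demand in hourly_demand_mwh]


# Three hypothetical "hours" standing in for a full year of data.
demand_mwh = [100.0, 150.0, 250.0]
scaled_demand_mwh = scale_to_annual_total(demand_mwh, annual_sales_mwh=1000.0)
print(scaled_demand_mwh)
```

Under this reading, demand_mwh carries the estimated shape and scaled_demand_mwh carries the same shape with a magnitude tied to reported sales.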

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet"
)

Columns#

Field Name	Type	Description
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_mwh	number	Electricity demand (energy) within a given timeframe.
scaled_demand_mwh	number	Estimated electricity demand scaled by the total sales within a state.
state_id_fips	string	Two-digit state FIPS code.

out_ferc714__hourly_planning_area_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of electricity demand by planning area.

Most-recent data:: 2024
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)
Primary key:: respondent_id_ferc714, datetime_utc

Usage Warnings#

Contains rows where missing values were imputed.
The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See core_ferc714__hourly_planning_area_demand for additional details.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is based on core_ferc714__hourly_planning_area_demand, but adds imputed demand values where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.
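
One way to see how much of a series was imputed is to count the rows that carry an imputation code. The sketch below assumes the code column is null wherever the reported value was kept unchanged, which should be verified against the data. The helper name, the sample rows and the placeholder code string are hypothetical; the real codes are listed in core_pudl__codes_imputation_reasons.

```python
CODE_COLUMN = "demand_imputed_pudl_mwh_imputation_code"


def summarize_imputation(rows):
    """Count rows with an imputation code and report their share of all rows."""
    imputed = [row for row in rows if row[CODE_COLUMN] is not None]
    share = len(imputed) / len(rows) if rows else 0.0
    return {"n_rows": len(rows), "n_imputed": len(imputed), "share_imputed": share}


# Hypothetical rows; "HYPOTHETICAL_CODE" is a placeholder, not a real PUDL code.
rows = [
    {"demand_reported_mwh": 1200.0, "demand_imputed_pudl_mwh": 1200.0, CODE_COLUMN: None},
    {"demand_reported_mwh": None, "demand_imputed_pudl_mwh": 1185.0, CODE_COLUMN: "HYPOTHETICAL_CODE"},
    {"demand_reported_mwh": 0.0, "demand_imputed_pudl_mwh": 1210.0, CODE_COLUMN: "HYPOTHETICAL_CODE"},
    {"demand_reported_mwh": 1230.0, "demand_imputed_pudl_mwh": 1230.0, CODE_COLUMN: None},
]
summary = summarize_imputation(rows)
print(summary)
```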

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet"
)

Columns#

Field Name	Type	Description
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_imputed_pudl_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.
demand_imputed_pudl_mwh_imputation_code	string	Code describing why a demand value was flagged for imputation.
demand_reported_mwh	number	Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.
report_date	date	Date reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_id_ferc714_csv	integer	FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
respondent_id_ferc714_xbrl	string	FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
timezone	string	IANA timezone name

out_ferc714__respondents_with_fips#

Noticed any irregularities with this table? File a report.

Annual respondents with the county FIPS IDs for their service territories.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
Primary key:: This table has no primary key. Note that the state and county FIPS columns can be null. Otherwise the natural primary key would be: [‘respondent_id_ferc714’, ‘report_date’, ‘county_id_fips’]. The state FIPS code is not part of the primary key because the first two digits of the county FIPS code indicate the state.
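
Since the first two digits of a county FIPS code identify the state, a state code can be derived from county_id_fips when needed. The sketch below assumes five-character, zero-padded county codes stored as strings and passes nulls through unchanged; the function name is hypothetical.

```python
def state_fips_from_county(county_id_fips):
    """Return the two-digit state FIPS prefix of a five-digit county FIPS code."""
    if county_id_fips is None:
        # The county FIPS column can be null, so propagate the null.
        return None
    if len(county_id_fips) != 5 or not county_id_fips.isdigit():
        raise ValueError(f"expected a five-digit county FIPS code, got {county_id_fips!r}")
    return county_id_fips[:2]


print(state_fips_from_county("08013"))  # county code in state 08
print(state_fips_from_county(None))
```

Keeping the codes as strings matters here: converting them to integers would drop leading zeros and break the prefix rule.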

Usage Warnings#

Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
balancing_authority_name_eia	string	Name of the balancing authority.
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
eia_code	integer	EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.
report_date	date	Date reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_name_ferc714	string	Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.
respondent_type	string	Whether a respondent to FERC Form 714 is a utility or a balancing authority.
state	string	Two letter US state abbreviation.
state_id_fips	string	Two digit state FIPS code.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

out_ferc714__summarized_demand#

Noticed any irregularities with this table? File a report.

Summarized demand statistics and FERC-714 respondent attributes by respondent-year.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
Primary key:: respondent_id_ferc714, report_date

Usage Warnings#

FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet"
)

Columns#

Field Name	Type	Description
area_km2	number	County area in km2.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
balancing_authority_name_eia	string	Name of the balancing authority.
demand_annual_mwh	number	Annual electricity demand in a given report year.
demand_annual_per_capita_mwh	number	Per-capita annual demand, averaged using Census county-level population estimates.
demand_density_mwh_km2	number	Annual demand per km2 of a given service territory.
eia_code	integer	EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.
population	number	County population, sourced from Census DP1 data.
population_density_km2	number	Average population per sq. km area of a service territory.
report_date	date	Date reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_name_ferc714	string	Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.
respondent_type	string	Whether a respondent to FERC Form 714 is a utility or a balancing authority.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
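
The derived columns in this table read as simple ratios of annual demand, population, and area. The sketch below assumes that reading and uses made-up numbers; the `safe_ratio` helper is hypothetical and is not taken from the PUDL codebase.

```python
def safe_ratio(numerator, denominator):
    """Divide two values, returning None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Hypothetical respondent-year record using this table's column names.
record = {
    "demand_annual_mwh": 1_200_000.0,
    "population": 150_000.0,
    "area_km2": 2_400.0,
}

derived = {
    "demand_annual_per_capita_mwh": safe_ratio(
        record["demand_annual_mwh"], record["population"]
    ),
    "demand_density_mwh_km2": safe_ratio(
        record["demand_annual_mwh"], record["area_km2"]
    ),
    "population_density_km2": safe_ratio(
        record["population"], record["area_km2"]
    ),
}
```

Comparing recomputed ratios like these against the published columns is one way to spot rows where population or area is missing.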

out_gridpathratoolkit__hourly_available_capacity_factor#

Noticed any irregularities with this table? File a report.

Hourly time series of capacity factors defining the capacity available from an aggregated group of generators.

Most-recent data:: 2021
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: GridPath Resource Adequacy Toolkit Data
Primary key:: datetime_utc, aggregation_group

Usage Warnings#

Contains estimated values.
The original data is no longer being collected or reported in this way.

Additional Details#

This table contains a mix of profiles representing whole regions and individual plants, where the individual plants are hybrid wind or solar plus electricity storage facilities. Capacity factors are stated as a fraction of the aggregate nameplate capacity of the group.

For the hybrid facilities, the capacity factor represents the available output of only the renewable generators. Estimates of the generation profiles have been extended across a longer range of dates than the original data. For wind, the years 2015-2020 are synthesized data and for solar 2020 is synthesized. See the Appendix of the GridPath Resource Adequacy Toolkit report for details.
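
Because capacity factors are stated as a fraction of the group's aggregate nameplate capacity, converting them to available megawatts requires that capacity from another source. A minimal sketch, assuming a hypothetical 200 MW aggregation group and made-up hourly values:

```python
def available_capacity_mw(capacity_factor, nameplate_capacity_mw):
    """Scale a fractional capacity factor by aggregate nameplate capacity."""
    return capacity_factor * nameplate_capacity_mw


# Hypothetical values: three hourly capacity factors for one aggregation group.
hourly_capacity_factors = [0.0, 0.25, 0.5]
group_nameplate_mw = 200.0  # assumed; not a column in this table

hourly_available_mw = [
    available_capacity_mw(cf, group_nameplate_mw) for cf in hourly_capacity_factors
]
```

For the hybrid facilities the result would describe only the renewable generators, not the storage component.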

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet"
)

Columns#

Field Name	Type	Description
aggregation_group	string	A label identifying a group of aggregated generator capacity factors.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of that time period).
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).

out_pudl__yearly_assn_eia_ferc1_plant_parts#

Noticed any irregularities with this table? File a report.

Annual Association table providing connections between power plant data reported in FERC Form 1 and any available EIA data related to the plant parts covered by each FERC Form 1 record.

Most-recent data:: 2024
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: record_id_ferc1

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table answers the question “What EIA data reported about plants or generators should be associated with a given plant record found in the FERC Form 1?”

Each record in this table corresponds to a single FERC Form 1 record reported in one of several tables describing power plants (large steam, hydro, small, etc.). These FERC records can correspond to an entire plant, individual generators within a plant, all generators in a plant with the same prime mover type, or just the respondent’s ownership share of any of those categories (or other categories). Furthermore, the same utility may report the same plant in different ways in different years.

The EIA data associated with each FERC plant record comes from our Plant Parts EIA table, out_eia__yearly_plant_parts. The EIA data in each record represents an aggregation of several slices of an EIA plant, across both physical characteristics and utility ownership.

This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
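
The 70% rule can be illustrated with a short sketch. This is not PUDL's implementation: the `canonical_value` function is hypothetical, and it assumes that null reports are excluded before the share is computed.

```python
from collections import Counter


def canonical_value(reported_values, min_share=0.7):
    """Pick the most common non-null value if it meets the share threshold."""
    # Assumption: nulls do not count toward the denominator.
    non_null = [value for value in reported_values if value is not None]
    if not non_null:
        return None
    value, count = Counter(non_null).most_common(1)[0]
    return value if count / len(non_null) >= min_share else None


# Three of four reports agree (75%), so the value is kept.
consistent = canonical_value(["Boulder", "Boulder", "Boulder", "Bolder"])
# Only two of four reports agree (50%), so the result is null.
inconsistent = canonical_value(["Boulder", "Boulder", "Bolder", "Denver"])
```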

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet"
)

Columns#

Field Name	Type	Description
appro_part_label	string	Plant part of the associated true granularity record.
appro_record_id_eia	string	EIA record ID of the associated true granularity record.
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_factor_eia	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of that time period).
capacity_factor_ferc1	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of that time period).
capacity_mw_eia	number	Total installed (nameplate) capacity, in megawatts.
capacity_mw_ferc1	number	Total installed (nameplate) capacity, in megawatts.
capex_annual_addition	number	Annual capital addition into capex_total.
capex_annual_addition_rolling	number	Year-to-date capital addition into capex_total.
capex_annual_per_kw	number	Annual capital addition into capex_total per kW.
capex_annual_per_mw	number	Annual capital addition into capex_total per MW.
capex_annual_per_mw_rolling	number	Year-to-date capital addition into capex_total per MW.
capex_annual_per_mwh	number	Annual capital addition into capex_total per MWh.
capex_annual_per_mwh_rolling	number	Year-to-date capital addition into capex_total per MWh.
capex_equipment	number	Cost of plant: equipment (USD).
capex_equipment_electric	number	Cost of plant: accessory electric equipment (USD).
capex_equipment_misc	number	Cost of plant: miscellaneous power plant equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
capex_wheels_turbines_generators	number	Cost of plant: water wheels, turbines, and generators (USD).
capex_wo_retirement_total	number	Total cost of plant (USD) without retirements.
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year_eia	integer	Year the plant’s oldest still operational unit was built.
construction_year_ferc1	integer	Year the plant’s oldest still operational unit was built.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_used_for_pumping_mwh	number	Energy used for pumping, in megawatt-hours.
ferc_acct_name	string	Name of FERC account, derived from technology description and prime mover code.
ferc_license_id	string	The FERC license ID of a project.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
fuel_cost_per_mmbtu_eia	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mmbtu_ferc1	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_mwh_eia	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_cost_per_mwh_ferc1	number	Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.
fuel_type	string	Type of fuel.
fuel_type_code_pudl_eia	string	Simplified fuel type code used in PUDL
fuel_type_code_pudl_ferc1	string	Simplified fuel type code used in PUDL
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_year	integer	Year a generator went into service.
installation_year_eia	integer	Year the plant’s most recently built unit was installed.
installation_year_ferc1	integer	Year the plant’s most recently built unit was installed.
license_id_ferc1	integer	FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.
match_type	string	Indicates the source and validation of the match between EIA and FERC. Match types include matches generated by the model, matches verified by the training data, matches overridden by the training data, etc.
net_capacity_adverse_conditions_mw	number	Net plant capability under the least favorable operating conditions, in megawatts.
net_capacity_favorable_conditions_mw	number	Net plant capability under the most favorable operating conditions, in megawatts.
net_generation_mwh_eia	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_generation_mwh_ferc1	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_load_mwh	number	Net output for load (net generation - energy used for pumping) in megawatt-hours.
not_water_limited_capacity_mw	number	Plant capacity in MW when not limited by condenser water.
operational_status	string	The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.
operational_status_pudl	string	The operating status of the asset using PUDL categories.
opex_allowances	number	Allowances.
opex_boiler	number	Maintenance of boiler (or reactor) plant.
opex_coolants	number	Cost of coolants and water (nuclear plants only)
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_fuel	number	Production expenses: fuel (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_hydraulic	number	Production expenses: hydraulic expenses (USD).
opex_maintenance	number	Production expenses: Maintenance (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_misc_power	number	Miscellaneous steam (or nuclear) expenses.
opex_misc_steam	number	Maintenance of miscellaneous steam (or nuclear) plant.
opex_nonfuel_per_mwh	number	Investments in non-fuel production expenses per MWh.
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_production_before_pumping	number	Total production expenses before pumping (USD).
opex_production_total	number	Total operating expenses.
opex_pumped_storage	number	Production expenses: pumped storage (USD).
opex_pumping	number	Production expenses: pumping expenses (USD).
opex_rents	number	Production expenses: rents (USD).
opex_steam	number	Steam expenses.
opex_steam_other	number	Steam from other sources.
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_total_nonfuel	number	Total production expenses, excluding fuel (USD).
opex_transfer	number	Steam transferred (Credit).
opex_water_for_power	number	Production expenses: water for power (USD).
ownership_dupe	boolean	Whether a plant part record has a duplicate record with different ownership status.
ownership_record_type	string	Whether each generator record is for one owner or represents a total of all ownerships.
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_ferc1	integer	Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_name_ppe	string	Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part.
plant_part	string	The part of the plant a record corresponds to.
plant_part_id_eia	string	Contains EIA plant ID, plant part, ownership, and EIA utility id
plant_type	string	Type of plant.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
record_count	integer	Number of distinct generator IDs that participated in the aggregation for a plant part list record.
record_id_eia	string	Identifier for EIA plant parts analysis records.
record_id_ferc1	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_date	date	Date reported.
report_year	integer	Four-digit year in which the data was reported.
technology_description	string	High level description of the technology used by the generator to produce electricity.
total_fuel_cost_eia	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_fuel_cost_ferc1	number	Total annual reported fuel costs for the plant part. Includes costs from all fuels.
total_mmbtu_eia	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
total_mmbtu_ferc1	number	Total annual heat content of fuel consumed by a plant part record in the plant parts list.
true_gran	boolean	Indicates whether a plant part list record is associated with the highest priority plant part for all identical records.
unit_heat_rate_mmbtu_per_mwh_eia	number	Fuel content per unit of electricity generated. Coming from MCOE calculation.
unit_heat_rate_mmbtu_per_mwh_ferc1	number	Fuel content per unit of electricity generated. Calculated from FERC reported fuel consumption and net generation.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
water_limited_capacity_mw	number	Plant capacity in MW when limited by condenser water.
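
The record_id_ferc1 format above can be split into its components. Since table names themselves contain underscores, the sketch below splits from the right. The example ID is invented, the function name is hypothetical, and IDs that do not follow the documented format are rejected.

```python
FIELDS = (
    "table_name",
    "report_year",
    "report_prd",
    "respondent_id",
    "spplmnt_num",
    "row_number",
)


def parse_record_id_ferc1(record_id):
    """Split a record ID into named parts, keeping every part as a string."""
    # The last five underscores delimit the trailing fields; whatever is
    # left over on the left is the table name.
    parts = record_id.rsplit("_", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"Unexpected record_id_ferc1 format: {record_id!r}")
    return dict(zip(FIELDS, parts))


# Invented ID for illustration only.
parsed = parse_record_id_ferc1("f1_steam_2019_12_145_0_3")
```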

out_rus12__monthly_demand_and_energy_at_delivery_points#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy delivered to distribution utilities.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)
Primary key:: report_date, borrower_id_rus, delivery_recipient

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
delivered_demand_mw	number	The amount of demand delivered in MW.
delivered_energy_mwh	number	The amount of energy delivered in MWh.
delivery_recipient	string	The recipient of the delivered energy or demand.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
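
The borrower ID structure described above lends itself to a quick format check. A minimal sketch, assuming that the state abbreviation is uppercase and that there is no separator between the letters and digits (neither detail is stated in this dictionary); the sample IDs are invented.

```python
import re

# Assumed pattern: two uppercase letters followed by exactly four digits.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}\d{4}")


def looks_like_borrower_id(value):
    """Return True if the value matches the assumed borrower ID structure."""
    return isinstance(value, str) and BORROWER_ID_PATTERN.fullmatch(value) is not None


checks = {
    candidate: looks_like_borrower_id(candidate)
    for candidate in ["TX0123", "TX123", "tx0123"]
}
```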

out_rus12__monthly_demand_and_energy_at_power_sources#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy at point of generation.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

There are some peak demand dates that have an hour component and others that don’t. This is because there is inconsistent reporting of peak_demand_hour in the raw data. If there is no hour reported, we record the hour as 0 and only the date will show in the data.
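
One consequence is that a timestamp at hour 0 is ambiguous: it may mean no hour was reported, or it may be a genuine midnight peak. A minimal sketch for flagging such rows, using invented timestamps and a hypothetical helper name:

```python
from datetime import datetime


def hour_may_be_unreported(peak_demand_date):
    """Flag timestamps whose hour is 0, which may mean no hour was reported."""
    return peak_demand_date.hour == 0


# Invented peak demand timestamps.
peaks = [
    datetime(2021, 7, 15, 17),  # hour reported as 17
    datetime(2021, 1, 3),  # hour recorded as 0
]
flags = [hour_may_be_unreported(ts) for ts in peaks]
```

Flagged rows are best treated as date-only values when analyzing the time of day at which peaks occur.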

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
energy_output_mwh	number	The amount of energy output in MWh.
is_peak_coincident	boolean	Whether the reported peak demand (peak_demand_mw) is a coincident peak rather than a non-coincident peak.
peak_demand_date	datetime	The date of the peak demand.
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
Primary key:: report_date, borrower_id_rus, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
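
To avoid double counting subtotals, rows can be filtered on is_total before aggregating. A minimal sketch with invented rows; the asset_type labels and balances are hypothetical, and only the column names come from this table.

```python
# Invented rows for one borrower and year.
rows = [
    {"asset_type": "plant_in_service", "ending_balance": 100.0, "is_total": False},
    {"asset_type": "construction_work_in_progress", "ending_balance": 20.0, "is_total": False},
    {"asset_type": "total_utility_plant", "ending_balance": 120.0, "is_total": True},
]

# Summing every row counts the detail rows twice: once directly and once
# through the total row.
naive_sum = sum(r["ending_balance"] for r in rows)

# Summing only non-total rows gives the intended figure.
detail_sum = sum(r["ending_balance"] for r in rows if not r["is_total"])
```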

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
Primary key:: report_date, borrower_id_rus, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
liability_type	string	Type of liability being reported.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_depreciation_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated depreciation.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)
Primary key:: report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet"
)

Columns#

Field Name	Type	Description
accruals	number	Value of additions into an asset class, a.k.a. accruals.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
composite_depreciation_rate	number	The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100.
depreciation_and_amortization_group	string	High-level category of depreciation and amortization items.
depreciation_and_amortization_item	string	Category of depreciation and amortization items.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements_less_net_salvage	number	Cost of retirements minus any net salvage value.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_depreciation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of accumulated depreciation.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)
Primary key:: report_date, borrower_id_rus, depreciation_and_amortization_item

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
depreciation_and_amortization_item	string	Category of depreciation and amortization items.
ending_balance	number	Account balance at end of year.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
external_financial_risk_ratio	number	(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
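
The table summary describes this value as the ratio of investments and loan guarantee balances to total utility plant assets, so the sum is taken before the division. A small worked sketch of that arithmetic follows. The function name and the input figures are hypothetical, and the source does not say whether the reported value is a fraction or a percentage.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """Hypothetical helper: (investments + loan guarantees) / utility plant."""
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Invented figures: 150 of investments and 50 of guarantees against 1000 of plant.
ratio = external_financial_risk_ratio(150.0, 50.0, 1000.0)

# Dividing only the guarantees by plant, as a literal left-to-right reading of
# "a + b / c" would, gives a very different number.
misread = 150.0 + 50.0 / 1000.0
```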

out_rus12__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F, Sub-section I)
Primary key:: This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part A Section B in table core_rus12__yearly_balance_sheet_assets.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
excluded_investments	number	Excluded investment.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
included_investments	number	Included investment.
income_or_loss	number	Income or loss from investment.
investment_description	string	Description of investment. This is a free-form text field and thus contains a wide variety of values.
investment_type_code	integer	Investment type code.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_lines_stations_labor_materials_cost#

Noticed any irregularities with this table? File a report.

Annual time series of labor and material cost for lines and stations operated by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)
Primary key:: report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
cost	number	Cost value.
labor_or_material	string	Indicates whether the cost reported is for labor or material.
lines_or_stations	string	Indicates whether the cost reported pertains to lines or stations.
operation_or_maintenance	string	Indicates whether the expenditure is for operation or maintenance.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)
Primary key:: This table has no primary key. Some borrowers report multiple loan values from the same entity in a given year.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
is_loan_guarantee	boolean	Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.
loan_balance	number	The amount of money still owed on a loan at the end of the reporting year.
loan_maturity_date	date	The date on which a loan is scheduled to be fully paid.
loan_original_amount	number	The original amount of a loan.
loan_recipient	string	The organization that received a loan.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long-term debt and debt service requirements for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)
Primary key:: This table has no primary key. Some borrowers report multiple debt values from the same entity in a given year.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
debt_description	string	Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.
debt_ending_balance	number	The amount of principal still owed on the debt at the end of the report year.
debt_interest	number	The interest expense on the debt for the report year.
debt_principal	number	The principal paid on the debt during the report year.
debt_total	number	The total amount of debt.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of annual meeting and board information for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet"
)

Columns#

Field Name	Type	Description
board_members_num	integer	The total number of board members.
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
does_manager_have_written_contract	boolean	Whether or not the RUS borrower’s manager has a written contract.
fees_and_expenses_for_board_members	integer	The total amount of fees and expenses for board members.
last_annual_meeting_date	datetime	The date of the last annual meeting.
members_num	integer	The total number of members.
members_present_at_meeting_num	integer	The number of members present at the last annual meeting.
members_voting_by_proxy_or_mail_num	integer	The number of members voting by mail or by proxy.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
was_quorum_present	boolean	Whether or not quorum was met.

out_rus12__yearly_non_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in non-utility plant.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)
Primary key:: report_date, borrower_id_rus, non_utility_plant_item

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
non_utility_plant_item	string	Category describing non-utility plant asset items.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_plant_costs#

Noticed any irregularities with this table? File a report.

Annual time series of costs of net energy generated by plant.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)
Primary key:: This table has no primary key. One plant (Walter Scott) has duplicate records every year. Based on other RUS-12 tables, it is likely that these duplicate records are the borrowers’ ownership portion of the plant as well as the total plant, but there is no clear indication in this table.

The primary key of this table otherwise would be: report_date, borrower_id_rus, plant_name_rus, cost_group, cost_type.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns only apply to some cost_type values (and only some plant_type values) and thus are expected to contain many nulls.
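
Because this table has no enforced primary key, it can help to locate the rows that collide on the would-be key before aggregating or joining. The following is a minimal sketch using pandas on invented rows; the borrower ID, plant names, cost labels and cost values are all hypothetical.

```python
import pandas as pd

# The would-be primary key named in this table's description.
would_be_key = [
    "report_date",
    "borrower_id_rus",
    "plant_name_rus",
    "cost_group",
    "cost_type",
]

# Hypothetical rows: "Plant A" appears twice with the same key values,
# mimicking an ownership-share record next to a whole-plant record.
plant_costs = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["AA0001"] * 3,
        "plant_name_rus": ["Plant A", "Plant A", "Plant B"],
        "cost_group": ["operation"] * 3,
        "cost_type": ["fuel"] * 3,
        "cost": [10.0, 40.0, 5.0],
    }
)

# keep=False flags every member of a duplicated group, not only the repeats.
duplicates = plant_costs[plant_costs.duplicated(subset=would_be_key, keep=False)]
```

Inspecting the flagged rows by hand is the safer choice here, since the table itself does not say which record is the ownership share and which is the plant total.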

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
cost	number	Cost value.
cost_group	string	High-level category of cost type.
cost_per_mmbtu	number	Unit cost of energy production in cost per MMBTU.
cost_per_mwh	number	Unit cost of energy production in cost per MWh.
cost_type	string	Detailed category of cost type.
is_total	boolean	Whether or not this record represents a total.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_plant_factors_and_maximum_demand#

Noticed any irregularities with this table? File a report.

Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)
Primary key:: This table has no primary key. The primary key for this table would be report_date, borrower_id_rus, plant_name_rus, and plant_type but the plant_type column is not reported before 2009 causing lots of duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill the plant_type column in instances where there are different plant_types reported for the same plant in later years.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The plant_type field was only included in the data starting in 2009. We backfill records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with a null plant_type because there is no reliable way to determine which plant_type is associated with which record back in time. Plants whose names are reported differently before and after 2009 will also not be backfilled.
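
The backfilling rule described above can be illustrated roughly as follows. This is a simplified sketch and not PUDL's actual implementation: it assumes a hypothetical report_year column, groups only by borrower and plant name, and fills a missing plant_type only when that group reports exactly one distinct type. All rows are invented.

```python
import pandas as pd

# Hypothetical records: Plant A reports a single plant_type from 2009 on,
# while Plant B reports two different types, so its 2008 row is ambiguous.
records = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001"] * 5,
        "plant_name_rus": ["Plant A", "Plant A", "Plant B", "Plant B", "Plant B"],
        "report_year": [2008, 2009, 2008, 2009, 2009],
        "plant_type": [None, "steam", None, "steam", "combined_cycle"],
    }
)

plant_key = ["borrower_id_rus", "plant_name_rus"]
grouped = records.groupby(plant_key)["plant_type"]

# Count the distinct non-null types per plant and pick the first non-null one.
n_types = grouped.transform("nunique")
first_type = grouped.transform("first")

# Only use the group's type as a fill value when it is unambiguous.
fill_values = first_type.where(n_types == 1)
records["plant_type_filled"] = records["plant_type"].fillna(fill_values)
```

In this toy frame the 2008 row for Plant A is filled with its single reported type, while the 2008 row for Plant B stays null.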

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over the time period / (nameplate capacity * length of the time period in hours, years, etc.).
capacity_factor_running	number	Fraction of potential generation over the time period a plant was in operation. Energy generated over the time period / (nameplate capacity * length of the time period in hours, years, etc.).
load_factor	number	Fraction of consumption vs demand reported for a plant over a given timeframe. Energy consumed over the time period / (peak demand * length of the time period in hours, years, etc.).
peak_gross_demand_mw	number	The highest average power output recorded over any single 15 minute interval during the reporting period.
peak_gross_demand_nameplate_mw	number	The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
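
A short worked example of the three factors may help. It reads each description as energy divided by the product of a capacity (or peak demand) and a number of hours. The helper name and all figures are hypothetical, and the source does not state which hour counts RUS borrowers actually use or whether the values are stored as fractions or percentages.

```python
HOURS_PER_YEAR = 8760  # a non-leap year


def energy_fraction(energy_mwh: float, power_mw: float, hours: float) -> float:
    """Hypothetical helper: energy / (power * hours)."""
    return energy_mwh / (power_mw * hours)


# Invented plant: 100 MW nameplate, 438,000 MWh generated during the year.
capacity_factor = energy_fraction(438_000.0, 100.0, HOURS_PER_YEAR)

# Same energy, but measured only over the 6,000 hours the plant was running.
capacity_factor_running = energy_fraction(438_000.0, 100.0, 6_000)

# Same energy against an 80 MW peak demand over the full year.
load_factor = energy_fraction(438_000.0, 80.0, HOURS_PER_YEAR)
```

With these inputs the three results are 0.5, 0.73 and 0.625 respectively.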

out_rus12__yearly_plant_labor#

Noticed any irregularities with this table? File a report.

Annual time series of labor and payroll information for plants owned by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)
Primary key:: This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Note the lack of plant_type pre-2009 leading to a lack of reliable primary keys.

Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to data from this table, but you must join on the borrower_id_rus column as well as report_date and plant_name_rus. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
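
The join described above can be sketched with pandas as follows. Both frames are invented stand-ins for the two tables: the borrower IDs, plant name and values are hypothetical, and net_generation_mwh is an assumed column name used only for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2021-01-01", "2021-01-01"])

# Two hypothetical borrowers reporting on the same shared plant.
labor = pd.DataFrame(
    {
        "report_date": dates,
        "borrower_id_rus": ["AA0001", "BB0002"],
        "plant_name_rus": ["Shared Plant", "Shared Plant"],
        "employees_full_time_num": [40, 40],
    }
)
operations = pd.DataFrame(
    {
        "report_date": dates,
        "borrower_id_rus": ["AA0001", "BB0002"],
        "plant_name_rus": ["Shared Plant", "Shared Plant"],
        "net_generation_mwh": [1000.0, 1010.0],  # assumed column name
    }
)

# Joining on all three columns pairs each borrower's rows with each other.
keys = ["report_date", "borrower_id_rus", "plant_name_rus"]
matched = labor.merge(operations, on=keys, how="left")

# Leaving out borrower_id_rus cross-matches the two owners' records.
cross_matched = labor.merge(
    operations, on=["report_date", "plant_name_rus"], how="left"
)
```

In this toy example the three-column join returns two rows, while the join without borrower_id_rus returns four. On the real tables, duplicates from before 2009 can still multiply rows even with all three columns.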

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
employee_hours_worked_total	number	Total number of hours worked by employees.
employees_full_time_num	integer	Number of employees hired full-time for normal operations of the system.
employees_part_time_num	integer	Number of employees regularly employed on a part-time basis. Excludes employees hired for short periods of time to complete special jobs.
payroll_maintenance	number	The amount of payroll spent on plant maintenance.
payroll_operations	number	The amount of payroll spent on plant operations.
payroll_other_accounts	number	The amount of plant payroll spent on accounts other than maintenance and operations.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_plant_operations_by_borrower#

Noticed any irregularities with this table? File a report.

Annual time series of borrower portion of plant operational data including fuel consumption and operational hours.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
Primary key:: This table has no primary key because a handful of plants have duplicate records. Otherwise, the primary key would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

There are two boolean columns used to delineate which records are associated with the borrowers’ share versus the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf, which is included in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.

Also there are two cleaning steps that we performed that alter the original data slightly:

Dropped one record with an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Luckily this plant has exactly the same records as the other Wisdom steam plant that year with more expected ownership labels.

Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to have an integer type for the unit_id_rus column.
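
The unit_id_rus replacement described above can be sketched roughly as follows. This is not the PUDL implementation; the helper name clean_unit_id and the trailing-digits rule are assumptions chosen to reproduce the two documented cases.

```python
import re


def clean_unit_id(raw: str) -> int:
    """Return the trailing integer from a unit ID such as 'WSL GT 12'.

    Hypothetical helper: assumes the numeric unit ID is always the last
    run of digits in the reported string.
    """
    match = re.search(r"(\d+)\s*$", raw.strip())
    if match is None:
        raise ValueError(f"No numeric unit ID found in {raw!r}")
    return int(match.group(1))


# The two documented 2018 strings, plus an already-numeric value.
cleaned = {raw: clean_unit_id(raw) for raw in ["WSL GT 12", "WSL ST 10", "3"]}
```

Raising on strings without digits, rather than returning a null, makes any new unexpected value visible instead of silently dropping it.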

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
fuel_consumption_coal_lbs	number	Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.
fuel_consumption_gas_cubic_feet	number	Annual cubic feet of natural gas consumed for fuel. This field is only reported for the plant_type values combined_cycle and steam.
fuel_consumption_oil_gallons	number	Annual gallons of oil consumed for fuel. This field is only reported for the plant_type values combined_cycle and steam.
fuel_consumption_other	number	Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for the plant_type values combined_cycle and steam.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
is_full_ownership_portion	boolean	Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower.
is_partly_owned_by_borrower	boolean	Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.
operating_hours_in_service	number	Number of operating hours in service.
operating_hours_on_standby	number	Number of operating hours on standby.
operating_hours_out_of_service_scheduled	number	Number of operating hours out of service which were scheduled.
operating_hours_out_of_service_unscheduled	number	Number of operating hours out of service which were unscheduled.
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
times_started	number	Number of times the plant was started. This field is only reported for the plant_type values steam and nuclear.
unit_id_rus	integer	RUS-assigned unit identification code.

out_rus12__yearly_plant_operations_by_plant#

Noticed any irregularities with this table? File a report.

Annual time series of whole plant operational data including fuel consumption and operational hours.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
Primary key:: This table has no primary key because a handful of plants have duplicate records. Otherwise, the primary key would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

There are two boolean columns used to delineate which records are associated with the borrowers’ share versus the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf, which is included in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.

Also there are two cleaning steps that we performed that alter the original data slightly:

Dropped one record with an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Luckily this plant has exactly the same records as the other Wisdom steam plant that year with more expected ownership labels.

Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to have an integer type for the unit_id_rus column.

Plant-level data can be matched to core_rus12__yearly_plant_labor and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand, which report Section B and C data for all plants. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
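
One way to act on this caution is to look for plant units whose totals were reported by more than one borrower before summing anything. The pandas sketch below uses made-up records (hypothetical borrower IDs, plant names, and values) and assumes that co-owners report the same plant_name_rus, plant_type, and unit_id_rus for a shared unit, which may not hold in the real data.

```python
import pandas as pd

# Hypothetical whole-plant records: two borrowers report the same shared unit.
ops_by_plant = pd.DataFrame(
    {
        "report_date": ["2020-01-01"] * 3,
        "borrower_id_rus": ["AA0001", "AA0002", "AA0003"],
        "plant_name_rus": ["Shared Plant", "Shared Plant", "Solo Plant"],
        "plant_type": ["steam", "steam", "steam"],
        "unit_id_rus": [1, 1, 1],
        "gross_generation_mwh": [1000.0, 1005.0, 500.0],
    }
)

plant_keys = ["report_date", "plant_name_rus", "plant_type", "unit_id_rus"]

# Count how many distinct borrowers reported a total for each plant unit.
borrower_counts = ops_by_plant.groupby(plant_keys)["borrower_id_rus"].transform("nunique")
shared = ops_by_plant[borrower_counts > 1]

# One possible way to avoid double counting: keep a single record per unit.
# Which borrower's record to keep is a judgment call; here the first row wins.
deduplicated = ops_by_plant.drop_duplicates(subset=plant_keys, keep="first")
```

In the toy data the two Shared Plant records disagree slightly (1000.0 vs 1005.0 MWh), which mirrors the RUS warning that totals entered by different borrowers are not guaranteed to match.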

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
fuel_consumption_coal_lbs	number	Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.
fuel_consumption_gas_cubic_feet	number	Annual cubic feet of natural gas consumed for fuel. This field is only reported for the plant_type values combined_cycle and steam.
fuel_consumption_oil_gallons	number	Annual gallons of oil consumed for fuel. This field is only reported for the plant_type values combined_cycle and steam.
fuel_consumption_other	number	Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for the plant_type values combined_cycle and steam.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
is_partly_owned_by_borrower	boolean	Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.
operating_hours_in_service	number	Number of operating hours in service.
operating_hours_on_standby	number	Number of operating hours on standby.
operating_hours_out_of_service_scheduled	number	Number of operating hours out of service which were scheduled.
operating_hours_out_of_service_unscheduled	number	Number of operating hours out of service which were unscheduled.
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
times_started	number	Number of times the plant was started. This field is only reported for the plant_type values steam and nuclear.
unit_id_rus	integer	RUS-assigned unit identification code.

out_rus12__yearly_renewable_plants#

Noticed any irregularities with this table? File a report.

Annual time series of renewable energy plant generation information for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)
Primary key:: report_date, borrower_id_rus, plant_name_rus, prime_mover_id

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over the time period / (nameplate capacity * length of the time period).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
comments	string	General comments field.
employees_num	integer	Number of employees.
invested	number	The amount of money invested.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
power_cost_per_mwh	number	The cost of power per MWh.
primary_renewable_fuel_type	string	Primary renewable fuel type used by the plant.
primary_renewable_fuel_type_id	integer	Unique numeric identifier for each renewable fuel type.
prime_mover_id	integer	Unique numeric identifier for each prime mover type used by RUS borrowers.
prime_mover_type	string	Type of prime mover (e.g. Hydro, Internal Combustion).
renewable_fuel_pct	number	Percentage of renewable fuel used.
report_date	date	Date reported.
rus_funding	number	Amount of funding received from the Rural Utilities Service (RUS).
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
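
The capacity_factor definition in the table above can be worked through with a short sketch. Whether the published column is computed from net_generation_mwh and capacity_mw or is taken directly from the form is not stated here, so treat the function below as an illustration of the formula only; the function name and the example figures are hypothetical.

```python
def capacity_factor(net_generation_mwh: float, capacity_mw: float, hours: float) -> float:
    """Fraction of potential generation actually produced over `hours`."""
    # Potential generation is nameplate capacity running for the whole period.
    potential_mwh = capacity_mw * hours
    if potential_mwh <= 0:
        raise ValueError("capacity_mw and hours must both be positive")
    return net_generation_mwh / potential_mwh


HOURS_IN_NON_LEAP_YEAR = 8760

# A hypothetical 10 MW plant generating 26,280 MWh over a non-leap year.
example = capacity_factor(
    net_generation_mwh=26_280.0,
    capacity_mw=10.0,
    hours=HOURS_IN_NON_LEAP_YEAR,
)
```

Here the plant could have produced 87,600 MWh at full output, so its capacity factor is 0.3.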

out_rus12__yearly_sources_and_distribution#

Noticed any irregularities with this table? File a report.

Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
Primary key:: report_date, borrower_id_rus, source_of_energy

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

See the sources_and_distribution_by_plant_type table for a breakdown of plant-type-specific cost, capacity, plant_num, and net_energy_received values. Also note that there are several source_of_energy values that don’t have a corresponding cost value.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
cost	number	Cost value.
net_energy_received_mwh	number	The net amount of energy received into the system.
report_date	date	Date reported.
source_of_energy	string	The source of energy (not plant type).
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_sources_and_distribution_by_plant_type#

Noticed any irregularities with this table? File a report.

Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
Primary key:: report_date, borrower_id_rus, plant_type

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
cost	number	Cost value.
net_energy_received_mwh	number	The net amount of energy received into the system.
plant_num	integer	Number of plants.
plant_type	string	Type of plant.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)
Primary key:: report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
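
To make the subtotal warning concrete, here is a minimal pandas sketch of excluding total rows before summing. The records are invented (the opex_group and opex_type values are hypothetical), and the sketch assumes that is_total flags every total or subtotal row, which is worth verifying against the real data.

```python
import pandas as pd

# Hypothetical statement-of-operations rows: two detail items and their total.
statement = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001"] * 3,
        "opex_group": ["hypothetical_group"] * 3,
        "opex_type": ["item_a", "item_b", "total"],
        "opex_ytd": [100.0, 50.0, 150.0],
        "is_total": [False, False, True],
    }
)

# Summing every row double counts, because the total row repeats its components.
naive_sum = statement["opex_ytd"].sum()

# Dropping rows flagged as totals leaves only the detail records.
detail_sum = statement.loc[~statement["is_total"], "opex_ytd"].sum()
```

In this toy example the naive sum is twice the detail sum.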

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
opex_group	string	High-level section from the statement of operations table. Most of these types have subcomponents broken out in the opex_type column.
opex_report_month	number	Amount of operational expense, cost or income during the report month.
opex_type	string	Type of item from the statement of operations.
opex_ytd	number	The year-to-date amount of operational expense, cost or income.
opex_ytd_budget	number	The year-to-date budget for amount of operational expense, cost or income.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)
Primary key:: report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:

intangible_plant: only has one total utility_plant_item

production_plant: has several sub-component utility_plant_item and a standard total

transmission_plant: has several sub-component utility_plant_item and a standard total

distribution_plant: has several sub-component utility_plant_item and a standard total

electric_plant_in_service: has two sub-components and a total which includes those sub-components and the totals of the above categories

utility_plant_in_service: has several sub-components and the total is the sum of those sub-components as well as the total of electric_plant_in_service

total_utility_plant: has one sub-component and the total is the sum of the sub-component and utility_plant_in_service
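
For the groups with a standard total (production_plant, transmission_plant and distribution_plant), the nesting above suggests a simple consistency check: the reported total should equal the sum of its sub-components. The pandas sketch below uses invented rows (the utility_plant_item values and balances are hypothetical) and assumes is_total marks the total row within each group.

```python
import pandas as pd

# Hypothetical rows: one group with components and a total, one with a total only.
changes = pd.DataFrame(
    {
        "utility_plant_group": ["production_plant"] * 3 + ["intangible_plant"],
        "utility_plant_item": ["item_a", "item_b", "total", "total"],
        "ending_balance": [70.0, 30.0, 100.0, 5.0],
        "is_total": [False, False, True, True],
    }
)

# Only these groups have a total that is the plain sum of their own items.
simple_groups = ["production_plant", "transmission_plant", "distribution_plant"]
subset = changes[changes["utility_plant_group"].isin(simple_groups)]

component_sums = (
    subset[~subset["is_total"]].groupby("utility_plant_group")["ending_balance"].sum()
)
reported_totals = (
    subset[subset["is_total"]].groupby("utility_plant_group")["ending_balance"].sum()
)

# Absolute difference between each reported total and the sum of its components.
discrepancy = (reported_totals - component_sums).abs()
```

The higher-level groups (electric_plant_in_service, utility_plant_in_service and total_utility_plant) fold in totals from other groups, so they would need a separate check that adds those totals in.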

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
utility_plant_group	string	High-level category of utility plant asset type.
utility_plant_item	string	Sub-category of utility_plant_group describing utility plant asset item.

out_rus7__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
Primary key:: report_date, borrower_id_rus, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
Primary key:: report_date, borrower_id_rus, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
liability_type	string	Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
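
The subtotal warning for this table can be handled by dropping rows where is_total is true before summing. The sketch below uses invented records in place of real rows (the liability_type values and balances are illustrative assumptions, not taken from the data):

```python
# Toy rows standing in for records of this table; all values are invented.
rows = [
    {"borrower_id_rus": "AA0001", "liability_type": "memberships",
     "ending_balance": 100.0, "is_total": False},
    {"borrower_id_rus": "AA0001", "liability_type": "patronage_capital",
     "ending_balance": 250.0, "is_total": False},
    {"borrower_id_rus": "AA0001", "liability_type": "total_margins_and_equities",
     "ending_balance": 350.0, "is_total": True},
]


def sum_detail_rows(records):
    """Sum ending_balance over non-total rows only, to avoid double counting."""
    return sum(r["ending_balance"] for r in records if not r["is_total"])


# Summing every row counts the detail rows and their subtotal together.
naive_sum = sum(r["ending_balance"] for r in rows)
detail_sum = sum_detail_rows(rows)
```

The same filter applies to a DataFrame loaded with any of the access methods above, for example by selecting rows where is_total is false before grouping.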

out_rus7__yearly_customer_energy_efficiency_and_conservation_loans#

Noticed any irregularities with this table? File a report.

Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
Primary key:: report_date, borrower_id_rus, loan_status

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats and attic ventilation fans.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)

Columns#

Field Name	Type	Description
actual_pct	number	Observed percentage.
anticipated_pct	number	Expected percentage.
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
loan_status	string	The repayment status of a loan.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
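
The borrower_id_rus layout described in the column table (two letters followed by four digits) can be checked with a regular expression. This is a sketch under the assumption that the letters are uppercase ASCII; the helper names are hypothetical and not part of PUDL:

```python
import re

# Two uppercase letters followed by four digits, per the column description.
# Uppercase-only is an assumption; relax the pattern if the data differs.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(value):
    """Return True if value matches the documented borrower_id_rus layout."""
    return isinstance(value, str) and BORROWER_ID_PATTERN.fullmatch(value) is not None


def state_prefix(value):
    """Return the two-letter prefix of a well-formed ID, otherwise None."""
    return value[:2] if is_valid_borrower_id(value) else None
```

The extracted prefix can be compared against the state column as a sanity check, though the documentation does not state that the two always agree.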
ytd_dollars	number	Balance this current year, in U.S. dollars.

out_rus7__yearly_distribution_services#

Noticed any irregularities with this table? File a report.

Annual time series of distribution services.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
Primary key:: report_date, borrower_id_rus, service_status

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
service_status	string	Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals.
services	number	Number of services in system at end of year.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_employee_statistics#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about employment and payroll.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
employee_hours_worked_over_time	integer	The number of overtime hours worked by employees.
employee_hours_worked_regular_time	integer	The number of regular (non-overtime) hours worked by employees.
employees_fte_num	integer	The number of full time employees.
payroll_capitalized	integer	The amount of payroll spent that was capitalized.
payroll_expensed	integer	The amount of payroll spent that was expensed.
payroll_other	integer	The amount of payroll spent that was funded by other means - not capitalized or expensed.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
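
Two derived quantities follow naturally from the columns above: the share of hours worked as overtime and the total payroll. The sketch assumes non-null values and that the three payroll columns are mutually exclusive, which the payroll_other description suggests but does not guarantee. The function names and figures are illustrative:

```python
def overtime_share(row):
    """Fraction of all reported hours that were overtime; None if no hours."""
    regular = row["employee_hours_worked_regular_time"]
    overtime = row["employee_hours_worked_over_time"]
    total = regular + overtime
    if total == 0:
        return None
    return overtime / total


def total_payroll(row):
    """Sum of capitalized, expensed, and other payroll (assumed disjoint)."""
    return row["payroll_capitalized"] + row["payroll_expensed"] + row["payroll_other"]


# Invented record for illustration only.
example = {
    "employee_hours_worked_regular_time": 90_000,
    "employee_hours_worked_over_time": 10_000,
    "payroll_capitalized": 1_000_000,
    "payroll_expensed": 2_500_000,
    "payroll_other": 500_000,
}
share = overtime_share(example)
payroll = total_payroll(example)
```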

out_rus7__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of investments in and impacts of energy efficiency programs.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)
Primary key:: report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

The savings_mmbtu column likely contains values with incorrect units.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
customer_class	string	High level categorization of customer type.
customers_num	number	Number of customers.
invested	number	The amount of money invested.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.
savings_mmbtu	number	The estimated amount of energy savings from energy efficiency programs. Warning: We found values much larger than expected that we have not yet cleaned - this is likely a reporting unit error.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
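
Because savings_mmbtu may contain values reported in the wrong units, it can help to flag suspiciously large entries before analysis. The sketch below marks values far above the median of the positive values. The factor of 1000 is an arbitrary assumption chosen for illustration, not a threshold used by PUDL:

```python
from statistics import median


def flag_suspect_savings(values, factor=1000.0):
    """Return a list of booleans marking values above factor * median.

    Missing (None) and non-positive values are ignored when computing the
    median and are never flagged.
    """
    positive = [v for v in values if v is not None and v > 0]
    if not positive:
        return [False] * len(values)
    cutoff = factor * median(positive)
    return [v is not None and v > cutoff for v in values]


# Invented savings_mmbtu values; the last one mimics a unit error.
savings = [10.0, 12.0, None, 15.0, 5_000_000.0]
flags = flag_suspect_savings(savings)
```

Flagged rows are candidates for manual review rather than automatic correction, since the documentation does not say which unit the outliers were reported in.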

out_rus7__yearly_energy_purchased#

Noticed any irregularities with this table? File a report.

Annual time series of energy purchased by RUS borrowers.

Most-recent data:: 2020
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)
Primary key:: This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an *, including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.
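
Records from unspecified entities can be separated out by testing for the leading asterisk. This sketch assumes the reported name is the one stored in utility_name_eia; the non-starred supplier name is invented:

```python
def is_unspecified_supplier(utility_name):
    """True when the reported supplier name starts with an asterisk."""
    return isinstance(utility_name, str) and utility_name.startswith("*")


# Illustrative names: the starred ones come from the description above,
# the last one is made up.
names = ["*Miscellaneous", "*Adjustments", "Example Power Cooperative", None]
unspecified = [n for n in names if is_unspecified_supplier(n)]
```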

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet"
)

Columns#

Field Name	Type	Description
average_energy_cost_dollars_per_mwh	number	The average cost of energy per MWh.
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
comments	string	General comments field.
fuel_cost_adjustment	number	The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost.
fuel_type_code_rus	integer	Unique numeric identifier for RUS fuel types.
is_supplier_eia_respondent	boolean	Whether the utility supplying energy to a RUS borrower is an EIA respondent.
purchased_energy_cost_total	number	The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges.
purchased_mwh	number	The total electricity purchased.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
supplier_code_rus	string	Unique numeric identifier for the utility supplying energy to a RUS borrower.
utility_name_eia	string	The name of the utility.
wheeling_and_other_charges	number	The cost of wheeling and other charges or credits related to fuel. Included in the total cost.
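
Since the column descriptions say that purchased_energy_cost_total includes both fuel_cost_adjustment and wheeling_and_other_charges, the remaining portion of the bill can be recovered by subtraction. The sketch also computes an implied cost per MWh as total cost divided by purchased_mwh. Whether that matches how average_energy_cost_dollars_per_mwh is reported is an assumption, and all figures are invented:

```python
def split_purchase_cost(row):
    """Separate the two documented components from the reported total."""
    adjustment = row["fuel_cost_adjustment"] or 0.0
    wheeling = row["wheeling_and_other_charges"] or 0.0
    total = row["purchased_energy_cost_total"]
    return {
        "fuel_cost_adjustment": adjustment,
        "wheeling_and_other_charges": wheeling,
        "remaining_cost": total - adjustment - wheeling,
    }


def implied_cost_per_mwh(row):
    """Total cost divided by MWh purchased; None when MWh is missing or zero."""
    mwh = row["purchased_mwh"]
    if not mwh:
        return None
    return row["purchased_energy_cost_total"] / mwh


# Invented record for illustration only.
example = {
    "purchased_energy_cost_total": 50_000.0,
    "fuel_cost_adjustment": 5_000.0,
    "wheeling_and_other_charges": 2_000.0,
    "purchased_mwh": 1_000.0,
}
parts = split_purchase_cost(example)
per_mwh = implied_cost_per_mwh(example)
```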

out_rus7__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
external_financial_risk_ratio	number	(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
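
The table summary describes this ratio as investments plus loan guarantee balances, taken together, relative to total utility plant assets. A worked sketch with invented figures, grouping the numerator explicitly (the function and argument names are hypothetical):

```python
def external_financial_risk_ratio(total_investments, loan_guarantee_balances,
                                  total_utility_plant):
    """(investments + loan guarantees) / utility plant; None if plant is 0 or missing."""
    if not total_utility_plant:
        return None
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Invented figures: 2.0M invested, 0.5M guaranteed, 10M of utility plant.
ratio = external_financial_risk_ratio(2_000_000, 500_000, 10_000_000)
```

With these figures the ratio is 0.25, meaning the exposure to outside entities equals a quarter of the utility's plant assets.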

out_rus7__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P - Section I)
Primary key:: This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
excluded_investments	number	Excluded investment.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
included_investments	number	Included investment.
income_or_loss	number	Income or loss from investment.
investment_description	string	Description of investment. This is a free-form text field and thus contains a wide variety of values.
investment_type_code	integer	Investment type code.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
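
Because this table has no primary key, it is worth checking how often a candidate key repeats before joining or deduplicating. The sketch counts repeated combinations of report_date, borrower_id_rus, and investment_description on invented records (the dates are shown as strings purely for illustration):

```python
from collections import Counter


def duplicated_keys(records, key_columns):
    """Return {key_tuple: count} for every key that appears more than once."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


# Invented rows; two share the same borrower, year, and description.
rows = [
    {"report_date": "2021-01-01", "borrower_id_rus": "AA0001",
     "investment_description": "capital credits"},
    {"report_date": "2021-01-01", "borrower_id_rus": "AA0001",
     "investment_description": "capital credits"},
    {"report_date": "2021-01-01", "borrower_id_rus": "AA0002",
     "investment_description": "capital credits"},
]
dupes = duplicated_keys(
    rows, ["report_date", "borrower_id_rus", "investment_description"]
)
```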

out_rus7__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)
Primary key:: This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason we’ve nulled the date.
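
One way to catch implausible maturity dates like the one described above is to null any date that falls too far beyond the report year. The documentation does not say how PUDL identified this value, so the 100-year cutoff and the function name below are assumptions made for illustration:

```python
from datetime import date


def clean_maturity_date(maturity, report_date, max_years_ahead=100):
    """Return maturity unless it is implausibly far after report_date.

    Dates more than max_years_ahead years past the report year become None.
    """
    if maturity is None:
        return None
    if maturity.year - report_date.year > max_years_ahead:
        return None
    return maturity


# The reported 2/8/2820 date against a 2006 report year (day is illustrative).
cleaned = clean_maturity_date(date(2820, 2, 8), date(2006, 1, 1))
kept = clean_maturity_date(date(2026, 2, 8), date(2006, 1, 1))
```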

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
is_loan_guarantee	boolean	Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.
loan_balance	number	The amount of money still owed on a loan at the end of the reporting year.
loan_maturity_date	date	The date on which a loan is scheduled to be fully paid.
loan_original_amount	number	The original amount of a loan.
loan_recipient	string	The organization that received a loan.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long term debt and debt service requirements.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)
Primary key:: This table has no primary key. It is a list of all debts in each year, and borrowers can have multiple records with the same debt_description.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
debt_description	string	Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.
debt_ending_balance	number	The amount of principal still owed on the debt at the end of the report year.
debt_interest	number	The interest expense on the debt for the report year.
debt_principal	number	The principal paid on the debt during the report year.
debt_total	number	The total amount of debt.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
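
Since borrowers can have several debt records per year, a per-borrower annual figure requires aggregation. The sketch sums debt_interest and debt_principal per borrower and report year, treating that sum as debt service in the usual financial sense. Whether it matches the reported debt_total column is not stated in this documentation, so treat that as an assumption. All rows are invented:

```python
from collections import defaultdict


def debt_service_by_borrower_year(records):
    """Sum interest plus principal for each (report_date, borrower_id_rus)."""
    totals = defaultdict(float)
    for r in records:
        key = (r["report_date"], r["borrower_id_rus"])
        # Treat missing values as zero so one null does not drop the record.
        totals[key] += (r["debt_interest"] or 0.0) + (r["debt_principal"] or 0.0)
    return dict(totals)


# Invented rows: one borrower with two debts in the same report year.
rows = [
    {"report_date": "2021-01-01", "borrower_id_rus": "AA0001",
     "debt_interest": 1_000.0, "debt_principal": 4_000.0},
    {"report_date": "2021-01-01", "borrower_id_rus": "AA0001",
     "debt_interest": 500.0, "debt_principal": 1_500.0},
]
service = debt_service_by_borrower_year(rows)
```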

out_rus7__yearly_long_term_leases#

Noticed any irregularities with this table? File a report.

Annual time series of long term leases by property type.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)
Primary key:: This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
lending_organization	string	The organization that provided a lease or loan.
property_type	string	The type of property leased.
rental_cost_ytd	number	Year-to-date rental cost for leased property.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_materials_and_supplies#

Noticed any irregularities with this table? File a report.

Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)
Primary key:: report_date, borrower_id_rus, electric_or_other_materials

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
electric_or_other_materials	string	Whether the cost is for electric materials or other materials.
materials_adjustment	number	An adjustment value for the cost of materials and supplies.
materials_ending_balance	number	The balance at the end of the report year for materials and supplies.
materials_purchased	number	The cost of materials and supplies purchased.
materials_salvaged	number	The cost of materials and supplies salvaged.
materials_sold	number	The cost of materials and supplies sold.
materials_used	number	The cost of materials and supplies used.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet"
)

Columns#

Field Name	Type	Description
board_members_num	integer	The total number of board members.
borrower_id_rus	string	Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
does_manager_have_written_contract	boolean	Whether or not the RUS borrower’s manager has a written contract.
fees_and_expenses_for_board_members	integer	The total amount of fees and expenses for board members.
last_annual_meeting_date	datetime	The date of the last annual meeting.
members_num	integer	The total number of members.
members_present_at_meeting_num	integer	The number of members present at the last annual meeting.
members_voting_by_proxy_or_mail_num	integer	The number of members voting by mail or by proxy.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
was_quorum_present	boolean	Whether or not quorum was met.
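
The membership columns above can be combined into a simple participation rate for the last annual meeting. The sketch assumes that members present and members voting by proxy or mail are counted separately, which the documentation does not confirm. The function name and figures are illustrative:

```python
def meeting_participation_rate(row):
    """(present + proxy/mail voters) / total members; None if no members."""
    members = row["members_num"]
    if not members:
        return None
    participating = (
        row["members_present_at_meeting_num"]
        + row["members_voting_by_proxy_or_mail_num"]
    )
    return participating / members


# Invented record for illustration only.
example = {
    "members_num": 10_000,
    "members_present_at_meeting_num": 400,
    "members_voting_by_proxy_or_mail_num": 600,
}
rate = meeting_participation_rate(example)
```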

out_rus7__yearly_owed_by_customers#

Noticed any irregularities with this table? File a report.

Annual time series of debt owed by customers.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet"
)

Columns#

Field Name	Type	Description
amount_due_over_60_days	number	Reported amount of dollars due over 60 days from consumers for electric service. Includes both connected and disconnected customers.
amount_written_off_ytd	number	Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_patronage_capital#

Noticed any irregularities with this table? File a report.

Annual time series of patronage capital distributed and received.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)
Primary key:: report_date, borrower_id_rus, patronage_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
patronage_cumulative	number	Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null.
patronage_report_year	number	Amount of patronage distributed or received within report year.
patronage_type	string	Type of patronage capital distributed or received.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_power_requirements#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and generation summary.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table includes totals of electric sales and revenue, which also appear in core_rus7__yearly_power_requirements_electric_sales with a customer_class of total. All other power requirements reported here are not broken out by customer class, so the electric sales totals are included alongside them in this table.
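
The overlap described above can be cross-checked by comparing the total rows of the sales table against electric_sales_mwh in this table. The following is a rough sketch on invented data. It assumes the sales table exposes a sales_mwh column and labels its total rows with the literal customer_class value "total"; the borrower ID, the other class labels, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for this table; values are invented.
power_requirements = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"]),
        "borrower_id_rus": ["TX0123"],
        "electric_sales_mwh": [150.0],
    }
)

# Hypothetical rows standing in for the electric sales table.
electric_sales = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["TX0123"] * 3,
        "customer_class": ["residential", "commercial", "total"],
        "sales_mwh": [100.0, 50.0, 150.0],
    }
)

# Keep only the total rows from the sales table.
totals = electric_sales.loc[
    electric_sales["customer_class"] == "total",
    ["report_date", "borrower_id_rus", "sales_mwh"],
]

# Join on the shared key columns and flag rows where the two totals agree.
compared = power_requirements.merge(
    totals,
    on=["report_date", "borrower_id_rus"],
    how="left",
    validate="one_to_one",
)
compared["matches"] = compared["electric_sales_mwh"] == compared["sales_mwh"]
print(compared)
```

With real data, an exact equality test may be too strict for floating-point values, so a tolerance-based comparison could be a better fit.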

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
electric_sales_mwh	number	Total MWh of electricity sold. Total of lines 1b thru 9b on the original form.
electric_sales_revenue	integer	Total Revenue Received From Sales of Electric Energy. Total of lines 1c thru 9c on the original form.
generated_mwh	number	The total electricity generated.
interchange_mwh	number	The net interchange of electricity. The net amount of electricity exchanged in purchases and sales.
is_peak_coincident	boolean	Whether the peak_mw value is a coincident peak rather than a non-coincident peak.
other_electric_revenue	integer	Electric revenue other than electric_sales_revenue.
own_use_mwh	number	The electricity in MWh used for the borrower’s own internal use.
peak_mw	number	The peak system MW - the sum of all MW.
purchased_mwh	number	The total electricity purchased.
purchases_and_generation_cost	integer	The cost of purchases and generation of electricity.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
transmission_revenue	integer	Transmission revenue.

out_rus7__yearly_power_requirements_electric_customers#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - number of customers served by customer type.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
customer_class	string	High level categorization of customer type.
customers_num	number	Number of customers.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
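
Because observation_period is part of the primary key, each borrower and customer class can have one row per period. One way to compare the periods side by side is to pivot them into columns. This is a sketch on invented data: the borrower ID, the customer_class labels, and the counts are hypothetical, and only the 'december' and 'avg' period values come from the column description above.

```python
import pandas as pd

# Hypothetical rows mimicking the table's columns; all values are invented.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 4),
        "borrower_id_rus": ["TX0123"] * 4,
        "customer_class": ["residential", "residential", "commercial", "commercial"],
        "observation_period": ["december", "avg", "december", "avg"],
        "customers_num": [1000.0, 980.0, 50.0, 48.0],
    }
)

# Pivot so that each observation period becomes its own column.
wide = df.pivot(
    index=["report_date", "borrower_id_rus", "customer_class"],
    columns="observation_period",
    values="customers_num",
).reset_index()

# Difference between the December count and the period average.
wide["december_minus_avg"] = wide["december"] - wide["avg"]
print(wide)
```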

out_rus7__yearly_power_requirements_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and energy sold by customer type.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus, customer_class

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
customer_class	string	High level categorization of customer type.
report_date	date	Date reported.
revenue	number	Amount of revenue.
sales_mwh	number	Quantity of electricity sold in MWh.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
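
The primary key listed for this table (report_date, borrower_id_rus, customer_class) implies that no two rows share the same combination of those columns. A quick way to confirm this after loading or filtering the data is sketched below; the rows, borrower IDs, and class labels are invented for illustration.

```python
import pandas as pd

# Primary key columns as listed in the table metadata above.
PRIMARY_KEY = ["report_date", "borrower_id_rus", "customer_class"]

# Hypothetical rows; all values are invented.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["TX0123", "TX0123", "OK0456"],
        "customer_class": ["residential", "commercial", "residential"],
        "sales_mwh": [100.0, 50.0, 75.0],
    }
)

# Collect every row whose key combination appears more than once.
duplicate_rows = df[df.duplicated(subset=PRIMARY_KEY, keep=False)]
print(f"{len(duplicate_rows)} rows violate the primary key")
```

The same check applies to the other tables in this dictionary by swapping in their listed primary key columns.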

out_rus7__yearly_service_interruptions#

Noticed any irregularities with this table? File a report.

Annual time series of service interruptions by cause.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)
Primary key:: report_date, borrower_id_rus, service_interruption_cause, observation_period

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.
saidi_minutes	number	Cumulative duration (minutes) of interruption for the average customer during the report year.
service_interruption_cause	string	Source of service interruption.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)
Primary key:: report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
opex_group	string	High level section from the statement of operations table. Most of these types have subcomponents broken out in the `opex_type` column.
opex_report_month	number	Amount of operational expense, cost or income during the report month.
opex_type	string	Type of item from the statement of operations.
opex_ytd	number	The year-to-date amount of operational expense, cost or income.
opex_ytd_budget	number	The year-to-date budget for amount of operational expense, cost or income.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_transmission_and_distribution_mileage#

Noticed any irregularities with this table? File a report.

Annual time series of miles of transmission and distribution infrastructure.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
Primary key:: report_date, borrower_id_rus, line_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
is_total	boolean	Whether or not this record represents a total.
line_type	string	The type of line mileage reported (e.g., transmission, overhead distribution).
miles	number	Line length at the end of the reported period, in miles.
report_date	date	Date reported.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
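
The usage warning about subtotals matters when summing miles: rows flagged with is_total would otherwise be counted twice. The sketch below drops those rows before aggregating. The data is invented, the line_type labels are hypothetical, and it assumes that total rows are sums of the non-total rows, which the data dictionary does not spell out.

```python
import pandas as pd

# Hypothetical rows mimicking the table's columns; values and line_type
# labels are invented for illustration.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["TX0123"] * 3,
        "line_type": ["transmission", "overhead_distribution", "total"],
        "miles": [10.0, 90.0, 100.0],
        "is_total": [False, False, True],
    }
)

# Drop total rows before aggregating so they are not double counted.
components = df[~df["is_total"]]
miles_by_borrower = components.groupby(
    ["report_date", "borrower_id_rus"], as_index=False
)["miles"].sum()
print(miles_by_borrower)
```

The same pattern applies to the other RUS tables that carry an is_total column.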

out_rus7__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:: 2021
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)
Primary key:: report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
utility_plant_group	string	High-level category of utility plant asset type.
utility_plant_item	string	Sub-category of utility_plant_group describing utility plant asset item.

out_sec10k__changelog_company_name#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in the names each SEC 10-K filer has used.

Most-recent data:: 2023
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: This table has no primary key.

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

We use the company name reported in association with each name change block in the company information table to fill in the most recent value of company_name_new. Roughly 1,000 reported “name changes” in which the old and new names were identical have been dropped.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
company_name_new	string	Name of company after name change.
company_name_old	string	Name of company prior to name change.
name_change_date	date	Date of last name change of the company.
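
The filtering step mentioned under Additional Details, dropping "name changes" where the old and new names are identical, can be illustrated as follows. This is not PUDL's actual implementation, only a sketch of the idea on invented rows; the CIKs, company names, and dates are hypothetical.

```python
import pandas as pd

# Hypothetical rows; CIKs, names, and dates are invented.
raw_changes = pd.DataFrame(
    {
        "central_index_key": ["0000000001", "0000000001", "0000000002"],
        "company_name_old": [
            "Acme Power Co",
            "Acme Energy Inc",
            "Example Utility Corp",
        ],
        "company_name_new": [
            "Acme Energy Inc",
            "Acme Energy Inc",
            "Example Utilities Corp",
        ],
        "name_change_date": pd.to_datetime(
            ["2001-05-01", "2010-03-15", "2015-07-01"]
        ),
    }
)

# Keep only rows where the name actually changed.
real_changes = raw_changes[
    raw_changes["company_name_old"] != raw_changes["company_name_new"]
].reset_index(drop=True)
print(real_changes)
```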

out_sec10k__parents_and_subsidiaries#

Noticed any irregularities with this table? File a report.

A denormalized table containing information about parent companies that file SEC Form 10-K and their subsidiaries, which may or may not file Form 10-K.

Most-recent data:: 2023
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k, subsidiary_company_id_sec10k

Usage Warnings#

This data has been extracted probabilistically using a machine learning model and contains some omissions and errors. It should not be treated as ground truth data.
PUDL does not currently update its copy of this data.

Additional Details#

Company ownership fractions are extracted from SEC 10-K Exhibit 21. Information about the companies is extracted primarily from the headers of the SEC 10-K filing. Subsidiaries that file Form 10-K will have much more information available than those that only appear as subsidiaries in Exhibit 21.

SEC 10-K filers and EIA utilities are matched using probabilistic record linkage. Exhibit 21 subsidiaries that don’t file a Form 10-K are matched to EIA utilities using the company name.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet"
)

Columns#

Field Name	Type	Description
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
filing_date	date	Date on which the filing was submitted.
fraction_owned	number	Fraction of the subsidiary company owned by the parent company, as extracted from SEC 10-K Exhibit 21.
parent_company_business_city	string	City where the parent company’s place of business is located.
parent_company_business_state	string	State where the parent company’s place of business is located.
parent_company_business_street_address	string	Street address of the parent company’s place of business.
parent_company_business_street_address_2	string	Second line of the street address of the parent company’s place of business.
parent_company_business_zip_code	string	Zip code of the parent company’s place of business.
parent_company_business_zip_code_4	string	Zip code suffix of the parent company’s place of business.
parent_company_central_index_key	string	Central index key (CIK) of the parent company.
parent_company_incorporation_state	string	Two letter state code where parent company is incorporated.
parent_company_industry_id_sic	string	Four-digit Standard Industrial Classification (SIC) code identifying the parent company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.
parent_company_industry_name_sic	string	Text description of the parent company’s Standard Industrial Classification (SIC) code.
parent_company_mail_city	string	City of the parent company’s mailing address.
parent_company_mail_state	string	State of the parent company’s mailing address.
parent_company_mail_street_address	string	Street portion of the parent company’s mailing address.
parent_company_mail_street_address_2	string	Second line of the street portion of the parent company’s mailing address.
parent_company_mail_zip_code	string	Zip code of the parent company’s mailing address.
parent_company_mail_zip_code_4	string	Zip code suffix of the parent company’s mailing address.
parent_company_name	string	Name of the parent company.
parent_company_phone_number	string	Phone number of the parent company.
parent_company_taxpayer_id_irs	string	Taxpayer ID of the parent company with the IRS.
parent_company_utility_id_eia	integer	The EIA utility ID of the parent company.
parent_company_utility_name_eia	string	The EIA reported utility name of the parent company.
report_date	date	Date reported.
subsidiary_company_business_city	string	City where the subsidiary company’s place of business is located.
subsidiary_company_business_state	string	State where the subsidiary company’s place of business is located.
subsidiary_company_business_street_address	string	Street address of the subsidiary company’s place of business.
subsidiary_company_business_street_address_2	string	Second line of the street address of the subsidiary company’s place of business.
subsidiary_company_business_zip_code	string	Zip code of the subsidiary company’s place of business.
subsidiary_company_business_zip_code_4	string	Zip code suffix of the subsidiary company’s place of business.
subsidiary_company_central_index_key	string	Central index key (CIK) of the subsidiary company.
subsidiary_company_id_sec10k	string	PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.
subsidiary_company_incorporation_state	string	Two letter state code where subsidiary company is incorporated.
subsidiary_company_industry_id_sic	string	Four-digit Standard Industrial Classification (SIC) code identifying the subsidiary company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.
subsidiary_company_industry_name_sic	string	Text description of the subsidiary company’s Standard Industrial Classification (SIC) code.
subsidiary_company_location	string	Location of subsidiary company. This is the full US state name or country name, or occasionally a two-digit code that was not mapped to a full name during cleaning.
subsidiary_company_mail_city	string	City of the subsidiary company’s mailing address.
subsidiary_company_mail_state	string	State of the subsidiary company’s mailing address.
subsidiary_company_mail_street_address	string	Street portion of the subsidiary company’s mailing address.
subsidiary_company_mail_street_address_2	string	Second line of the street portion of the subsidiary company’s mailing address.
subsidiary_company_mail_zip_code	string	Zip code of the subsidiary company’s mailing address.
subsidiary_company_mail_zip_code_4	string	Zip code suffix of the subsidiary company’s mailing address.
subsidiary_company_name	string	Name of subsidiary company.
subsidiary_company_phone_number	string	Phone number of the subsidiary company.
subsidiary_company_taxpayer_id_irs	string	Taxpayer ID of the subsidiary company with the IRS.
subsidiary_company_utility_id_eia	integer	The EIA utility ID of the subsidiary company.
subsidiary_company_utility_name_eia	string	The EIA reported utility name of the subsidiary company.
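
The filename_sec10k description above explains how to rebuild a filing's source URL: prepend the EDGAR archive prefix and append the .txt extension. A minimal sketch of that rule follows; the helper name `sec10k_source_url` and the sample filename are hypothetical.

```python
# Prefix given in the filename_sec10k column description.
EDGAR_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def sec10k_source_url(filename_sec10k: str) -> str:
    """Build the EDGAR source URL for a filing from its filename_sec10k value."""
    return f"{EDGAR_PREFIX}{filename_sec10k}.txt"


# Hypothetical filename value, not a real filing.
print(sec10k_source_url("1234567/0001234567-20-000001"))
```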

out_sec10k__quarterly_company_information#

Noticed any irregularities with this table? File a report.

Quarterly time series of company and filing data extracted from SEC 10-K filings.

Most-recent data:: 2023
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k, central_index_key

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

In addition to the information provided by the core_sec10k__quarterly_company_information table, this output table merges in the associated utility_id_eia and utility name, if available, along with the report and filing dates of the filing each record was extracted from and a link to the source URL for that filing.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet"
)

Columns#

Field Name	Type	Description
business_city	string	City where the company’s place of business is located.
business_postal_code	string	Non-US postal code of the company’s place of business.
business_state	string	State where the company’s place of business is located.
business_street_address	string	Street address of the company’s place of business.
business_street_address_2	string	Second line of the street address of the company’s place of business.
business_zip_code	string	Zip code of the company’s place of business.
business_zip_code_4	string	Zip code suffix of the company’s place of business.
central_index_key	string	Identifier of the company in SEC database.
company_name	string	Name of the reporting company.
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
filer_count	integer	A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to.
filing_date	date	Date on which the filing was submitted.
filing_number_sec	string	Filing number used internally by the SEC to track the filing.
film_number	string	Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room.
fiscal_year_end	string	The end date of an SEC filing company’s fiscal year, in MMDD format.
incorporation_state	string	Two letter state code where company is incorporated.
industry_group_sic	string	A higher level industry category defined within the Standard Industrial Classification (SIC) system.
industry_id_sic	string	Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.
industry_name_sic	string	Text description of the Standard Industrial Classification (SIC) code.
mail_city	string	City of the company’s mailing address.
mail_postal_code	string	Non-US postal code of the company’s mailing address.
mail_state	string	State of the company’s mailing address.
mail_street_address	string	Street portion of the company’s mailing address.
mail_street_address_2	string	Second line of the street portion of the company’s mailing address.
mail_zip_code	string	Zip code of the company’s mailing address.
mail_zip_code_4	string	Zip code suffix of the company’s mailing address.
phone_number	string	Phone number for utility contact 1.
report_date	date	Date reported.
sec10k_type	string	Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.
sec_act	string	SEC Act through which the form was enacted, e.g. 1934 act.
source_url	string	URL pointing to the original source of the data in the record.
taxpayer_id_irs	string	Taxpayer ID of the company with the IRS.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
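
The fiscal_year_end column above stores a month and day as an MMDD string. A minimal sketch of turning such a value into a calendar date for a chosen year, assuming the values are zero-padded to four digits and represent valid dates. The helper name fiscal_year_end_date is hypothetical and not part of PUDL:

```python
from datetime import date


def fiscal_year_end_date(fiscal_year_end: str, year: int) -> date:
    """Convert an MMDD string (e.g. "1231") into a date in the given year."""
    # Assumption: the value is a zero-padded, four-digit MMDD string.
    if len(fiscal_year_end) != 4 or not fiscal_year_end.isdigit():
        raise ValueError(f"Expected an MMDD string, got {fiscal_year_end!r}")
    month = int(fiscal_year_end[:2])
    day = int(fiscal_year_end[2:])
    return date(year, month, day)


print(fiscal_year_end_date("1231", 2022))  # 2022-12-31
```

Keeping the column as a string until the year is known avoids losing the leading zero in values such as "0630".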

out_sec10k__quarterly_filings#

Noticed any irregularities with this table? File a report.

Quarterly time series of metadata describing SEC 10-K filings.

Most-recent data:: 2023
Processing:: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename, but ~1% of all filings are one company submitting the same form multiple times on the same day, so the filename is the only available natural primary key. This output table adds a link to the source URL for the filing, which is constructed from the filename.
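
As a rough illustration of how a source URL can be built from a filename, the sketch below follows the rule given in the filename_sec10k column description: prepend the EDGAR archive prefix and append the .txt extension. The function name and the example filename are hypothetical, and this is not necessarily the exact code PUDL uses:

```python
EDGAR_ARCHIVE_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def build_source_url(filename_sec10k: str) -> str:
    """Prepend the EDGAR archive prefix and append the .txt extension."""
    return f"{EDGAR_ARCHIVE_PREFIX}{filename_sec10k}.txt"


# Hypothetical filename value, for illustration only.
example_filename = "0000000/0000000000-00-000000"
print(build_source_url(example_filename))
```

In practice the source_url column already holds this value, so the helper is only useful when working from filename_sec10k alone.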

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
company_name	string	Name of the reporting company.
exhibit_21_version	string	Version of exhibit 21 submitted (if applicable).
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
filing_date	date	Date on which the filing was submitted.
report_date	date	Date reported.
sec10k_type	string	Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.
source_url	string	URL pointing to the original source of the data in the record.

out_vcerare__hourly_available_capacity_factor#

Noticed any irregularities with this table? File a report.

Hourly time series of estimated county-averaged capacity factors for wind and solar generating facilities across the contiguous United States (US), to be used as a tool and input for resource adequacy modeling and planning.

Most-recent data:: 2023
Processing:: Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
Source:: Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset
Primary key:: state, place_name, datetime_utc

Usage Warnings#

The hourly capacity factors are normalized to unity for maximal power output. To convert to units of power, the user must multiply by the installed capacity within the county.
Hourly capacity factors are spatially averaged across each county over the contiguous USA. There are a handful of counties that are too small to pick up representation on the HRRR operational forecast grid. As such, these counties will have no wind or solar power production curves.
Due to power production performance being correlated with panel temperatures, during cold sunny periods, some solar capacity factor values are greater than 1 (but less than 1.1).
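
The conversion from capacity factor to power described in the first warning is a single multiplication. A minimal sketch, assuming installed capacity is expressed in MW; the function name, the capacity-factor values, and the 50 MW figure are all hypothetical:

```python
def available_power_mw(capacity_factor: float, installed_capacity_mw: float) -> float:
    """Scale a unitless capacity factor by installed capacity to get power."""
    return capacity_factor * installed_capacity_mw


# Hypothetical values: three hourly solar capacity factors for one county,
# and an assumed 50 MW of installed solar capacity in that county.
hourly_capacity_factors = [0.0, 0.5, 1.05]
installed_capacity_mw = 50.0

hourly_power_mw = [
    available_power_mw(cf, installed_capacity_mw) for cf in hourly_capacity_factors
]
print(hourly_power_mw)
```

Because some solar capacity factors exceed 1, the resulting power can exceed the installed capacity; whether to cap those values depends on the analysis.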

Additional Details#

The data in this table were produced by Vibrant Clean Energy, and are licensed to the public under the Creative Commons Attribution 4.0 International license (CC-BY-4.0).

The technologies provided are:

Onshore wind assuming a 100m hub height and 120m rotor diameter;
Offshore wind assuming a 140m hub height and 120m rotor diameter;
Utility solar assuming a fixed axis panel tilted at latitude.

The foundation of the capacity factors provided here is the NOAA HRRR operational numerical weather prediction model. The HRRR covers the entire contiguous US at a horizontal resolution of 3 km. Forecasts are initialized each hour of the year. Forecast hour two (2) is used as the input data for the power algorithms. This forecast hour is chosen to trade off the impact of the HRRR's measurement and data assimilation procedure against the physics of the model, in order to derive the most complete picture of the atmosphere at the forecast time horizon.

For wind capacity factors: vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.

For solar capacity factors: pertinent surface weather variables are pulled such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR 2016 forward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors.

Access methods#

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.

polars

import polars as pl
df = (
    # Use scan_parquet (lazy evaluation) to plan the query before execution
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    # Select only a subset of columns to reduce network and memory usage
    .select(
        [
            "report_year",
            "datetime_utc",
            "hour_of_year",
            "county_id_fips",
            "state",
            "place_name",
            "capacity_factor_solar_pv",
        ]
    )
    .filter(
        # Select data for only a single year and state to reduce network and memory usage
        (pl.col("report_year") == 2022)
        & (pl.col("state") == "CO")
    )
    .collect()
)

SQL (DuckDB)

SELECT
    -- Select only a subset of columns to reduce network and memory usage
    report_year,
    datetime_utc,
    hour_of_year,
    county_id_fips,
    state,
    place_name,
    capacity_factor_solar_pv
FROM 's3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet'
WHERE
    -- Select data for only a single year and state to reduce network and memory usage
    report_year = 2022
    AND state = 'CO';

Columns#

Field Name	Type	Description
capacity_factor_offshore_wind	number	Estimated capacity factor (0-1) calculated for offshore wind assuming a 140m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.
capacity_factor_onshore_wind	number	Estimated capacity factor (0-1) calculated for onshore wind assuming a 100m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.
capacity_factor_solar_pv	number	Estimated capacity factor (0-1) calculated for solar PV assuming a fixed axis panel tilted at latitude and DC power outputs. Due to power production performance being correlated with panel temperatures, during cold sunny periods, some solar capacity factor values are greater than 1 (but less than 1.1). All values are based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Pertinent surface weather variables are pulled such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR 2016 forward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
hour_of_year	integer	Integer between 1 and 8760 representing the hour in a given year.
latitude	number	Latitude of the place centroid (e.g., county centroid).
longitude	number	Longitude of the place centroid (e.g., county centroid).
place_name	string	County or lake name. County names are sourced from the latest Census PEP vintage based on county FIPS ID. Lake names originate from VCE RARE directly, and a lake may appear several times, once for each state it touches. FIPS ID values for lakes have been nulled.
report_year	integer	Four-digit year in which the data was reported.
state	string	Two letter US state abbreviation.
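
The hour_of_year and datetime_utc columns above describe the same point in time in two ways. The sketch below shows one plausible mapping between them, under the assumption (not stated in the column descriptions) that hour 1 begins at 00:00 UTC on January 1 of report_year. The function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def hour_of_year_to_datetime(report_year: int, hour_of_year: int) -> datetime:
    """Map a 1-based hour of the year to a UTC timestamp."""
    # Assumption: hour 1 starts at 00:00 UTC on January 1 of the report year.
    if hour_of_year < 1:
        raise ValueError("hour_of_year is 1-based")
    start_of_year = datetime(report_year, 1, 1, tzinfo=timezone.utc)
    return start_of_year + timedelta(hours=hour_of_year - 1)


print(hour_of_year_to_datetime(2022, 25))  # 2022-01-02 00:00:00+00:00
```

When both columns are available, prefer datetime_utc and treat a mapping like this only as a consistency check.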

core_eia176__yearly_gas_disposition#

Noticed any irregularities with this table? File a report.

Annual time series of a company’s natural and supplemental gas disposition for the report state.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 9, 12-20)
Primary key:: operator_id_eia, report_year

Additional Details#

The disposition_out_of_state_mcf field (Line 14.0) is reported as one aggregated volume, calculated by summing the original granular data. Similarly, other_disposition_all_other_mcf (Line 18.4) is summed from the original granular data and reported as one aggregate field.

The delivered_gas_heat_content_mmbtu_per_mcf is expected to be between 0.8 and 1.2 by the EIA. We find that less than 0.5 percent of data falls outside of these expected bounds.
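
A minimal sketch of flagging heat content values outside the expected range, assuming the 0.8 and 1.2 bounds are inclusive and that missing values should not be flagged. The function name and sample values are hypothetical:

```python
from typing import Optional

EXPECTED_MIN_MMBTU_PER_MCF = 0.8
EXPECTED_MAX_MMBTU_PER_MCF = 1.2


def is_outside_expected_bounds(heat_content: Optional[float]) -> bool:
    """Return True when a non-null heat content falls outside [0.8, 1.2]."""
    if heat_content is None:
        return False  # Assumption: missing values are not flagged.
    return not (
        EXPECTED_MIN_MMBTU_PER_MCF <= heat_content <= EXPECTED_MAX_MMBTU_PER_MCF
    )


# Hypothetical delivered_gas_heat_content_mmbtu_per_mcf values.
values = [1.03, 0.95, None, 10.3, 1.2]
flagged = [v for v in values if is_outside_expected_bounds(v)]
print(flagged)  # [10.3]
```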

The total_disposition_mcf field includes disposition to consumers which is reported in core_eia176__yearly_gas_disposition_by_consumer. Note that the reported total disposition and the sum of values in this table and core_eia176__yearly_gas_disposition_by_consumer often don’t match as would be expected. Thus, we have preserved both the total field and the sub-components in these two tables.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet"
)

Columns#

Field Name	Type	Description
delivered_gas_heat_content_mmbtu_per_mcf	number	The average annual heat content of gas delivered directly to consumers.
disposition_distribution_companies_mcf	number	Disposition to distribution companies within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
disposition_other_pipelines_mcf	number	Disposition to other pipelines within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
disposition_out_of_state_mcf	number	Total volume of the operator’s deliveries across or to state lines or U.S. borders. This has been summed from the detailed data reported by each company on Line 14.0 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
disposition_storage_operators_mcf	number	Disposition to storage operators within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
losses_mcf	number	Known loss volumes as a result of leaks, damage, accidents, migration and blow down within the report state where the events took place, as well as estimated losses from leaks occurring during distribution activities. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operational_consumption_compressors_mcf	number	Volume of natural gas consumed as fuel for the operator’s own pipeline distribution or storage compressor use within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_consumption_facility_space_heat_mcf	number	Volume of natural gas consumed as fuel for space heat of the operator’s own facilities within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_consumption_lng_vaporization_liquefaction_mcf	number	Volume of natural gas consumed by the operator for vaporization, liquefaction, and LNG fuel within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_consumption_new_pipeline_fill_mcf	number	Volume of natural gas consumed as fuel for the operator’s new pipeline fill within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_consumption_other_detail	string	Free-text detail describing the operator’s specified ‘other purposes’ for operational natural-gas consumption, corresponding to the volume reported in operational_consumption_other_mcf.
operational_consumption_other_mcf	number	Volume of natural gas consumed as fuel by the operator for other purposes within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_consumption_vehicle_fuel_mcf	number	Volume of natural gas used in the operator’s company-owned fleet within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_lng_storage_injections_mcf	number	The total volume added to LNG storage operated by the company within the report state, regardless of ownership of the gas. For LNG import and export marine terminals, this should exclude injections of LNG during the course of routine operations for handling imports, as well as LNG held in storage for future use. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operational_storage_underground_mcf	number	The total volume added to underground storage operated by the company within the report state, regardless of ownership of the gas. This includes new fields. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
operator_id_eia	string	The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state.
other_disposition_all_other_mcf	number	Other disposition within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. This has been summed from the detailed data reported by each company on Line 18.4 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
producer_lease_use_mcf	number	The total volume of gas used in the company’s well, field and lease operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
producer_returned_for_repressuring_reinjection_mcf	number	The volume of gas delivered directly from the operator’s system to oil or gas fields located within the report state for repressuring, pressure maintenance, and cycling operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
report_year	integer	Four-digit year in which the data was reported.
total_disposition_mcf	number	Total disposition within the report state, as reported by the operator. This includes disposition to consumers which is reported in core_eia176__yearly_gas_disposition_by_consumer. Note that the reported total disposition and the sum of values in this table and core_eia176__yearly_gas_disposition_by_consumer often don’t match as would be expected. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
unaccounted_for_mcf	number	The difference between gas supply and disposition. A positive entry indicates supply in excess of accounted-for disposition. A negative entry indicates accounted-for disposition exceeds reported supply. This is calculated as the difference between Part 4 Line 7.0 and Part 6 Line 19.0, and is reported as Line 20.0 in the original form.
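
The sign convention for unaccounted_for_mcf in the table above can be illustrated with a small calculation. This is a sketch only: the function and parameter names are hypothetical stand-ins for Part 4 Line 7.0 (supply) and Part 6 Line 19.0 (disposition), and the volumes are made up:

```python
def unaccounted_for_mcf(supply_mcf: float, disposition_mcf: float) -> float:
    """Supply (Part 4 Line 7.0) minus disposition (Part 6 Line 19.0)."""
    return supply_mcf - disposition_mcf


def describe(unaccounted_mcf: float) -> str:
    """Interpret the sign of an unaccounted-for volume."""
    if unaccounted_mcf > 0:
        return "supply exceeds accounted-for disposition"
    if unaccounted_mcf < 0:
        return "accounted-for disposition exceeds reported supply"
    return "supply and disposition balance"


# Hypothetical volumes in Mcf.
difference = unaccounted_for_mcf(supply_mcf=1000.0, disposition_mcf=980.0)
print(difference, describe(difference))  # 20.0 supply exceeds accounted-for disposition
```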

core_eia176__yearly_gas_disposition_by_consumer#

Noticed any irregularities with this table? File a report.

Annual time series of a company’s natural gas deliveries to end-use consumers within the report state.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 10.0-11.6)
Primary key:: report_year, operator_id_eia, customer_class, revenue_class

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet"
)

Columns#

Field Name	Type	Description
consumers	integer	Number of end-use consumers within the report state.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
operating_state	string	State that the operator is reporting for.
operator_id_eia	string	The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state.
report_year	integer	Four-digit year in which the data was reported.
revenue	number	Revenue including taxes, rounded to the nearest whole dollar.
revenue_class	string	Source of revenue: whether revenue originates from gas owned directly by the operator (sales) or gas transported by the operator (transport).
volume_mcf	number	Total volume of natural gas deliveries in the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
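
Since the last two letters of operator_id_eia indicate the state, the state code can be recovered directly from the ID, for example to cross-check it against operating_state. A minimal sketch; the function name and the example ID are hypothetical:

```python
def operator_state_suffix(operator_id_eia: str) -> str:
    """Return the two-letter state code at the end of an operator ID."""
    if len(operator_id_eia) < 2:
        raise ValueError(f"ID too short to contain a state code: {operator_id_eia!r}")
    return operator_id_eia[-2:].upper()


# Hypothetical operator ID, for illustration only.
print(operator_state_suffix("17600000CO"))  # CO
```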

core_eia191__monthly_gas_storage#

Noticed any irregularities with this table? File a report.

Monthly time series of underground natural gas storage activity reported by operators.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 191 – Monthly Underground Natural Gas Storage Report
Primary key:: storage_field_id_eia, report_date

Usage Warnings#

total_field_capacity_mcf is not reliably equal to the sum of working_gas_capacity_mcf and base_gas_mcf: approximately 23% of records differ. This reflects loose EIA definitions and operator self-reporting practices, not data errors. Do not assume additivity among these three fields.
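
A minimal sketch of checking additivity record by record before relying on it. The function name, the relative tolerance, and the sample records are assumptions made for illustration:

```python
import math
from typing import Optional


def capacity_is_additive(
    total_field_capacity_mcf: Optional[float],
    working_gas_capacity_mcf: Optional[float],
    base_gas_mcf: Optional[float],
    rel_tol: float = 1e-6,  # Assumed tolerance; adjust to your needs.
) -> Optional[bool]:
    """Return None if any input is missing, else whether total equals working + base."""
    values = (total_field_capacity_mcf, working_gas_capacity_mcf, base_gas_mcf)
    if any(v is None for v in values):
        return None
    return math.isclose(
        total_field_capacity_mcf,
        working_gas_capacity_mcf + base_gas_mcf,
        rel_tol=rel_tol,
    )


# Hypothetical records: (total, working, base), all in Mcf.
records = [
    (1000.0, 600.0, 400.0),  # additive
    (1000.0, 600.0, 350.0),  # not additive
    (None, 600.0, 400.0),    # incomplete, excluded from the share
]
checks = [capacity_is_additive(*record) for record in records]
comparable = [check for check in checks if check is not None]
share_differing = comparable.count(False) / len(comparable)
print(share_differing)  # 0.5
```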

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet"
)

Columns#

Field Name	Type	Description
base_gas_mcf	number	Volume of base gas (cushion gas) in the underground storage reservoir. Base gas is the volume of gas intended as permanent inventory in a reservoir to maintain adequate pressure and deliverability rates.
company_name	string	Name of the reporting company.
county	string	County name.
field_name	string	Name of the natural gas storage field.
field_type	string	Type of underground natural gas storage facility. One of: depleted field (a reservoir previously used for natural gas production), aquifer (a porous rock structure used to store gas), or salt dome (a cavern created within a salt formation).
gas_field_id_eia	integer	EIA-assigned numeric identifier for a natural gas storage field.
maximum_daily_delivery_mcf	number	Maximum daily quantity of gas that can be withdrawn from the underground storage reservoir under normal operating conditions.
operational_status	string	Operational status of the underground natural gas storage reservoir.
region	string	EIA storage region in which the underground natural gas storage field is located.
report_date	date	Date reported.
reservoir_id_eia	integer	EIA-assigned numeric identifier for the individual storage reservoir within a natural gas storage field.
reservoir_name	string	Name of the individual underground natural gas storage reservoir.
state	string	Two letter US state abbreviation.
storage_field_id_eia	string	EIA-assigned identifier for an underground natural gas storage reservoir. IDs are assigned per company and state, so the same physical reservoir may have different IDs if ownership changes.
total_field_capacity_mcf	number	Total design capacity of the underground natural gas storage field (base gas plus working gas).
working_gas_capacity_mcf	number	Volume of working gas (the portion of stored gas available for withdrawal) in the underground storage reservoir.

core_eia860__assn_boiler_cooling#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and EIA cooling system IDs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: report_date, plant_id_eia, boiler_id, cooling_id_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet"
)

Columns#

Field Name	Type	Description
boiler_id	string	Alphanumeric boiler ID.
cooling_id_eia	string	EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.

core_eia860__assn_boiler_generator#

Noticed any irregularities with this table? File a report.

Association table providing connections between boilers and generators.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 6, Part A)
Primary key:: plant_id_eia, report_date, generator_id, boiler_id

Usage Warnings#

Contains information from multiple raw inputs.

Additional Details#

Augmented with various heuristics within PUDL.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet"
)

Columns#

Field Name	Type	Description
bga_source	string	The source from which the unit_id_pudl is compiled. The unit_id_pudl comes directly from EIA 860, from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is used to find additional boiler-generator connections).
boiler_generator_assn_type_code	string	Indicates whether boiler associations with generators during the year were actual or theoretical. Only available before 2013.
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
steam_plant_type_code	integer	Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.
unit_id_eia	string	EIA-assigned unit identification code.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
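
The generator_id note above matters most when joining this table to others. The sketch below uses made-up IDs (not taken from EIA data) to show how parsing IDs as numbers can merge distinct identifiers and drop alphanumeric ones.

```python
# Hypothetical generator IDs; the values are illustrative only.
generator_ids = ["1", "01", "001", "GT1"]


def coerce_to_int(value: str):
    """Mimic what happens when an ID column is parsed as a number."""
    try:
        return int(value)
    except ValueError:
        return None  # alphanumeric IDs such as "GT1" have no numeric form


as_strings = set(generator_ids)
as_numbers = {coerce_to_int(g) for g in generator_ids}

print(len(as_strings))  # number of distinct IDs when kept as strings
print(as_numbers)       # "1", "01" and "001" all map to the same integer
```

Keeping the column as a string on both sides of a join avoids this kind of silent collision.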

core_eia860__assn_boiler_stack_flue#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and EIA stack and/or flue system IDs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: report_date, plant_id_eia, boiler_id, stack_flue_id_pudl

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet"
)

Columns#

Field Name	Type	Description
boiler_id	string	Alphanumeric boiler ID.
flue_id_eia	string	The flue identification value reported to EIA. The flue is a duct, pipe, or opening that transports exhaust gases through the stack. This field was reported in conjunction with stack_id_eia until 2013 when stack_flue_id_eia took their place.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
stack_flue_id_eia	string	The stack or flue identification value reported to EIA. This denotes the place where emissions from the combustion process are released into the atmosphere. Prior to 2013, this was reported as stack_id_eia and flue_id_eia.
stack_flue_id_pudl	string	A stack and/or flue identification value created by PUDL for use as part of the primary key for the stack flue equipment and boiler association tables. For 2013 and onward, this value is equal to the value for stack_flue_id_eia. Prior to 2013, this value is equal to the value for stack_id_eia and the value for flue_id_eia separated by an underscore, or just the stack_id_eia in cases where flue_id_eia is NA.
stack_id_eia	string	The stack identification value reported to EIA. Stacks or chimneys are the place where emissions from the combustion process are released into the atmosphere. This field was reported in conjunction with flue_id_eia until 2013 when stack_flue_id_eia took their place.
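
The rule for stack_flue_id_pudl described above can be sketched as a small function. This is an illustration of the written description, not PUDL's actual implementation. The function name is hypothetical, and it assumes that the pre-2013 fallback when flue_id_eia is missing is the stack ID alone.

```python
from datetime import date
from typing import Optional


def stack_flue_id_pudl(
    report_date: date,
    stack_flue_id_eia: Optional[str],
    stack_id_eia: Optional[str],
    flue_id_eia: Optional[str],
) -> Optional[str]:
    """Build the PUDL stack/flue ID following the column description."""
    if report_date.year >= 2013:
        # 2013 and onward: the combined EIA ID is used as-is.
        return stack_flue_id_eia
    if flue_id_eia is None:
        # Assumption: with no flue ID, fall back to the stack ID alone.
        return stack_id_eia
    # Before 2013: stack and flue IDs joined by an underscore.
    return f"{stack_id_eia}_{flue_id_eia}"


print(stack_flue_id_pudl(date(2015, 1, 1), "S1", None, None))
print(stack_flue_id_pudl(date(2010, 1, 1), None, "A", "B"))
print(stack_flue_id_pudl(date(2010, 1, 1), None, "A", None))
```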

core_eia860__assn_yearly_boiler_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and emissions control IDs for NOx, SO2, mercury, and particulate monitoring.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: report_date, plant_id_eia, boiler_id, emission_control_id_type, emission_control_id_eia

Usage Warnings#

The relationship between the IDs is sometimes many to many.
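
One way to see what a many-to-many relationship means here is to count links in both directions. The rows below are invented for illustration, including the type labels and control IDs. A real check would also group by report_date and plant_id_eia, which are part of the primary key.

```python
from collections import defaultdict

# Hypothetical rows: (boiler_id, emission_control_id_type, emission_control_id_eia)
rows = [
    ("B1", "so2", "FGD1"),
    ("B2", "so2", "FGD1"),  # one control serves two boilers
    ("B1", "nox", "SCR1"),
    ("B1", "nox", "SCR2"),  # one boiler has two controls of the same type
]

controls_per_boiler = defaultdict(set)
boilers_per_control = defaultdict(set)
for boiler_id, id_type, control_id in rows:
    controls_per_boiler[(boiler_id, id_type)].add(control_id)
    boilers_per_control[(id_type, control_id)].add(boiler_id)

# Boilers linked to more than one control of a given type.
multi_control_boilers = {k for k, v in controls_per_boiler.items() if len(v) > 1}
# Controls linked to more than one boiler.
shared_controls = {k for k, v in boilers_per_control.items() if len(v) > 1}

print(multi_control_boilers)
print(shared_controls)
```

When both sets are non-empty, joining on boiler_id alone or on the control ID alone will duplicate rows, so aggregate first or join on the full key.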

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet"
)

Columns#

Field Name	Type	Description
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
emission_control_id_eia	string	The emission control ID used to collect SO2, NOx, particulate, and mercury emissions data. This column should be used in conjunction with emission_control_id_type as it’s not guaranteed to be unique.
emission_control_id_type	string	The type of emissions control ID: SO2, NOx, particulate, or mercury.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.

core_eia860__scd_boilers#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of boilers which may vary from year to year. Compiled from across all EIA-860 data.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: plant_id_eia, boiler_id, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for boilers are set. It contains values which are expected to vary slowly, while core_eia__entity_boilers contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
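
The 70% consistency rule described above can be illustrated with a short sketch. It is a simplified reading of the paragraph, not PUDL's harvesting code. The function name, the handling of missing values (ignored when counting), and the sample values are all assumptions.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Return the most common reported value if it is consistent enough.

    Assumption: missing values (None) are excluded before counting.
    """
    observed = [v for v in reported if v is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    # Keep the value only if it makes up at least `threshold` of the reports.
    return value if count / len(observed) >= threshold else None


# Hypothetical reports of one attribute for one boiler across source tables.
print(canonical_value(["PC", "PC", "PC", "PC", "CFB"]))   # 4 of 5 agree
print(canonical_value(["PC", "PC", "CFB", "CFB", "OT"]))  # no value dominates
```

Under this reading, the first call keeps the dominant value and the second returns a null, which matches the behavior the paragraph describes for inconsistently reported attributes.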

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet"
)

Columns#

Field Name	Type	Description
air_flow_100pct_load_cubic_feet_per_minute	number	Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e. 68 F and one atmosphere pressure).
boiler_fuel_code_1	string	The code representing the most predominant type of energy that fuels the boiler.
boiler_fuel_code_2	string	The code representing the second most predominant type of energy that fuels the boiler.
boiler_fuel_code_3	string	The code representing the third most predominant type of energy that fuels the boiler.
boiler_fuel_code_4	string	The code representing the fourth most predominant type of energy that fuels the boiler.
boiler_id	string	Alphanumeric boiler ID.
boiler_operating_date	date	Date the boiler began or is planned to begin commercial operation.
boiler_retirement_date	date	Date of the scheduled or effected retirement of the boiler.
boiler_status	string	EIA short code identifying boiler operational status.
boiler_type	string	EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR.
compliance_year_mercury	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions.
compliance_year_nox	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions.
compliance_year_particulate	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions.
compliance_year_so2	integer	Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
efficiency_100pct_load	number	Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent.
efficiency_50pct_load	number	Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent.
firing_rate_using_coal_tons_per_hour	number	Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour.
firing_rate_using_gas_mcf_per_hour	number	Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet per hour.
firing_rate_using_oil_bbls_per_hour	number	Design firing rate at maximum continuous steam flow for petroleum to the nearest 0.1 barrels per hour.
firing_rate_using_other_fuels	number	Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas.
firing_type_1	string	EIA short code indicating the type of firing used by this boiler.
firing_type_2	string	EIA short code indicating the type of firing used by this boiler.
firing_type_3	string	EIA short code indicating the type of firing used by this boiler.
fly_ash_reinjection	boolean	Indicates whether the boiler is capable of re-injecting fly ash.
hrsg	boolean	Indicates whether the boiler is a heat recovery steam generator (HRSG).
max_steam_flow_lbs_per_hour	number	Maximum continuous steam flow at 100 percent load.
mercury_control_existing_strategy_1	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_2	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_3	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_4	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_5	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_existing_strategy_6	string	Existing strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent mercury regulation.
mercury_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent mercury regulation.
new_source_review	boolean	Indicates whether the boiler is subject to New Source Review requirements.
new_source_review_date	date	Month of issued New Source Review permit.
new_source_review_permit	string	New Source Review permit number.
nox_control_existing_caaa_compliance_strategy_1	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_caaa_compliance_strategy_2	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_caaa_compliance_strategy_3	string	Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_existing_strategy_1	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_existing_strategy_2	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_existing_strategy_3	string	Existing strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_manufacturer	string	Name of nitrogen oxide control manufacturer.
nox_control_manufacturer_code	string	Code indicating the nitrogen oxide control burner manufacturer.
nox_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.
nox_control_planned_caaa_compliance_strategy_1	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_planned_caaa_compliance_strategy_2	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_planned_caaa_compliance_strategy_3	string	Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.
nox_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent nitrogen oxide regulation.
nox_control_status_code	string	Nitrogen oxide control status code.
particulate_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
particulate_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
particulate_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with particulate matter regulations, strategy for compliance.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
regulation_mercury	string	Most stringent type of statute or regulation code under which the boiler is operating for mercury control standards.
regulation_nox	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for nitrogen oxide control standards.
regulation_particulate	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for particulate matter control standards.
regulation_so2	string	EIA short code for most stringent type of statute or regulation code under which the boiler is operating for sulfur dioxide control standards.
report_date	date	Date reported.
so2_control_existing_caaa_compliance_strategy_1	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_caaa_compliance_strategy_2	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_caaa_compliance_strategy_3	string	Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_existing_strategy_1	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_existing_strategy_2	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_existing_strategy_3	string	Existing strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_out_of_compliance_strategy_1	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_out_of_compliance_strategy_2	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_out_of_compliance_strategy_3	string	If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.
so2_control_planned_caaa_compliance_strategy_1	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_planned_caaa_compliance_strategy_2	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_planned_caaa_compliance_strategy_3	string	Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.
so2_control_proposed_strategy_1	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_proposed_strategy_2	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
so2_control_proposed_strategy_3	string	Proposed strategy to comply with the most stringent sulfur dioxide regulation.
standard_nox_rate	number	Numeric value for the unit of measurement specified for nitrogen oxide.
standard_particulate_rate	number	Numeric value for the unit of measurement specified for particulate matter.
standard_so2_percent_scrubbed	number	The percent of sulfur dioxide to be scrubbed specified by the most stringent sulfur dioxide regulation.
standard_so2_rate	number	Numeric value for the unit of measurement specified for sulfur dioxide.
turndown_ratio	number	The turndown ratio for the boiler.
unit_nox	string	Unit of measurement specified for nitrogen oxide.
unit_particulate	string	Unit of measurement specified for particulate matter.
unit_so2	string	Unit of measurement specified for sulfur dioxide.
waste_heat_input_mmbtu_per_hour	number	Design waste-heat input rate at maximum continuous steam flow, in MMBtu per hour. A waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process.
wet_dry_bottom	string	Wet or Dry Bottom where Wet Bottom is defined as slag tanks that are installed at furnace throat to contain and remove molten ash from the furnace, and Dry Bottom is defined as having no slag tanks at furnace throat area, throat area is clear, and bottom ash drops through throat to bottom ash water hoppers.

core_eia860__scd_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of emissions control equipment reported to EIA, including cost, type, operating status, retirement date, and install year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: report_date, plant_id_eia, emission_control_id_pudl

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Includes control IDs for sulfur dioxide (SO2), particulate matter, mercury, nitrogen oxide (NOx), and acid (HCl) gas monitoring.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet"
)

Columns#

Field Name	Type	Description
acid_gas_control	boolean	Indicates whether the emissions control equipment controls acid (HCl) gas.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
emission_control_equipment_cost	number	The total cost to install a piece of emission control equipment.
emission_control_equipment_type_code	string	Short code indicating the type of emission control equipment installed.
emission_control_id_pudl	number	A PUDL-generated ID used to distinguish emission control units in the same report year and plant id. This ID should not be used to track units over time or between plants.
emission_control_operating_date	date	The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data.
emission_control_retirement_date	date	The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data.
mercury_control_id_eia	string	Mercury control identification number. This ID is not a unique identifier.
nox_control_id_eia	string	Nitrogen oxide control identification number. This ID is not a unique identifier.
operational_status_code	string	The operating status of the asset.
particulate_control_id_eia	string	Particulate matter control identification number. This ID is not a unique identifier.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
so2_control_id_eia	string	Sulfur dioxide control identification number. This ID is not a unique identifier.
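
The operating and retirement dates above are described as derived from separate month and year columns in the raw data. A minimal sketch of that kind of derivation follows. The function name is hypothetical, and both the first-of-month convention and the handling of missing parts are assumptions rather than documented PUDL behavior.

```python
from datetime import date
from typing import Optional


def date_from_year_month(year: Optional[int], month: Optional[int]) -> Optional[date]:
    """Combine separate year and month values into a single date.

    Assumptions: the day is pinned to the first of the month, and a
    missing year or month yields no date at all.
    """
    if year is None or month is None:
        return None
    return date(year, month, 1)


print(date_from_year_month(2015, 6))
print(date_from_year_month(2015, None))
```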

core_eia860__scd_generators#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of generators which may vary from year to year. Compiled from across EIA-860 and EIA-923 data.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: plant_id_eia, generator_id, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for generators are set. It contains values which are expected to vary slowly, while core_eia__entity_generators contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
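
Because this table is keyed on plant_id_eia, generator_id, and report_date, each generator can appear once per reporting period. One common way to work with such a table is to keep only the most recent record per generator. The sketch below does this in plain Python on invented rows. The values are hypothetical and only a few of the table's columns are shown.

```python
from datetime import date

# Hypothetical records; real rows carry many more columns.
rows = [
    {"plant_id_eia": 1, "generator_id": "1", "report_date": date(2020, 1, 1), "capacity_mw": 100.0},
    {"plant_id_eia": 1, "generator_id": "1", "report_date": date(2021, 1, 1), "capacity_mw": 110.0},
    {"plant_id_eia": 1, "generator_id": "GT2", "report_date": date(2020, 1, 1), "capacity_mw": 50.0},
]

latest = {}
for row in rows:
    # Identify a generator by plant and generator ID, ignoring report_date.
    key = (row["plant_id_eia"], row["generator_id"])
    if key not in latest or row["report_date"] > latest[key]["report_date"]:
        latest[key] = row

for key, row in sorted(latest.items()):
    print(key, row["report_date"], row["capacity_mw"])
```

The same selection can be expressed in pandas, polars, or SQL by sorting on report_date and keeping the last row per (plant_id_eia, generator_id) pair.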

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet"
)

Columns#

Field Name	Type	Description
can_burn_multiple_fuels	boolean	Whether the generator can burn multiple fuels.
can_cofire_fuels	boolean	Whether the generator can co-fire fuels.
can_switch_oil_gas	boolean	Whether the generator can switch between oil and natural gas.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
carbon_capture	boolean	Indicates whether the generator uses carbon capture technology.
current_planned_generator_operating_date	date	The most recently updated effective date on which the generator is scheduled to start operation.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
deliver_power_transgrid	boolean	Indicates whether the generator can deliver power to the transmission grid.
distributed_generation	boolean	Whether the generator is considered distributed generation.
energy_source_1_transport_1	string	Primary mode of transport for energy source 1.
energy_source_1_transport_2	string	Secondary mode of transport for energy source 1.
energy_source_1_transport_3	string	Tertiary mode of transport for energy source 1.
energy_source_2_transport_1	string	Primary mode of transport for energy source 2.
energy_source_2_transport_2	string	Secondary mode of transport for energy source 2.
energy_source_2_transport_3	string	Tertiary mode of transport for energy source 2.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_source_code_2	string	The code representing the second most predominant type of energy that fuels the generator.
energy_source_code_3	string	The code representing the third most predominant type of energy that fuels the generator.
energy_source_code_4	string	The code representing the fourth most predominant type of energy that fuels the generator.
energy_source_code_5	string	The code representing the fifth most predominant type of energy that fuels the generator.
energy_source_code_6	string	The code representing the sixth most predominant type of energy that fuels the generator.
energy_storage_capacity_mwh	number	Energy storage capacity in MWh (e.g. for batteries).
ferc_qualifying_facility	boolean	Indicates whether or not a generator is a qualifying FERC cogeneration facility.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
minimum_load_mw	number	The minimum load at which the generator can operate continuously.
nameplate_power_factor	number	The nameplate power factor of the generator.
net_capacity_mwdc	number	Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.
operational_status	string	The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.
operational_status_code	string	The operating status of the asset.
other_modifications_date	date	Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.
other_planned_modifications	boolean	Indicates whether there are other modifications planned for the generator.
owned_by_non_utility	boolean	Whether any part of the generator is owned by a non-utility.
ownership_code	string	Identifies the ownership for each generator.
planned_derate_date	date	Planned effective month that the generator is scheduled to enter operation after the derate modification.
planned_energy_source_code_1	string	New energy source code for the planned repowered generator.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
planned_modifications	boolean	Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.
planned_net_summer_capacity_derate_mw	number	Decrease in summer capacity expected to be realized from the derate modification to the equipment.
planned_net_summer_capacity_uprate_mw	number	Increase in summer capacity expected to be realized from the modification to the equipment.
planned_net_winter_capacity_derate_mw	number	Decrease in winter capacity expected to be realized from the derate modification to the equipment.
planned_net_winter_capacity_uprate_mw	number	Increase in winter capacity expected to be realized from the uprate modification to the equipment.
planned_new_capacity_mw	number	The expected new nameplate capacity for the generator.
planned_new_prime_mover_code	string	New prime mover for the planned repowered generator.
planned_repower_date	date	Planned effective date that the generator is scheduled to enter operation after the repowering is complete.
planned_uprate_date	date	Planned effective date that the generator is scheduled to enter operation after the uprate modification.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
reactive_power_output_mvar	number	Reactive Power Output (MVAr)
report_date	date	Date reported.
startup_source_code_1	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_2	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_3	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
startup_source_code_4	string	The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.
summer_capacity_estimate	boolean	Whether the summer capacity value was an estimate
summer_capacity_mw	number	The net summer capacity.
summer_estimated_capability_mw	number	EIA estimated summer capacity (in MW).
synchronized_transmission_grid	boolean	Indicates whether standby generators (SB status) can be synchronized to the grid.
technology_description	string	High level description of the technology used by the generator to produce electricity.
time_cold_shutdown_full_load_code	string	The minimum amount of time required to bring the unit to full load from shutdown.
turbines_inverters_hydrokinetics	integer	Number of wind turbines, or hydrokinetic buoys.
turbines_num	integer	Number of wind turbines, or hydrokinetic buoys.
uprate_derate_completed_date	date	The date when the uprate or derate was completed.
uprate_derate_during_year	boolean	Was an uprate or derate completed on this generator during the reporting year?
utility_id_eia	integer	The EIA Utility Identification number.
winter_capacity_estimate	boolean	Whether the winter capacity value was an estimate
winter_capacity_mw	number	The net winter capacity.
winter_estimated_capability_mw	number	EIA estimated winter capacity (in MW).

core_eia860__scd_generators_energy_storage#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of energy storage which may vary from year to year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 3)
Primary key:: plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to energy storage. The rest of the columns that are reported in the EIA-860 Energy Storage tabs are included in core_eia860__scd_generators and core_eia__entity_generators.
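
Because the shared attributes live in other tables, they usually have to be joined back in before analysis. The following is a minimal sketch of that join, assuming core_eia860__scd_generators uses the same plant_id_eia, generator_id, report_date key. The rows are made-up stand-ins for the Parquet files, and add_generator_attributes is a hypothetical helper name, not part of PUDL.

```python
import pandas as pd

# Assumed shared key between the two tables.
KEY = ["plant_id_eia", "generator_id", "report_date"]


def add_generator_attributes(storage: pd.DataFrame, generators: pd.DataFrame) -> pd.DataFrame:
    """Left-join general generator attributes onto the storage-specific rows."""
    # validate="one_to_one" raises if the key is duplicated on either side.
    return storage.merge(generators, on=KEY, how="left", validate="one_to_one")


# Made-up example rows, not real EIA data.
storage = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["BESS1", "1"],  # kept as strings, as the dictionary advises
    "report_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
    "max_charge_rate_mw": [10.0, 5.0],
})
generators = pd.DataFrame({
    "plant_id_eia": [1, 1, 2],
    "generator_id": ["BESS1", "1", "GT1"],
    "report_date": pd.to_datetime(["2023-01-01"] * 3),
    "summer_capacity_mw": [9.5, 4.8, 50.0],
})

combined = add_generator_attributes(storage, generators)
```

A left join keeps every storage row even when no matching generator record exists, so missing matches show up as nulls rather than dropped rows.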

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet"
)

Columns#

Field Name	Type	Description
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_id_direct_support_1	string	The EIA Generator ID of the primary unit whose generation this energy storage device is intended to firm or store.
generator_id_direct_support_2	string	The EIA Generator ID of the secondary unit whose generation this energy storage device is intended to firm or store.
generator_id_direct_support_3	string	The EIA Generator ID of the tertiary unit whose generation this energy storage device is intended to firm or store.
is_ac_coupled	boolean	Indicates if this energy storage device is AC-coupled (means the energy storage device and the PV system are not installed on the same side of an inverter).
is_dc_coupled	boolean	Indicates if this energy storage device is DC-coupled (means the energy storage device and the PV system are on the same side of an inverter and the battery can still charge from the grid).
is_dc_coupled_tightly	boolean	Indicates if this energy storage device is DC tightly coupled (means the energy storage device and the PV system are on the same side of an inverter and the battery cannot charge from the grid).
is_direct_support	boolean	Indicates if this energy storage device is intended for dedicated generator firming or storing excess generation of other units.
is_independent	boolean	Indicates if this energy storage device is independent (not coupled with other generators).
is_transmission_and_distribution_asset_support	boolean	Indicates if the energy storage system is intended to support a specific substation, transmission or distribution asset.
max_charge_rate_mw	number	Maximum charge rate in MW.
max_discharge_rate_mw	number	Maximum discharge rate in MW.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_eia_direct_support_1	integer	The EIA Plant ID of the primary unit whose generation this energy storage device is intended to firm or store.
plant_id_eia_direct_support_2	integer	The EIA Plant ID of the secondary unit whose generation this energy storage device is intended to firm or store.
plant_id_eia_direct_support_3	integer	The EIA Plant ID of the tertiary unit whose generation this energy storage device is intended to firm or store.
report_date	date	Date reported.
served_arbitrage	boolean	Whether the energy storage device served arbitrage applications during the reporting year
served_backup_power	boolean	Whether the energy storage device served backup power applications during the reporting year.
served_co_located_renewable_firming	boolean	Whether the energy storage device served renewable firming applications during the reporting year.
served_frequency_regulation	boolean	Whether the energy storage device served frequency regulation applications during the reporting year.
served_load_following	boolean	Whether the energy storage device served load following applications during the reporting year.
served_load_management	boolean	Whether the energy storage device served load management applications during the reporting year.
served_ramping_spinning_reserve	boolean	Whether the energy storage device served ramping / spinning reserve applications during the reporting year.
served_system_peak_shaving	boolean	Whether the energy storage device served system peak shaving applications during the reporting year.
served_transmission_and_distribution_deferral	boolean	Whether the energy storage device served transmission and distribution deferral applications during the reporting year.
served_voltage_or_reactive_power_support	boolean	Whether the energy storage device served voltage or reactive power support applications during the reporting year.
storage_enclosure_code	string	A code representing the enclosure type that best describes where the generator is located.
storage_technology_code_1	string	The electro-chemical storage technology used for this battery application.
storage_technology_code_2	string	The electro-chemical storage technology used for this battery application.
storage_technology_code_3	string	The electro-chemical storage technology used for this battery application.
storage_technology_code_4	string	The electro-chemical storage technology used for this battery application.
stored_excess_wind_and_solar_generation	boolean	Whether the energy storage device was used to store excess wind/solar generation during the reporting year.
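
The served_* flags above can be summarized per record. The sketch below counts how many applications each storage device reported, treating nulls as "not served", which is an assumption of this example rather than a documented PUDL convention. The sample rows are made up, and count_served_applications is a hypothetical helper.

```python
import pandas as pd


def count_served_applications(df: pd.DataFrame) -> pd.Series:
    """Count the served_* flags that are True in each row (nulls count as False)."""
    served_cols = [c for c in df.columns if c.startswith("served_")]
    return df[served_cols].fillna(False).astype(bool).sum(axis=1)


# Made-up rows; only three of the served_* columns are included for brevity.
sample = pd.DataFrame({
    "generator_id": ["BESS1", "BESS2"],
    "served_arbitrage": pd.array([True, False], dtype="boolean"),
    "served_backup_power": pd.array([True, None], dtype="boolean"),
    "served_frequency_regulation": pd.array([False, True], dtype="boolean"),
})
sample["n_applications"] = count_served_applications(sample)
```

Note that stored_excess_wind_and_solar_generation does not share the served_ prefix, so it is not included in this count.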

core_eia860__scd_generators_multifuel#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of generators that may vary from year to year, pertaining to fuel-switching and the use of multiple fuels.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 3)
Primary key:: report_date, utility_id_eia, generator_id, plant_id_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet"
)

Columns#

Field Name	Type	Description
can_burn_multiple_fuels	boolean	Whether the generator can burn multiple fuels.
can_cofire_100_oil	boolean	Whether the generator can co-fire 100 percent oil.
can_cofire_fuels	boolean	Whether the generator can co-fire fuels.
can_cofire_oil_and_gas	boolean	Whether the generator can co-fire oil and gas.
can_fuel_switch	boolean	Whether a unit is able to switch fuels.
can_switch_oil_gas	boolean	Whether the generator can switch between oil and natural gas.
can_switch_when_operating	boolean	Indicates whether a fuel switching generator can switch fuels while operating.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
cofire_energy_source_1	string	The codes representing the type of fuel that will be co-fired.
cofire_energy_source_2	string	The codes representing the type of fuel that will be co-fired.
cofire_energy_source_3	string	The codes representing the type of fuel that will be co-fired.
cofire_energy_source_4	string	The codes representing the type of fuel that will be co-fired.
cofire_energy_source_5	string	The codes representing the type of fuel that will be co-fired.
cofire_energy_source_6	string	The codes representing the type of fuel that will be co-fired.
county	string	County name.
current_planned_generator_operating_date	date	The most recently updated effective date on which the generator is scheduled to start operation
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_source_code_2	string	The code representing the second most predominant type of energy that fuels the generator
fuel_switch_energy_source_1	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
fuel_switch_energy_source_2	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
fuel_switch_energy_source_3	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
fuel_switch_energy_source_4	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
fuel_switch_energy_source_5	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
fuel_switch_energy_source_6	string	The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
has_air_permit_limits	boolean	Whether air permit limits are a factor that limits the generator’s ability to switch between oil and natural gas.
has_factors_that_limit_switching	boolean	Whether there are factors that limit the generator’s ability to switch between oil and natural gas.
has_other_factors_that_limit_switching	boolean	Whether there are factors other than air permit limits and storage that limit the generator’s ability to switch between oil and natural gas.
has_regulatory_limits	boolean	Whether there are factors that limit the operation of the generator when running on 100 percent oil
has_storage_limits	boolean	Whether limited on-site fuel storage is a factor that limits the generator’s ability to switch between oil and natural gas.
max_oil_heat_input	number	The maximum oil heat input (percent of MMBTUs) expected for proposed unit when co-firing with natural gas
max_oil_output_mw	number	The maximum output (net MW) expected for proposed unit, when making the maximum use of oil and co-firing natural gas.
net_summer_capacity_natural_gas_mw	number	The maximum net summer output achievable when running on natural gas.
net_summer_capacity_oil_mw	number	The maximum net summer output achievable when running on oil.
net_winter_capacity_natural_gas_mw	number	The maximum net winter output achievable when running on natural gas.
net_winter_capacity_oil_mw	number	The maximum net winter output achievable when running on oil.
operational_status_code	string	The operating status of the asset.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sector_id_eia	integer	EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
sector_name_eia	string	EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
state	string	Two letter US state abbreviation.
summer_capacity_mw	number	The net summer capacity.
technology_description	string	High level description of the technology used by the generator to produce electricity.
time_to_switch_gas_to_oil	string	The time required to switch the generator from running 100 percent natural gas to running 100 percent oil.
time_to_switch_oil_to_gas	string	The time required to switch the generator from running 100 percent oil to running 100 percent natural gas.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
winter_capacity_mw	number	The net winter capacity.
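
The numbered cofire_energy_source_1 through cofire_energy_source_6 columns are often easier to work with in long form, with one row per generator and fuel. The following is a minimal sketch using pandas melt; the sample rows are made up, and cofire_sources_long is a hypothetical helper name.

```python
import pandas as pd

# Identifier columns taken from the table's primary key.
ID_COLS = ["report_date", "utility_id_eia", "generator_id", "plant_id_eia"]
COFIRE_COLS = [f"cofire_energy_source_{i}" for i in range(1, 7)]


def cofire_sources_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the six co-fire columns into one row per generator and fuel code."""
    long = df.melt(
        id_vars=ID_COLS,
        value_vars=COFIRE_COLS,
        var_name="slot",
        value_name="energy_source_code",
    )
    # Drop the empty slots so only reported fuels remain.
    return long.dropna(subset=["energy_source_code"]).reset_index(drop=True)


# Made-up example rows, not real EIA data.
sample = pd.DataFrame({
    "report_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
    "utility_id_eia": [10, 20],
    "generator_id": ["A", "B"],
    "plant_id_eia": [1, 2],
    "cofire_energy_source_1": ["NG", "DFO"],
    "cofire_energy_source_2": ["DFO", None],
    **{col: [None, None] for col in COFIRE_COLS[2:]},
})
long = cofire_sources_long(sample)
```

The same pattern should apply to the fuel_switch_energy_source_1 through fuel_switch_energy_source_6 columns by swapping the column prefix.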

core_eia860__scd_generators_solar#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of solar generators that may vary from year to year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 3.3)
Primary key:: plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to solar generators. The rest of the columns that are reported in the EIA-860 Solar tabs are included in core_eia860__scd_generators and core_eia__entity_generators.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet"
)

Columns#

Field Name	Type	Description
azimuth_angle_deg	number	Indicates the azimuth angle of the unit for fixed tilt or single-axis technologies.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_metering_capacity_mwdc	number	The DC megawatt capacity that is part of a net metering agreement.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
standard_testing_conditions_capacity_mwdc	number	The net capacity of this photovoltaic generator in direct current under standard test conditions (STC) of 1000 W/m^2 solar irradiance and 25 degrees Celsius PV module temperature. This was only reported in 2013 and 2014.
tilt_angle_deg	number	Indicates the tilt angle of the unit for fixed tilt or single-axis technologies.
uses_bifacial_panels	boolean	Indicates whether bifacial solar panels are used at this solar generating unit.
uses_material_crystalline_silicon	boolean	Indicates whether any solar photovoltaic panels at this generator are made of crystalline silicon.
uses_material_other	boolean	Indicates whether any solar photovoltaic panels at this generator are made of other materials.
uses_material_thin_film_a_si	boolean	Indicates whether any solar photovoltaic panels at this generator are made of thin-film amorphous silicon (A-Si).
uses_material_thin_film_cdte	boolean	Indicates whether any solar photovoltaic panels at this generator are made of thin-film cadmium telluride (CdTe).
uses_material_thin_film_cigs	boolean	Indicates whether any solar photovoltaic panels at this generator are made of thin-film copper indium gallium diselenide (CIGS).
uses_material_thin_film_other	boolean	Indicates whether any solar photovoltaic panels at this generator are made of other thin-film material.
uses_net_metering_agreement	boolean	Indicates if the output from this generator is part of a net metering agreement.
uses_technology_dish_engine	boolean	Indicates whether dish engines are used at this solar generating unit.
uses_technology_dual_axis_tracking	boolean	Indicates whether dual-axis tracking technologies are used at this solar generating unit.
uses_technology_east_west_fixed_tilt	boolean	Indicates whether east west fixed tilt technologies are used at this solar generating unit.
uses_technology_fixed_tilt	boolean	Indicates whether fixed tilt technologies are used at this solar generating unit.
uses_technology_lenses_mirrors	boolean	Indicates whether lenses or mirrors are used at this solar generating unit.
uses_technology_linear_fresnel	boolean	Indicates whether linear fresnel technologies are used at this solar generating unit.
uses_technology_other	boolean	Indicates whether other solar technologies are used at this solar generating unit.
uses_technology_parabolic_trough	boolean	Indicates whether parabolic trough technologies are used at this solar generating unit.
uses_technology_power_tower	boolean	Indicates whether power towers are used at this solar generating unit.
uses_technology_single_axis_tracking	boolean	Indicates whether single-axis tracking technologies are used at this solar generating unit.
uses_virtual_net_metering_agreement	boolean	Indicates if the output from this generator is part of a virtual net metering agreement.
virtual_net_metering_capacity_mwdc	number	The DC capacity in MW that is part of a virtual net metering agreement.
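
Since this is a slowly changing dimension table, a common question is which attributes actually changed between reports. The sketch below flags rows where a column differs from the same generator's previous report. The sample values are made up, and attribute_changes is a hypothetical helper, not a PUDL function.

```python
import pandas as pd

GEN_KEY = ["plant_id_eia", "generator_id"]


def attribute_changes(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return rows where `column` differs from the same generator's previous report."""
    ordered = df.sort_values(GEN_KEY + ["report_date"])
    # Value from the preceding report of the same generator (null for the first one).
    previous = ordered.groupby(GEN_KEY)[column].shift()
    changed = previous.notna() & ordered[column].notna() & (ordered[column] != previous)
    out = ordered.loc[changed, GEN_KEY + ["report_date", column]].copy()
    out[f"previous_{column}"] = previous[changed]
    return out.reset_index(drop=True)


# Made-up example rows, not real EIA data.
sample = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 2, 2],
    "generator_id": ["PV1", "PV1", "PV1", "PV2", "PV2"],
    "report_date": pd.to_datetime(
        ["2021-01-01", "2022-01-01", "2023-01-01", "2022-01-01", "2023-01-01"]
    ),
    "tilt_angle_deg": [20.0, 20.0, 25.0, 30.0, 30.0],
})
changes = attribute_changes(sample, "tilt_angle_deg")
```

Rows where either the current or the previous value is null are skipped here, so a value that is newly reported or newly missing is not counted as a change.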

core_eia860__scd_generators_wind#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of wind generators that may vary from year to year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 3.2)
Primary key:: plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to wind generators. The rest of the columns that are reported in the EIA-860 Wind tabs are included in core_eia860__scd_generators and core_eia__entity_generators.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet"
)

Columns#

Field Name	Type	Description
design_wind_speed_mph	number	Average annual wind speed that turbines at this wind site were designed for.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
obstacle_id_faa	string	The Federal Aviation Administration (FAA) obstacle number assigned to this generator. If more than one obstacle number exists, the one that best represents the turbines. References the obstacle numbers reported in the FAA’s Digital Obstacle File: https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dof/ This field was only reported from 2013 through 2015.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
predominant_turbine_manufacturer	string	Name of predominant manufacturer of turbines at this generator.
predominant_turbine_model	string	Predominant model number of turbines at this generator.
report_date	date	Date reported.
turbine_hub_height_feet	number	The hub height of turbines at this generator. If more than one value exists, the one that best represents the turbines.
wind_quality_class	integer	The wind quality class for turbines at this generator. See table core_eia__codes_wind_quality_class for specifications about each class.
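
Hub height and design wind speed are reported in feet and miles per hour. For work that needs SI units, a conversion along these lines may help; the output column names and the sample row are illustrative assumptions, not PUDL columns.

```python
import pandas as pd

FEET_TO_METERS = 0.3048  # exact by definition of the international foot
MPH_TO_MPS = 0.44704     # exact by definition of the international mile


def add_metric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add metric versions of the hub height and design wind speed columns."""
    out = df.copy()
    out["turbine_hub_height_m"] = out["turbine_hub_height_feet"] * FEET_TO_METERS
    out["design_wind_speed_mps"] = out["design_wind_speed_mph"] * MPH_TO_MPS
    return out


# Made-up example row, not real EIA data.
sample = pd.DataFrame({
    "generator_id": ["WT1"],
    "turbine_hub_height_feet": [262.5],
    "design_wind_speed_mph": [17.0],
})
metric = add_metric_columns(sample)
```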

core_eia860__scd_ownership#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of generator ownership. Includes only jointly or third-party owned generators.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 4)
Primary key:: report_date, plant_id_eia, generator_id, owner_utility_id_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fraction_owned	number	Proportion of generator ownership attributable to this utility.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
operator_utility_id_eia	integer	The EIA utility Identification number for the operator utility.
owner_city	string	City of owner.
owner_country	string	Three letter ISO-3166 country code.
owner_state	string	Two letter ISO-3166 political subdivision code.
owner_street_address	string	Street address of owner.
owner_utility_id_eia	integer	The EIA utility Identification number for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility.
owner_utility_name_eia	string	The name of the EIA owner utility.
owner_zip_code	string	Zip code of owner.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
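
One sanity check on fraction_owned is to sum it per generator and report date and see whether the shares add up to roughly 1. The following is a minimal sketch with made-up rows. The 0.01 tolerance and the ownership_totals helper are assumptions of this example, and it makes no claim about how often real records pass the check.

```python
import pandas as pd

GEN_KEY = ["report_date", "plant_id_eia", "generator_id"]


def ownership_totals(own: pd.DataFrame, tolerance: float = 0.01) -> pd.DataFrame:
    """Sum fraction_owned per generator and flag totals within `tolerance` of 1."""
    totals = own.groupby(GEN_KEY, as_index=False)["fraction_owned"].sum()
    totals["is_complete"] = (totals["fraction_owned"] - 1.0).abs() <= tolerance
    return totals


# Made-up example rows, not real EIA data.
sample = pd.DataFrame({
    "report_date": pd.to_datetime(["2023-01-01"] * 4),
    "plant_id_eia": [1, 1, 2, 2],
    "generator_id": ["A", "A", "B", "B"],
    "owner_utility_id_eia": [100, 200, 100, 300],
    "fraction_owned": [0.5, 0.5, 0.6, 0.3],
})
totals = ownership_totals(sample)
```

Multiplying fraction_owned by a generator's capacity from another table would give each owner's share of capacity in the same way.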

core_eia860__scd_plants#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of plants which may vary from year to year. Compiled from across all EIA-860 and EIA-923 data.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: plant_id_eia, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for plants are set. It contains values which are expected to vary slowly, while core_eia__entity_plants contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
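
The 70% rule described above can be illustrated with a small sketch. This is not PUDL's implementation: canonical_value is a hypothetical function, and ignoring nulls when computing the share is an assumption of this example.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def canonical_value(values: Iterable[Optional[Hashable]], threshold: float = 0.7):
    """Return the most common non-null value if its share meets the threshold, else None."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # Keep the value only if it is reported consistently enough.
    return value if count / len(reported) >= threshold else None


# Made-up reports of one plant's balancing authority code across several raw tables.
consistent = canonical_value(["PJM", "PJM", "PJM", "MISO"])  # 3 of 4 agree
inconsistent = canonical_value(["PJM", "PJM", "MISO"])       # only 2 of 3 agree
```

In the first case the majority value is kept; in the second the attribute would come out null, matching the behavior the paragraph describes for inconsistently reported values.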

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet"
)

Columns#

Field Name	Type	Description
ash_impoundment	boolean	Is there an ash impoundment (e.g. pond, reservoir) at the plant?
ash_impoundment_lined	boolean	If there is an ash impoundment at the plant, is the impoundment lined?
ash_impoundment_status	string	If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_name_eia	string	Name of the balancing authority.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
datum	string	Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84).
energy_storage	boolean	Indicates if the facility has energy storage capabilities.
ferc_cogen_docket_no	string	The docket number relating to the FERC cogenerator status. See FERC Form 556.
ferc_cogen_status	boolean	Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556.
ferc_exempt_wholesale_generator	boolean	Indicates whether the plant has FERC qualifying facility exempt wholesale generator status
ferc_exempt_wholesale_generator_docket_no	string	The docket number relating to the FERC qualifying facility exempt wholesale generator status.
ferc_qualifying_facility_docket_no	string	The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556.
ferc_small_power_producer	boolean	Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556.
ferc_small_power_producer_docket_no	string	The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556.
grid_voltage_1_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
grid_voltage_2_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
grid_voltage_3_kv	number	Plant’s grid voltage at point of interconnection to transmission or distribution facilities
has_net_metering	boolean	Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source). This field was only reported up until 2015
iso_rto_code	string	The code of the plant’s ISO or RTO. NA if not reported in that year.
liquefied_natural_gas_storage	boolean	Indicates if the facility has the capability to store natural gas in the form of liquefied natural gas.
natural_gas_local_distribution_company	string	Names of Local Distribution Company (LDC), connected to natural gas burning power plants.
natural_gas_pipeline_name_1	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_pipeline_name_2	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_pipeline_name_3	string	The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.
natural_gas_storage	boolean	Indicates if the facility has on-site storage of natural gas.
nerc_region	string	NERC region in which the plant is located
pipeline_notes	string	Additional owner or operator of natural gas pipeline.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
primary_purpose_id_naics	integer	North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant
regulatory_status_code	string	Indicates whether the plant is regulated or non-regulated.
report_date	date	Date reported.
reporting_frequency_code	string	Code that specifies what time period data has to be reported (i.e. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details.
sector_id_eia	integer	EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
sector_name_eia	string	EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
service_area	string	Service area in which the plant is located; for unregulated companies, it’s the electric utility with which the plant is interconnected
transmission_distribution_owner_id	integer	EIA-assigned code for owner of transmission/distribution system to which the plant is interconnected.
transmission_distribution_owner_name	string	Name of the owner of the transmission or distribution system to which the plant is interconnected.
transmission_distribution_owner_state	string	State location for owner of transmission/distribution system to which the plant is interconnected.
utility_id_eia	integer	The EIA Utility Identification number.
water_source	string	Name of water source associated with the plant.

core_eia860__scd_utilities#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of utilities which may vary from year to year. Compiled from all EIA data.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860 – Annual Electric Generator Report
Primary key:: utility_id_eia, report_date

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for utilities are set. It contains values which are expected to vary slowly, while core_eia__entity_utilities contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
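
As a rough illustration of the consistency rule described above, the following plain-Python sketch picks a canonical value from a list of reported values. The function name canonical_value, the threshold parameter, the sample city names, and the choice to ignore nulls when computing the share are all assumptions made for this example; PUDL's actual harvesting code is more involved and may differ.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Return the most common non-null value if it meets the threshold, else None.

    Hypothetical helper: nulls are dropped before computing the share,
    which is an assumption of this sketch.
    """
    observed = [value for value in reported if value is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    # Keep the value only if it makes up at least `threshold` of the reports.
    return value if count / len(observed) >= threshold else None


# Invented sample data: three of four reports agree (75%), so a value is chosen.
consistent = canonical_value(["Austin", "Austin", "Austin", "Dallas"])

# Only two of four reports agree (50%), so the result is null.
inconsistent = canonical_value(["Austin", "Dallas", "Austin", "Houston"])
```

In this sketch an attribute that never reaches the 70% share comes back as None, which corresponds to the null values mentioned above.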

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet"
)

Columns#

Field Name	Type	Description
address_2	string	Second line of the address.
attention_line	string	Mail attention name of the operator/owner.
city	string	Name of the city.
contact_firstname	string	First name of utility contact 1.
contact_firstname_2	string	First name of utility contact 2.
contact_lastname	string	Last name of utility contact 1.
contact_lastname_2	string	Last name of utility contact 2.
contact_title	string	Title of utility contact 1.
contact_title_2	string	Title of utility contact 2.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type	string	Entity type of principal owner.
phone_extension	string	Phone extension for utility contact 1
phone_extension_2	string	Phone extension for utility contact 2
phone_number	string	Phone number for utility contact 1.
phone_number_2	string	Phone number for utility contact 2.
plants_reported_asset_manager	boolean	Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form?
plants_reported_operator	boolean	Is the reporting entity an operator of power plants reported on Schedule 2 of the form?
plants_reported_other_relationship	boolean	Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form?
plants_reported_owner	boolean	Is the reporting entity an owner of power plants reported on Schedule 2 of the form?
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
utility_id_eia	integer	The EIA Utility Identification number.
zip_code	string	Five digit US Zip Code.
zip_code_4	string	Four digit US Zip Code suffix.

core_eia860m__changelog_generators#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in generator characteristics reported in EIA-860M.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 860M – Monthly Update to the Annual Electric Generator Report
Primary key:: plant_id_eia, generator_id, report_date

Additional Details#

EIA-860M includes generator tables with the most up-to-date catalog of EIA generators and their operational status and other generator characteristics. EIA-860M is reported monthly, although for the vast majority of the generators nothing changes month-to-month. This table is a changelog of that monthly reported generator data. There is a record corresponding to the first instance of a generator and associated characteristics with a report_date column and a valid_until_date column. Whenever any of the reported EIA-860M data was changed for a record, there will be a new changelog record with a new report_date.
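
To make the changelog structure concrete, here is a minimal plain-Python sketch of looking up the record that was in effect for a generator on a given date. It relies on the interval convention stated for valid_until_date in the column list (valid from report_date up to but not including valid_until_date). The helper name record_as_of, the sample rows, and the treatment of a missing valid_until_date as "still valid" are assumptions for illustration only.

```python
from datetime import date

# Invented sample rows; real records carry many more columns.
changelog = [
    {
        "plant_id_eia": 1,
        "generator_id": "GT1",
        "report_date": date(2023, 1, 1),
        "valid_until_date": date(2023, 6, 1),
        "operational_status": "proposed",
    },
    {
        "plant_id_eia": 1,
        "generator_id": "GT1",
        "report_date": date(2023, 6, 1),
        "valid_until_date": None,  # assumption: no end date means still valid
        "operational_status": "existing",
    },
]


def record_as_of(records, plant_id_eia, generator_id, as_of):
    """Return the changelog record in effect on `as_of`, or None if there is none."""
    for rec in records:
        if (rec["plant_id_eia"], rec["generator_id"]) != (plant_id_eia, generator_id):
            continue
        started = rec["report_date"] <= as_of
        not_ended = rec["valid_until_date"] is None or as_of < rec["valid_until_date"]
        if started and not_ended:
            return rec
    return None


before_change = record_as_of(changelog, 1, "GT1", date(2023, 5, 31))
on_change = record_as_of(changelog, 1, "GT1", date(2023, 6, 1))
```

Because the end of each interval is exclusive, a lookup on the exact date of a change returns the newer record.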

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
county	string	County name.
current_planned_generator_operating_date	date	The most recently updated effective date on which the generator is scheduled to start operation
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code_1	string	The code representing the most predominant type of energy that fuels the generator.
energy_storage_capacity_mwh	number	Energy storage capacity in MWh (e.g. for batteries).
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_date	date	Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.
generator_retirement_date	date	Date of the scheduled or effected retirement of the generator.
latitude	number	Latitude of the plant’s location, in degrees.
longitude	number	Longitude of the plant’s location, in degrees.
net_capacity_mwdc	number	Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.
operational_status	string	The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.
operational_status_code	string	The operating status of the asset.
planned_derate_date	date	Planned effective month that the generator is scheduled to enter operation after the derate modification.
planned_generator_retirement_date	date	Planned effective date of the scheduled retirement of the generator.
planned_net_summer_capacity_derate_mw	number	Decrease in summer capacity expected to be realized from the derate modification to the equipment.
planned_net_summer_capacity_uprate_mw	number	Increase in summer capacity expected to be realized from the modification to the equipment.
planned_repower_date	date	Planned effective date that the generator is scheduled to enter operation after the repowering is complete.
planned_uprate_date	date	Planned effective date that the generator is scheduled to enter operation after the uprate modification.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_name_eia	string	Plant name.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sector_id_eia	integer	EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status
state	string	Two letter US state abbreviation.
summer_capacity_mw	number	The net summer capacity.
technology_description	string	High level description of the technology used by the generator to produce electricity.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
valid_until_date	date	The record in the changelog is valid until this date. The record is valid from the report_date up until but not including the valid_until_date.
winter_capacity_mw	number	The net winter capacity.

core_eia861__assn_balancing_authority#

Noticed any irregularities with this table? File a report.

Association table providing connections between state, balancing authority, and utility in a given year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: report_date, balancing_authority_id_eia, utility_id_eia, state

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__assn_utility#

Noticed any irregularities with this table? File a report.

Association table providing connections between utility and state in a given year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: report_date, utility_id_eia, state

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet"
)

Columns#

Field Name	Type	Description
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_advanced_metering_infrastructure#

Noticed any irregularities with this table? File a report.

Annual time series of advanced metering infrastructure (AMI) and automated meter reading (AMR) by state, sector, and balancing authority.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia

Additional Details#

The energy served (in MWh) for AMI systems is provided. Form EIA-861 respondents also report the number of standard meters (non AMR/AMI) in their system. Historical Changes: We started collecting the number of standard meters in 2013. The monthly survey collected these data from January 2011 to January 2017.
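
One common use of the three meter counts in this table (advanced_metering_infrastructure, automated_meter_reading, and non_amr_ami) is to estimate what share of a utility's meters are AMI. The sketch below is only an illustration: the helper name ami_share and the sample numbers are invented, and it assumes the three columns together cover all meters for a row.

```python
def ami_share(advanced_metering_infrastructure, automated_meter_reading, non_amr_ami):
    """Fraction of reported meters that are AMI, or None if it cannot be computed."""
    counts = (advanced_metering_infrastructure, automated_meter_reading, non_amr_ami)
    # A missing count (e.g. standard meters before they were collected)
    # makes the total unknown, so no share is returned.
    if any(count is None for count in counts):
        return None
    total = sum(counts)
    return advanced_metering_infrastructure / total if total else None


# Invented example row: 600 AMI, 300 AMR, and 100 standard meters.
row = {
    "advanced_metering_infrastructure": 600,
    "automated_meter_reading": 300,
    "non_amr_ami": 100,
}
share = ami_share(**row)
```

Since standard meter counts were only collected from 2013 onward, earlier rows would yield None under this approach rather than an overstated share.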

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet"
)

Columns#

Field Name	Type	Description
advanced_metering_infrastructure	integer	Number of meters that measure and record usage data at a minimum, in hourly intervals and provide usage data at least daily to energy companies and may also provide data to consumers. Data are used for billing and other purposes. Advanced meters include basic hourly interval meters and extend to real-time meters with built-in two-way communication capable of recording and transmitting instantaneous data.
automated_meter_reading	integer	Number of meters that collect data for billing purposes only and transmit this data one way, usually from the customer to the distribution utility.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
daily_digital_access_customers	integer	Number of customers able to access daily energy usage through a web portal or other electronic means.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
direct_load_control_customers	integer	Number of customers with direct load control: a demand response activity by which the program sponsor remotely shuts down or cycles a customer’s electrical equipment (e.g. air conditioner, water heater) on short notice.
energy_served_ami_mwh	number	Amount of energy served through AMI meters. AMI meters can transmit data in both directions, between the delivery entity and the customer.
entity_type	string	Entity type of principal owner.
home_area_network	integer	Number of AMI meters with home area network (HAN) gateway enabled.
non_amr_ami	integer	Number of non-AMR/AMI meters. Usually electromechanical or solid state meters measuring aggregated kWh where data are manually retrieved over monthly billing cycles for billing purposes only. Standard meters may also include functions to measure time-of-use and/or demand with data manually retrieved over monthly billing cycles.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_balancing_authority#

Noticed any irregularities with this table? File a report.

Annual time series of balancing authorities.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: report_date, balancing_authority_id_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_id_eia	integer	EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.
balancing_authority_name_eia	string	Name of the balancing authority.
report_date	date	Date reported.

core_eia861__yearly_demand_response#

Noticed any irregularities with this table? File a report.

Annual time series of demand response programs by state, sector, and balancing authority.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia

Additional Details#

The EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet"
)

Columns#

Field Name	Type	Description
actual_peak_demand_savings_mw	number	Demand reduction actually achieved by demand response activities. Measured at the time of the company’s annual system peak hour.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customer_incentives_cost	number	Total cost of customer incentives in a given report year. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_savings_mwh	number	The energy savings incurred in a given reporting year by participation in demand response programs.
other_costs	number	Additional costs.
potential_peak_demand_savings_mw	number	The total demand savings that could occur at the time of the system peak hour assuming all demand response is called.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_demand_response_water_heater#

Noticed any irregularities with this table? File a report.

Annual time series of grid-connected water heaters enrolled in demand response programs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: balancing_authority_code_eia, report_date, state, utility_id_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
num_water_heaters	integer	The number of grid-enabled water heaters added to the respondent’s program this year, if the respondent has a DSM program for grid-enabled water heaters (as defined by DOE’s Office of Energy Efficiency and Renewable Energy).
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_demand_side_management_ee_dr#

Noticed any irregularities with this table? File a report.

Annual time series of the impact of energy efficiency and load management programs on total energy sold (MWh) and peak demand (MW) by customer class.

Most-recent data:: 2012
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet"
)

Columns#

Field Name	Type	Description
annual_indirect_program_cost	number	Costs that have not been included in any program category, but could be meaningfully identified with operating the company’s DSM programs (e.g., Administrative, Marketing, Monitoring & Evaluation, Company-Earned Incentives, Other).
annual_total_cost	number	The sum of direct program costs, indirect program costs, and incentive payments associated with utility demand side management programs.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_efficiency_annual_actual_peak_reduction_mw	number	The peak reduction incurred in a given reporting year by all participants in efficiency programs.
energy_efficiency_annual_direct_cost	number	The sum of actual direct costs (excluding incentive payments) incurred from energy efficiency programs in a given reporting year.
energy_efficiency_annual_effects_mwh	number	The change in energy use incurred in a given reporting year by all participants in energy efficiency programs.
energy_efficiency_annual_incentive_cost	number	The cost of incentive payments incurred in a given reporting year from energy efficiency programs. Incentives are the total financial value provided to a customer for program participation, whether cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.
energy_efficiency_incremental_actual_peak_reduction_mw	number	The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.
energy_efficiency_incremental_effects_mwh	number	The change in energy use incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.
load_management_annual_actual_peak_reduction_mw	number	The peak reduction incurred in a given reporting year by all participants in load management programs.
load_management_annual_direct_cost	number	The sum of actual direct costs (excluding incentive payments) incurred from load management programs in a given reporting year.
load_management_annual_effects_mwh	number	The change in energy use incurred in a given reporting year by all participants in load management programs.
load_management_annual_incentive_cost	number	The cost of incentive payments incurred in a given reporting year from load management programs. Incentives are the total financial value provided to a customer for program participation, whether cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.
load_management_annual_potential_peak_reduction_mw	number	The potential amount of peak reduction that could be incurred in a given reporting year by all participants in load management programs.
load_management_incremental_actual_peak_reduction_mw	number	The peak reduction incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.
load_management_incremental_effects_mwh	number	The change in energy use incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.
load_management_incremental_potential_peak_reduction_mw	number	The potential amount of peak reduction that could be incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.
nerc_region	string	NERC region in which the plant is located
price_responsiveness_customers	integer	The number of customers participating in the respondent’s incentive-based demand response programs.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
time_responsiveness_customers	integer	The number of customers participating in the respondent’s time-based rate programs.
utility_id_eia	integer	The EIA Utility Identification number.
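
The actual and potential load management peak reduction columns listed above can be compared row by row. The following is a minimal sketch, not part of PUDL, of the fraction of potential peak reduction that was actually realized. The function name and the sample values are hypothetical, and treating zero or missing potential as "no ratio" is an assumption made here for illustration.

```python
import pandas as pd


def load_management_realization(df: pd.DataFrame) -> pd.Series:
    """Return actual / potential annual peak reduction for each row.

    Rows whose potential reduction is zero or missing yield NaN rather
    than an infinite or misleading ratio.
    """
    potential = df["load_management_annual_potential_peak_reduction_mw"]
    actual = df["load_management_annual_actual_peak_reduction_mw"]
    # Mask non-positive potentials so the division produces NaN for them.
    return actual.div(potential.where(potential > 0))


# Made-up rows for illustration only; real data would come from the parquet file.
sample = pd.DataFrame(
    {
        "utility_id_eia": [1, 2, 3],
        "load_management_annual_actual_peak_reduction_mw": [5.0, 0.0, 2.0],
        "load_management_annual_potential_peak_reduction_mw": [10.0, 0.0, 8.0],
    }
)
ratio = load_management_realization(sample)
```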

core_eia861__yearly_demand_side_management_misc#

Noticed any irregularities with this table? File a report.

Annual time series of demand-side management (DSM) program information.

Most-recent data:: 2012
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

Includes boolean fields about whether the energy savings estimates/calculations were independently verified and whether the utility runs time- and/or price-responsive programs. Also contains information on whether any of the respondent’s DSM activities are reported under another company, and if so which one.

The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. When we contacted a representative from EIA, they discouraged us from combining them.
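
The boolean fields described above lend themselves to simple filtering. As a rough sketch (the function name and sample rows are hypothetical, and counting a missing flag as "no" is an assumption rather than documented PUDL behavior), the rows below are narrowed to respondents that run time-based rate programs and whose energy savings were independently verified.

```python
import pandas as pd


def verified_time_responsive(df: pd.DataFrame) -> pd.DataFrame:
    """Select rows with time-based programs and independently verified savings."""
    # Assumption: a missing (NA) flag is treated as False for this filter.
    has_programs = df["has_time_responsive_programs"].fillna(False)
    verified = df["energy_savings_independently_verified"].fillna(False)
    return df.loc[has_programs & verified, ["utility_id_eia", "state", "report_date"]]


# Made-up rows using pandas' nullable boolean dtype so that missing flags are explicit.
sample = pd.DataFrame(
    {
        "utility_id_eia": [10, 11, 12, 13],
        "state": ["CO", "CO", "TX", "WA"],
        "report_date": pd.to_datetime(["2012-01-01"] * 4),
        "has_time_responsive_programs": pd.array(
            [True, True, None, False], dtype="boolean"
        ),
        "energy_savings_independently_verified": pd.array(
            [True, None, True, True], dtype="boolean"
        ),
    }
)
result = verified_time_responsive(sample)
```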

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_savings_estimates_independently_verified	boolean	Whether savings estimates are based on a forecast or the report of one or more independent evaluators.
energy_savings_independently_verified	boolean	Whether reported energy savings were verified through an independent evaluation.
entity_type	string	Entity type of principal owner.
has_time_responsive_programs	boolean	Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff).
major_program_changes	boolean	Whether there have been any major changes to the respondent’s demand-side management programs (e.g., terminated programs, new information or financing programs, or a shift to programs with dual load building objectives and energy efficiency objectives), program tracking procedures, or reporting methods that affect the comparison of demand-side management data reported on this schedule to data from previous years.
nerc_region	string	NERC region in which the plant is located
price_responsive_programs	boolean	Whether the respondent operates any incentive-based demand response programs (e.g., market incentives, financial incentives, direct load control, interruptible programs, demand bidding/buyback, emergency demand response, capacity market programs, and ancillary service market programs).
report_date	date	Date reported.
reported_as_another_company	string	The name of the company if a respondent’s demand-side management activities are reported on Schedule 6 of another company’s form.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_demand_side_management_sales#

Noticed any irregularities with this table? File a report.

Annual time series of electricity sales related to demand-side management (DSM).

Most-recent data:: 2012
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. When we contacted a representative from EIA, they discouraged us from combining them.
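
This table is documented as having no primary key, so it can be worth checking whether a set of columns you intend to join or group on actually identifies rows uniquely. The sketch below is one way to do that with pandas. The helper name, the choice of candidate key columns, and the sample rows are all assumptions for illustration and are not guarantees about the real data.

```python
import pandas as pd


def duplicated_on(df: pd.DataFrame, key_cols: list[str]) -> pd.DataFrame:
    """Return every row that shares its key_cols values with another row."""
    # keep=False marks all members of a duplicate group, not just the repeats.
    return df[df.duplicated(subset=key_cols, keep=False)]


# Made-up rows: the first two collide on the candidate key.
sample = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2012-01-01"] * 3),
        "nerc_region": ["WECC", "WECC", "TRE"],
        "sales_to_ultimate_consumers_mwh": [100.0, 50.0, 75.0],
    }
)
# Hypothetical candidate key; adjust to whatever columns you plan to rely on.
candidate_key = ["utility_id_eia", "state", "report_date", "nerc_region"]
dupes = duplicated_on(sample, candidate_key)
```

An empty result means the candidate key is unique in the data you loaded; a non-empty one shows exactly which rows would fan out in a join.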

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
nerc_region	string	NERC region in which the plant is located
report_date	date	Date reported.
sales_for_resale_mwh	number	The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers.
sales_to_ultimate_consumers_mwh	number	The amount of electricity sold to customers purchasing electricity for their own use and not for resale.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_distributed_generation_fuel#

Noticed any irregularities with this table? File a report.

Annual time series of the energy sources used for utility or customer-owned distributed generation capacity.

Most-recent data:: 2015
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, there’s no great match for the data found in this table. You may be able to compute an analog for fuel percents using the capacity and tech class figures in core_eia861__yearly_non_net_metering_customer_fuel_class, but no corresponding consumption or generation information is available in the new format.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
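
One way to approximate the percent-style figures mentioned above is to compute each technology class's share of reported capacity. This is only a sketch under stated assumptions: it assumes the non-net-metering table exposes `utility_id_eia`, `state`, `report_date`, `tech_class`, and `capacity_mw` columns (not confirmed here), that `total` rows are aggregates to be excluded, and that a capacity share is an acceptable stand-in for a fuel percent. The function name and sample rows are hypothetical.

```python
import pandas as pd


def capacity_share_by_tech(df: pd.DataFrame) -> pd.DataFrame:
    """Compute each tech_class's percent share of capacity per utility, state, and year."""
    # Assumption: "total" is an aggregate row and would double count capacity.
    # Any other aggregate classes present in real data would need excluding too.
    parts = df[df["tech_class"] != "total"]
    group_cols = ["utility_id_eia", "state", "report_date"]
    summed = parts.groupby(group_cols + ["tech_class"], as_index=False)[
        "capacity_mw"
    ].sum()
    # Denominator: capacity summed over all remaining classes in the same group.
    totals = summed.groupby(group_cols)["capacity_mw"].transform("sum")
    summed["capacity_pct"] = 100 * summed["capacity_mw"] / totals.where(totals > 0)
    return summed


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 1],
        "state": ["CO", "CO", "CO"],
        "report_date": pd.to_datetime(["2016-01-01"] * 3),
        "tech_class": ["pv", "wind", "total"],
        "capacity_mw": [3.0, 1.0, 4.0],
    }
)
shares = capacity_share_by_tech(sample)
```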

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
estimated_or_actual_fuel_data	string	Whether the reported fuel data is estimated or actual.
fuel_class	string	Fuel types specific to EIA 861 distributed generation table: [‘gas’, ‘oil’, ‘other’, ‘renewable’, ‘water’, ‘wind’, ‘wood’]
fuel_pct	number	Percent of fuel
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_distributed_generation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of the capacity and quantity of utility or customer-owned distributed generation.

Most-recent data:: 2015
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_misc.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet"
)

Columns#

Field Name	Type	Description
backup_capacity_mw	number	The total nameplate capacity of generators that are used only for emergency backup service.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
distributed_generation_owned_capacity_mw	number	Amount of distributed generation capacity owned by the respondent.
estimated_or_actual_capacity_data	string	Whether the reported capacity data is estimated or actual.
generators_num_less_1_mw	integer	Total number of generators less than 1 MW.
generators_number	integer	Total number of generators
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
total_capacity_less_1_mw	number	The total amount of capacity from generators with less than 1 MW of nameplate capacity.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_distributed_generation_tech#

Noticed any irregularities with this table? File a report.

Annual time series of the technology used for utility or customer-owned distributed generation.

Most-recent data:: 2015
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_customer_fuel_class.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
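
Because this table stores one row per technology class, comparing technologies side by side usually means reshaping it to a wide layout. The following is a minimal sketch using `pivot_table`; the function name and sample rows are hypothetical, and summing duplicate entries is an assumption made because the table has no primary key.

```python
import pandas as pd


def capacity_wide_by_tech(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape to one row per utility, state, and year with one column per tech_class."""
    wide = df.pivot_table(
        index=["utility_id_eia", "state", "report_date"],
        columns="tech_class",
        values="capacity_mw",
        aggfunc="sum",  # assumption: repeated combinations are added together
    )
    # Classes a utility did not report appear as NaN rather than zero.
    return wide.reset_index()


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2015-01-01"] * 3),
        "tech_class": ["pv", "wind", "pv"],
        "capacity_mw": [3.0, 1.0, 2.0],
    }
)
wide = capacity_wide_by_tech(sample)
```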

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet"
)

Columns#

Field Name	Type	Description
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
estimated_or_actual_tech_data	string	Whether the reported technology data is estimated or actual.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
tech_class	string	Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_distribution_systems#

Noticed any irregularities with this table? File a report.

Annual time series of distribution circuits and circuits with voltage optimization by state.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet"
)

Columns#

Field Name	Type	Description
circuits_with_voltage_optimization	integer	Number of distribution circuits that employ voltage/VAR optimization (VVO).
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
distribution_circuits	integer	Total number of distribution circuits.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
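
The two circuit counts above can be combined into a state-level share of circuits with voltage optimization. This is a sketch rather than an official PUDL calculation: the function name and sample rows are hypothetical, and summing utility rows within a state and year assumes that each row represents a distinct set of circuits.

```python
import pandas as pd


def vvo_share_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """Sum circuit counts by state and year, then compute the share with VVO."""
    count_cols = ["circuits_with_voltage_optimization", "distribution_circuits"]
    # Note: sum() skips missing values, so a missing count contributes zero.
    by_state = df.groupby(["state", "report_date"], as_index=False)[count_cols].sum()
    circuits = by_state["distribution_circuits"]
    by_state["vvo_share"] = by_state[
        "circuits_with_voltage_optimization"
    ] / circuits.where(circuits > 0)
    return by_state


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "utility_id_eia": [1, 2, 3],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2024-01-01"] * 3),
        "circuits_with_voltage_optimization": [10, 30, 0],
        "distribution_circuits": [100, 100, 50],
    }
)
state_shares = vvo_share_by_state(sample)
```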

core_eia861__yearly_dynamic_pricing#

Noticed any irregularities with this table? File a report.

Annual time series of enrollment in dynamic pricing programs by state, sector, and balancing authority.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Additional Details#

Respondents check if one or more customers are enrolled in time-of-use pricing, real time pricing, variable peak pricing, critical peak pricing, and critical peak rebates.
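
Since each pricing program is recorded as its own boolean column, a per-row count of checked programs is a quick way to see how many program types a respondent reports. The sketch below assumes a missing flag means the box was not checked, which is an interpretation and not something the source states; the function name and sample rows are hypothetical.

```python
import pandas as pd

# The five program flag columns documented for this table.
PROGRAM_FLAGS = [
    "time_of_use_pricing",
    "real_time_pricing",
    "variable_peak_pricing",
    "critical_peak_pricing",
    "critical_peak_rebate",
]


def count_program_types(df: pd.DataFrame) -> pd.Series:
    """Count how many of the dynamic pricing flags are True in each row."""
    # Assumption: missing flags are treated as unchecked (False).
    flags = df[PROGRAM_FLAGS].fillna(False).astype(bool)
    return flags.sum(axis=1)


# Made-up rows using the nullable boolean dtype so missing flags are explicit.
sample = pd.DataFrame(
    {
        "time_of_use_pricing": pd.array([True, False, None], dtype="boolean"),
        "real_time_pricing": pd.array([False, False, True], dtype="boolean"),
        "variable_peak_pricing": pd.array([False, False, None], dtype="boolean"),
        "critical_peak_pricing": pd.array([True, False, False], dtype="boolean"),
        "critical_peak_rebate": pd.array([None, False, False], dtype="boolean"),
    }
)
program_counts = count_program_types(sample)
```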

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
critical_peak_pricing	boolean	Whether customers are participating in critical peak pricing, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by imposing a pre-specified high rate or price for a limited number of days or hours.
critical_peak_rebate	boolean	Whether customers are participating in critical peak rebates, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by providing a rebate to the customer on a limited number of days and for a limited number of hours, at the request of the energy provider.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
real_time_pricing	boolean	Whether the respondent has customers participating in a real time pricing (RTP) program. RTP is a program of rate and price structure in which the retail price for electricity typically fluctuates hourly or more often, to reflect changes in the wholesale price of electricity on either a day-ahead or hour-ahead basis.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
time_of_use_pricing	boolean	Whether the respondent has customers participating in a time-of-use (TOU) pricing program. TOU is a program in which customers pay different prices at different times of the day. On-peak prices are higher and off-peak prices are lower than a “standard” rate. The price schedule is fixed and predefined, based on season, day of week, and time of day.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
variable_peak_pricing	boolean	Whether the respondent has customers participating in a variable peak pricing program (VPP). VPP is a program in which a form of TOU pricing allows customers to purchase their generation supply at prices set on a daily basis with varying on-peak and constant off-peak rates. Under the VPP program, the on-peak price for each weekday becomes available the previous day (typically late afternoon) and the customer is billed for actual consumption during the billing cycle at these prices.

core_eia861__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of incremental energy savings, peak demand savings, weighted average life cycle, and associated costs for the reporting year and life cycle of energy efficiency programs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Additional Details#

The EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. When we contacted a representative from EIA, they discouraged us from combining them.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customer_incentives_incremental_cost	number	The cost of customer incentives resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.
customer_incentives_incremental_life_cycle_cost	number	All anticipated costs of the customer incentives including reporting year incremental costs and all future costs. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.
customer_other_costs_incremental_life_cycle_cost	number	All anticipated costs other than customer incentives. Includes reporting year incremental costs and all future costs.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
incremental_energy_savings_mwh	number	Energy savings in the given report year resulting from new participants in existing demand response programs and all participants in new demand response programs.
incremental_life_cycle_energy_savings_mwh	number	The estimated total changes in energy use for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition.
incremental_life_cycle_peak_reduction_mw	number	The estimated total changes in peak load for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition.
incremental_peak_reduction_mw	number	The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.
other_costs_incremental_cost	number	Costs resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs that aren’t directly associated with customer incentives.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
weighted_average_life_years	number	The weighted average life of the respondent’s portfolio of energy efficiency programs.
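
The incremental cost and savings columns above can be combined into a rough cost per MWh saved in the reporting year. Treat this as an illustrative sketch only: the function name and sample values are hypothetical, the monetary unit of the cost columns is not stated in this section (so the result is in "cost units per MWh"), and adding the two cost columns assumes they do not overlap.

```python
import pandas as pd


def incremental_cost_per_mwh(df: pd.DataFrame) -> pd.Series:
    """Return reporting-year incremental cost divided by incremental MWh saved."""
    # If either cost component is missing, the sum (and the ratio) is missing.
    cost = (
        df["customer_incentives_incremental_cost"]
        + df["other_costs_incremental_cost"]
    )
    savings = df["incremental_energy_savings_mwh"]
    # Mask zero or negative savings so the division yields NaN for them.
    return cost.div(savings.where(savings > 0))


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "customer_incentives_incremental_cost": [100.0, 20.0],
        "other_costs_incremental_cost": [50.0, 5.0],
        "incremental_energy_savings_mwh": [10.0, 0.0],
    }
)
unit_cost = incremental_cost_per_mwh(sample)
```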

core_eia861__yearly_green_pricing#

Noticed any irregularities with this table? File a report.

Annual time series of green pricing program revenue, sales, and customer count by sector and state.

Most-recent data:: 2012
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

The original data is no longer being collected or reported in this way.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet"
)

Columns#

Field Name	Type	Description
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
green_pricing_revenue	number	The money derived from the premium green pricing rate of the respondent’s program.
rec_revenue	number	Amount of revenue collected from Renewable Energy Certificates (RECs).
rec_sales_mwh	number	Amount of sales collected from Renewable Energy Certificates (RECs).
report_date	date	Date reported.
sales_mwh	number	Quantity of electricity sold in MWh.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_mergers#

Noticed any irregularities with this table? File a report.

Annual time series of utility mergers and acquisitions.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type	string	Entity type of principal owner.
merge_address	string	Address of new parent company.
merge_city	string	City of new parent company.
merge_company	string	Name of the company merged with or acquired.
merge_date	date	Date of merger or acquisition.
merge_state	string	Two letter US state abbreviations and three letter ISO-3166-1 country codes for international mines.
new_parent	string	Name of the new parent company post merger.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
zip_code	string	Five digit US Zip Code.
zip_code_4	string	Four digit US Zip Code suffix.

core_eia861__yearly_net_metering_customer_fuel_class#

Noticed any irregularities with this table? File a report.

Annual time series of net metering by customer and fuel class.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Usage Warnings#

Some years use a slightly different data definition.

Additional Details#

The amount of energy sold back to the grid. From 2007 through 2009 the data are reported as a lump sum of total energy dispatched by sector. After 2009, the data are broken down by sector and technology type.
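
One way to see this change in definition is to list which tech_class values carry a reported sold_to_utility_mwh in each report year. The sketch below runs on a small hand-made frame rather than the real table: the column names come from the schema below, but every value is made up, and storing the lump-sum years under the "total" tech_class is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the real table; all values are made up.
# Assumption: lump-sum years (2007-2009) appear under tech_class "total".
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2008-01-01", "2008-01-01", "2012-01-01", "2012-01-01", "2012-01-01"]
        ),
        "customer_class": [
            "residential", "commercial", "residential", "residential", "commercial",
        ],
        "tech_class": ["total", "total", "pv", "wind", "pv"],
        "sold_to_utility_mwh": [10.0, 20.0, 5.0, 1.0, 7.0],
    }
)

# Keep only rows where a value was actually reported.
reported = df.dropna(subset=["sold_to_utility_mwh"])

# Map each report year to the technology classes that appear in it.
tech_by_year = {
    int(year): sorted(group["tech_class"].unique())
    for year, group in reported.groupby(reported["report_date"].dt.year)
}
print(tech_by_year)
```

Running the same grouping on the real table should show where the breakdown by technology begins, which helps avoid comparing lump-sum years against single-technology rows.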

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_capacity_mwh	number	The total amount of energy which the system can supply power before recharging is necessary, in megawatt-hours.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
sold_to_utility_mwh	number	The amount of electric energy sold back to the utility through the net metering application.
state	string	Two letter US state abbreviation.
tech_class	string	Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_net_metering_misc#

Noticed any irregularities with this table? File a report.

Annual time series of PV current flow type for net metered capacity.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
pv_current_flow_type	string	Current flow type for photovoltaics: AC or DC
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_non_net_metering_customer_fuel_class#

Noticed any irregularities with this table? File a report.

Annual time series of non-net metered distributed generation by sector and technology type.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Additional Details#

This table’s data starts in 2016. For 2015 and earlier, data similar to that found in this table can be found in core_eia861__yearly_distributed_generation_tech.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
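
Because the series is split across two tables at 2016, code that spans both periods has to choose a table by report year. The helper below is a hypothetical sketch, not part of PUDL: the table names and S3 prefix are taken from this page, and it assumes the earlier table is published under the same prefix. The two tables hold similar, not identical, data, so their columns may still need to be reconciled after loading.

```python
# Table names and the S3 prefix come from this data dictionary;
# the helper function itself is hypothetical.
S3_PREFIX = "s3://pudl.catalyst.coop/nightly"


def dg_tech_table_url(report_year: int) -> str:
    """Return the parquet URL of the table that covers `report_year`."""
    if report_year >= 2016:
        table = "core_eia861__yearly_non_net_metering_customer_fuel_class"
    else:
        table = "core_eia861__yearly_distributed_generation_tech"
    return f"{S3_PREFIX}/{table}.parquet"


print(dg_tech_table_url(2015))
print(dg_tech_table_url(2016))
```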

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_capacity_mwh	number	The total amount of energy which the system can supply power before recharging is necessary, in megawatt-hours.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
tech_class	string	Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_non_net_metering_misc#

Noticed any irregularities with this table? File a report.

Annual time series of non-net metered distributed generation: number of generators, PV current flow type, backup capacity, and utility-owned capacity.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key.

Additional Details#

This table’s data starts in 2016. For 2015 and earlier, data similar to that found in this table can be found in core_eia861__yearly_distributed_generation_misc.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet"
)

Columns#

Field Name	Type	Description
backup_capacity_mw	number	The total nameplate capacity of generators that are used only for emergency backup service.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generators_number	integer	Total number of generators
pv_current_flow_type	string	Current flow type for photovoltaics: AC or DC
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_owned_capacity_mw	number	Total non-net-metered capacity owned by the respondent.

core_eia861__yearly_operational_data_misc#

Noticed any irregularities with this table? File a report.

Annual time series of megawatt-hours (MWh) for the sources and disposition of a utility’s electricity.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report (Schedule 2B)
Primary key:: utility_id_eia, nerc_region, report_date

Usage Warnings#

Some values have been redacted.
Contains rows where missing values were imputed.

Additional Details#

Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.

Rows where data_observed is False were labeled as imputed in the raw EIA data. The EIA documentation describes imputation only for net metering and non-net-metering distributed data, and does not specify what imputation applies to operational data. Less than 1% of rows are labeled as imputed, all of which occur in 2004 or 2005. Imputed rows make up no more than 25% of the rows for any utility.
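
As a rough illustration of working with the data_observed flag, the sketch below computes the share of imputed rows per utility and then keeps only observed rows. It uses a small made-up frame in place of the real table; only the column names come from the schema below.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the table schema.
df = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 1, 1, 2, 2],
        "report_date": pd.to_datetime(
            [
                "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01",
                "2004-01-01", "2005-01-01",
            ]
        ),
        "data_observed": [False, True, True, True, True, True],
        "net_generation_mwh": [100.0, 110.0, 120.0, 130.0, 50.0, 55.0],
    }
)

# Share of each utility's rows that are labeled as imputed.
imputed_share = (~df["data_observed"]).groupby(df["utility_id_eia"]).mean()

# Keep only rows whose values were observed rather than imputed.
observed = df[df["data_observed"]]

print(imputed_share)
print(len(observed))
```

Whether to drop imputed rows or keep them depends on the analysis; the flag only records how EIA labeled the row.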

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet"
)

Columns#

Field Name	Type	Description
consumed_by_facility_mwh	number	The amount of electricity used by the facility.
consumed_by_respondent_without_charge_mwh	number	The amount of electricity used by the electric utility in its electric and other departments without charge.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
data_observed	boolean	Is the value observed (True) or imputed (False).
entity_type	string	Entity type of principal owner.
exchange_energy_delivered_mwh	number	The amount of exchange energy delivered. Does not include power delivered as part of a tolling arrangement.
exchange_energy_received_mwh	number	The amount of exchange energy received. Does not include power received through tolling arrangements.
furnished_without_charge_mwh	number	The amount of electricity furnished by the electric utility without charge, such as to a municipality under a franchise agreement or for street and highway lighting.
nerc_region	string	NERC region in which the plant is located
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_power_exchanged_mwh	number	The net amount of energy exchanged. Net exchange is the difference between the amount of exchange received and the amount of exchange delivered. This entry should not include wholesale energy purchased from or sold to regulated companies or unregulated companies for other systems.
net_wheeled_power_mwh	number	The difference between the amount of energy entering the respondent’s system (wheeled received) for transmission through the respondent’s system and the amount of energy leaving the respondent’s system (wheeled delivered). Wheeled net represents the energy losses on the respondent’s system associated with the wheeling of energy for other systems.
report_date	date	Date reported.
retail_sales_mwh	number	MWh of sales to end-use customers in areas where the customer has been given the legal right to select a power supplier other than the traditional, vertically integrated electric utility.
sales_for_resale_mwh	number	The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
summer_peak_demand_mw	number	The maximum hourly summer load (for the months of June through September) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy their service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest.
total_disposition_mwh	number	Sum of all disposition of electricity listed. Includes sales to ultimate customers, sales for resale, energy furnished without charge, energy consumed by respondent without charge and total energy losses.
total_energy_losses_mwh	number	The total amount of electricity lost from transmission, distribution, and/or unaccounted for. Should be expressed as a positive number.
total_sources_mwh	number	Sum of all sources of electricity listed. Includes net generation, purchases from electricity suppliers, net exchanges (received - delivered), net wheeled (received - delivered), transmission by others, and losses.
transmission_by_other_losses_mwh	number	The amount of energy losses associated with the wheeling of electricity provided to the respondent’s system by other utilities. Transmission by others, losses should always be a negative value.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
wheeled_power_delivered_mwh	number	The total amount of energy leaving the respondent’s system that was transmitted through the respondent’s system for delivery to other systems. If wheeling delivered is not precisely known, the value is an estimate based on the respondent’s system’s known percentage of losses for wheeling transactions.
wheeled_power_received_mwh	number	The total amount of energy entering the respondent’s system from other systems for transmission through the respondent’s system (wheeling) for delivery to other systems. Does not include energy purchased or exchanged for consumption within the respondent’s system, which was wheeled to the respondent by others.
wholesale_power_purchases_mwh	number	Purchases from electricity suppliers.
winter_peak_demand_mw	number	The maximum hourly winter load (for the months of January through March) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy their service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest.

core_eia861__yearly_operational_data_revenue#

Noticed any irregularities with this table? File a report.

Annual time series of utility revenue by type of electric operating revenue.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report (Schedule 2C)
Primary key:: utility_id_eia, nerc_region, report_date, revenue_class

Usage Warnings#

Some values have been redacted.

Additional Details#

A utility’s revenue by type of electric operating revenue. Includes electric operating revenue from sales to ultimate customers, revenue from unbundled (delivery) customers, revenue from sales for resale, electric credits/other adjustments, revenue from transmission, other electric operating revenue, and total electric operating revenue.

Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.
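
Rows filed under utility ID 88888 cannot be attributed to a specific utility, so it is often useful to separate them before any utility-level analysis. A minimal sketch on made-up rows follows; the revenue_class labels and all values are placeholders rather than values known to occur in the table.

```python
import pandas as pd

# ID used by respondents that keep utility-level data proprietary (see above).
REDACTED_UTILITY_ID = 88888

# Hypothetical rows; revenue_class labels and values are placeholders.
df = pd.DataFrame(
    {
        "utility_id_eia": [1001, 88888, 1002, 88888],
        "state": ["CO", "CO", "TX", "TX"],
        "revenue_class": ["retail_sales", "retail_sales", "transmission", "transmission"],
        "revenue": [10.0, 3.0, 4.0, 1.0],
    }
)

is_redacted = df["utility_id_eia"] == REDACTED_UTILITY_ID

# Rows that can be tied to an identifiable utility.
identified = df[~is_redacted]

# Redacted revenue can still be summarized at a coarser level, e.g. by state.
redacted_by_state = df[is_redacted].groupby("state")["revenue"].sum()

print(identified)
print(redacted_by_state)
```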

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
nerc_region	string	NERC region in which the plant is located
report_date	date	Date reported.
revenue	number	Amount of revenue.
revenue_class	string	Source of revenue (e.g., retail sales, transmission).
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_reliability#

Noticed any irregularities with this table? File a report.

Annual time series of electricity system reliability and outage impacts.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report (Schedules 3B and 3C)
Primary key:: utility_id_eia, state, report_date, standard

Note: standard is included because while respondents are asked to only fill out one of parts B or C, sometimes they fill out both.

Additional Details#

Contains information on non-momentary electrical interruptions. Includes the system average interruption duration index (SAIDI), system average interruption frequency index (SAIFI), and customer average interruption duration index (CAIDI, equal to SAIDI/SAIFI), with and without major event days and loss of service. Includes the standard (IEEE/other) and other relevant information.

SAIDI is measured in minutes, and represents the sum over the year of all customers interrupted times the number of minutes they experienced an interruption, divided by total number of customers.

SAIFI is measured in customers, and represents the sum over the year of the total number of customers that experienced an interruption, divided by the total number of customers.
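
The two definitions above, together with CAIDI = SAIDI/SAIFI, can be worked through on a toy example. The outage records and customer count below are made up for illustration, and the calculation ignores the major event day and loss of supply adjustments.

```python
# Each outage is (customers_interrupted, minutes_of_interruption); values are made up.
outages = [(100, 60), (50, 30)]
total_customers = 1000

# Customer-minutes of interruption summed over the year.
customer_minutes = sum(customers * minutes for customers, minutes in outages)
# Total customer interruptions summed over the year.
customers_interrupted = sum(customers for customers, _ in outages)

saidi = customer_minutes / total_customers        # minutes per customer served
saifi = customers_interrupted / total_customers   # interruptions per customer served
caidi = customer_minutes / customers_interrupted  # same ratio as SAIDI / SAIFI

print(saidi, saifi, caidi)
```

Here 7,500 customer-minutes across 1,000 customers give a SAIDI of 7.5 minutes, 150 customer interruptions give a SAIFI of 0.15, and the average interruption lasts 50 minutes.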

The IEEE standards which can be used to calculate SAIDI and SAIFI include IEEE 1366-2003 and IEEE 1366-2012. These standards define momentary interruptions as having a duration of five minutes or less. If one of these IEEE standards is used, respondents fill out part B and are required to specify:

SAIDI major event days minus loss of supply
SAIFI major event days minus loss of supply

SAIDI and SAIFI with major event days minus loss of supply exclude reliability events that resulted from an event on the high-voltage (supply) system rather than on the distribution system. The voltage that distinguishes the distribution system from the supply system is given in highest_distribution_voltage_kv.

If a method other than these IEEE standards is used for calculating SAIDI and SAIFI indexes, respondents fill out part C and are required to specify:

whether inactive accounts are included
how they define a momentary interruption (less than 1 minute, less than or equal to 5 minutes, or other)

In this table, column standard is “ieee_standard” for respondents who have filled out part B, and “other_standard” for respondents who have filled out part C.
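
Since some respondents fill out both parts, a utility, state, and year can appear twice, once per standard. The sketch below finds those cases on a made-up frame and then keeps one row per respondent. Preferring the IEEE row is only one possible policy, assumed here for illustration, and is not a rule stated by PUDL or EIA.

```python
import pandas as pd

# Hypothetical rows; only the column names and the two standard labels
# come from this page.
df = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2, 3],
        "state": ["CO", "CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2020-01-01"] * 4),
        "standard": ["ieee_standard", "other_standard", "ieee_standard", "other_standard"],
        "saidi_w_major_event_days_minutes": [120.0, 125.0, 90.0, 200.0],
    }
)

key = ["utility_id_eia", "state", "report_date"]

# Respondents that filled out both part B and part C in the same year.
n_standards = df.groupby(key)["standard"].transform("nunique")
both_parts = df[n_standards > 1]

# Assumed policy: keep the IEEE row when both exist.
# "ieee_standard" sorts before "other_standard", so it is kept first.
preferred = (
    df.sort_values("standard")
    .drop_duplicates(subset=key, keep="first")
    .sort_values(key)
    .reset_index(drop=True)
)

print(both_parts)
print(preferred)
```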

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet"
)

Columns#

Field Name	Type	Description
caidi_w_major_event_days_minus_loss_of_service_minutes	number	Average number of minutes per interruption (SAIDI/SAIFI) including major event days and excluding reliability events caused by a loss of supply.
caidi_w_major_event_days_minutes	number	Average number of minutes per interruption (SAIDI/SAIFI) including major event days.
caidi_wo_major_event_days_minutes	number	Average number of minutes per interruption (SAIDI/SAIFI) excluding major event days.
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type	string	Entity type of principal owner.
highest_distribution_voltage_kv	number	The highest voltage that’s part of the distribution system.
inactive_accounts_included	boolean	Whether the respondent includes inactive accounts in its definition of customers used to determine SAIDI and SAIFI.
momentary_interruption_definition	string	How the respondent defines momentary service interruptions: less than 1 min, equal to or less than 5 min, or some other way.
outages_recorded_automatically	boolean	Whether the information about customer outages is recorded automatically.
report_date	date	Date reported.
saidi_w_major_event_days_minus_loss_of_service_minutes	number	Cumulative duration (minutes) of interruption for the average customer during the report year including major event days and excluding reliability events caused by a loss of supply.
saidi_w_major_event_days_minutes	number	Cumulative duration (minutes) of interruption for the average customer during the report year including major event days.
saidi_wo_major_event_days_minutes	number	Cumulative duration (minutes) of interruption for the average customer during the report year excluding major event days.
saifi_w_major_event_days_customers	number	Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days.
saifi_w_major_event_days_minus_loss_of_service_customers	number	Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days and excluding reliability events caused by a loss of supply.
saifi_wo_major_event_days_customers	number	Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year excluding major event days.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
standard	string	Whether the respondent calculates SAIDI/SAIFI, and major event days according to the IEEE or an Other standard.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_sales#

Noticed any irregularities with this table? File a report.

Annual time series of electricity sales to ultimate customers by utility, balancing authority, state, and customer class.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: utility_id_eia, state, report_date, balancing_authority_code_eia, customer_class, business_model, service_type
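
With a seven-column primary key, it can be worth confirming that a filtered or merged copy of the table is still unique on that key before aggregating. The sketch below shows one way to check this with pandas on made-up rows; the key columns are those listed above, while the balancing authority code, business model label, and all values are placeholders.

```python
import pandas as pd

# Primary key columns as listed for this table.
PRIMARY_KEY = [
    "utility_id_eia",
    "state",
    "report_date",
    "balancing_authority_code_eia",
    "customer_class",
    "business_model",
    "service_type",
]

# Hypothetical rows; codes, labels, and values are placeholders.
df = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 1],
        "state": ["CO", "CO", "CO"],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "balancing_authority_code_eia": ["PSCO"] * 3,
        "customer_class": ["residential", "commercial", "residential"],
        "business_model": ["retail"] * 3,
        "service_type": ["bundled", "bundled", "delivery"],
        "sales_mwh": [100.0, 250.0, 40.0],
    }
)

# True when no two rows share the same primary key values.
pk_is_unique = not df.duplicated(subset=PRIMARY_KEY).any()

# set_index raises ValueError if the key is not unique.
indexed = df.set_index(PRIMARY_KEY, verify_integrity=True)

print(pk_is_unique)
print(indexed.shape)
```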

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
business_model	string	Business model.
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
data_observed	boolean	Is the value observed (True) or imputed (False).
entity_type	string	Entity type of principal owner.
report_date	date	Date reported.
sales_mwh	number	Quantity of electricity sold in MWh.
sales_revenue	number	Revenue from electricity sold.
service_type	string	The type of service the respondent provides to a given customer class. Bundled: both energy and delivery; energy: just the energy consumed; delivery: just the billing and energy delivery services.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_service_territory#

Annual time series of counties in utility service territories.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: report_date, utility_id_eia, county_id_fips

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet"
)

Columns#

Field Name	Type	Description
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
report_date	date	Date reported.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
state_id_fips	string	Two digit state FIPS code.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_short_form#

Annual time series of data from the short form (EIA-861S).

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: utility_id_eia, state, report_date, balancing_authority_code_eia

Usage Warnings#

Some years are missing from the data record.

Additional Details#

Short-form reporting began in 2012, but the 2019 data is not available. Respondents are expected to submit the completed Form EIA-861S to EIA by April 30th, following the end of the prior calendar year. Utilities report on Form EIA-861S if they:

Report less than 200,000 megawatt-hours on their most recent Form EIA-861.
Provide only bundled service (generation and distribution).
Are not needed to ensure acceptable quality of statistical estimates.
Are not part of the aggregate TVA or WPPI.
Do not report on Form EIA-861M.
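
Because whole years can be absent from this table (2019, per the note above), it may be worth checking for gaps before building a time series. The following is a minimal sketch using a toy stand-in for the `report_date` column; the helper name `missing_report_years` and the sample dates are illustrative assumptions, not part of PUDL.

```python
import pandas as pd

# Toy stand-in for the table's report_date column (illustrative values only).
short_form = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2012-01-01", "2013-01-01", "2018-01-01", "2020-01-01"]
        )
    }
)


def missing_report_years(df: pd.DataFrame, first_year: int, last_year: int) -> list[int]:
    """Return the years in [first_year, last_year] that have no records in df."""
    present = {int(year) for year in df["report_date"].dt.year.unique()}
    return [year for year in range(first_year, last_year + 1) if year not in present]


# Check a short window around the known gap.
gaps = missing_report_years(short_form, 2018, 2020)
print(gaps)
```

With the real table, the same helper could be applied to the DataFrame loaded by any of the access methods below.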

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
customers	number	Number of customers.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type	string	Entity type of principal owner.
has_demand_side_management	boolean	Whether there were strategies or measures used to control electricity demand by customers
has_green_pricing	boolean	Whether a green pricing program was associated with this utility during the reporting year.
has_net_metering	boolean	Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source). This field was only reported up until 2015
has_time_responsive_programs	boolean	Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff).
report_date	date	Date reported.
sales_mwh	number	Quantity of electricity sold in MWh.
sales_revenue	number	Revenue from electricity sold.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia861__yearly_utility_data_misc#

Annual time series of utility business activities.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, and nerc_region, but there are nulls in the state column across several years of reporting.
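
Since nulls in `state` prevent the candidate key from being enforced, a quick check of how many rows are affected can be useful before joining on those columns. This is a hedged sketch on toy rows (the values are made up for illustration); only the column names come from the table description.

```python
import pandas as pd

# Columns that would have formed the primary key, per the description above.
candidate_key = ["utility_id_eia", "state", "report_date", "nerc_region"]

# Toy rows for illustration only; the second row has a null state.
misc = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", None, "TX"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01"]),
        "nerc_region": ["WECC", "WECC", "TRE"],
    }
)

# Count nulls in each would-be key column.
null_counts = misc[candidate_key].isna().sum()

# Among rows with a complete key, check whether the key is actually unique.
usable = misc.dropna(subset=candidate_key)
has_duplicates = bool(usable.duplicated(subset=candidate_key).any())

print(null_counts)
print("duplicate keys among complete rows:", has_duplicates)
```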

Additional Details#

This includes whether they operate alternative fuel vehicles, whether they provide transmission, distribution, or generation services (bundled or unbundled), and whether they engage in wholesale and/or retail markets.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet"
)

Columns#

Field Name	Type	Description
alternative_fuel_vehicle_2_activity	boolean	Whether the utility plans to operate alternative-fueled vehicles this coming year.
alternative_fuel_vehicle_activity	boolean	Whether the utility operates alternative-fueled vehicles during the year.
bundled_activity	boolean	Whether a utility engaged in combined utility services (electricity plus other services such as gas, water, etc.) during the year.
buying_distribution_activity	boolean	Whether a utility bought any distribution on other electrical systems during the year.
buying_transmission_activity	boolean	Whether a utility bought any transmission services on other electrical systems during the year.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
distribution_activity	boolean	Whether a utility engaged in any distribution using owned/leased electrical wires during the year.
entity_type	string	Entity type of principal owner.
generation_activity	boolean	Whether a utility used generation from company-owned plants during the year.
nerc_region	string	NERC region in which the plant is located
operates_generating_plant	boolean	Whether the respondent operated at least one generating plant during the reporting period.
report_date	date	Date reported.
retail_marketing_activity	boolean	Whether a utility engaged in retail power marketing during the year.
short_form	boolean	Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.
state	string	Two letter US state abbreviation.
transmission_activity	boolean	Whether a utility engaged in any transmission activities during the year.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
wholesale_marketing_activity	boolean	Whether a utility engaged in wholesale power marketing during the year.

core_eia861__yearly_utility_data_nerc#

Annual time series of the NERC regions that utilities operate in.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, nerc_region, and nerc_regions_of_operation, but there are nulls in the state column across several years of reporting.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
nerc_region	string	NERC region in which the plant is located
nerc_regions_of_operation	string	All the regional entities within the North American Electric Reliability Corporation (NERC) in which the respondent conducts operations.
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia861__yearly_utility_data_rto#

Annual time series of the RTOs that utilities operate in.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 861 – Annual Electric Power Industry Report
Primary key:: This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, nerc_region, and rtos_of_operation, but there are nulls in the state column across several years of reporting.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
nerc_region	string	NERC region in which the plant is located
report_date	date	Date reported.
rtos_of_operation	string	The ISOs/RTOs in which the respondent conducts operations.
state	string	Two letter US state abbreviation.
utility_id_eia	integer	The EIA Utility Identification number.

core_eia923__entity_coalmine#

Entity table containing static information about coal mines reporting deliveries in the Fuel Receipts and Costs table.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 2, Part C)
Primary key:: mine_id_pudl

Usage Warnings#

Mine IDs are not as well defined and consistent as other EIA IDs.

Additional Details#

This table is produced during the transformation of fuel delivery data, in order to produce a better normalized database. The same coalmines report many individual deliveries, and repeating their attributes many times in the fuel receipts and costs table is duplicative. Unfortunately, the coalmine attributes do not generally use a controlled vocabulary or well-defined IDs, so in practice many distinct records in this table correspond to the same real-world mine.

We have not yet taken the time to rigorously clean this data, but it could be linked with both Mining Safety and Health Administration (MSHA) and USGS data to provide more insight into where coal is coming from, and what the employment and geological context is for those supplies.
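
To get a rough sense of how many records might refer to the same mine, one option is to normalize the free-text names and group on them. The sketch below is only a heuristic under that assumption: the mine names are invented, the `mine_name_norm` and `mine_ids` columns are hypothetical helpers, and real deduplication would likely need fuzzier matching or the MSHA IDs.

```python
import pandas as pd

# Toy records with invented names; the first two differ only in case and spacing.
mines = pd.DataFrame(
    {
        "mine_id_pudl": [1, 2, 3],
        "mine_name": ["Example Mine No. 1", "EXAMPLE  MINE NO. 1 ", "Sample Creek"],
        "state": ["WY", "WY", "WY"],
    }
)

# Normalize names: trim, collapse runs of whitespace, and lowercase.
mines["mine_name_norm"] = (
    mines["mine_name"]
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.lower()
)

# Collect the PUDL mine IDs that share a normalized name within a state.
candidates = (
    mines.groupby(["mine_name_norm", "state"])["mine_id_pudl"]
    .agg(list)
    .reset_index(name="mine_ids")
)

# Keep only groups with more than one ID: these are possible duplicates.
candidates = candidates[candidates["mine_ids"].str.len() > 1]
print(candidates)
```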

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet"
)

Columns#

Field Name	Type	Description
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
mine_id_msha	integer	MSHA issued mine identifier.
mine_id_pudl	integer	Dynamically assigned PUDL mine identifier.
mine_name	string	Coal mine name.
mine_type_code	string	Type of coal mine.
state	string	Two letter US state abbreviation.

core_eia923__fuel_receipts_costs#

Individual fuel deliveries to power plants, organized by fuel type and supplier.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
Primary key:: This table has no primary key. Each record describes an individual fuel delivery. There can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month, so the table has no natural primary key.

Usage Warnings#

Date column arbitrarily uses the first of the month.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.

Additional Details#

There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.
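
Because a plant can receive several deliveries of the same fuel in a month, a monthly fuel price is usually a heat-content-weighted average rather than a simple mean. The following is a minimal sketch on toy deliveries (illustrative numbers only); `fuel_mmbtu` and `fuel_cost` are intermediate columns introduced here, not columns of the table.

```python
import pandas as pd

# Toy deliveries for one plant-month; the gas row has no reported price.
deliveries = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "fuel_type_code_pudl": ["coal", "coal", "gas"],
        "fuel_received_units": [100.0, 300.0, 50.0],
        "fuel_mmbtu_per_unit": [20.0, 20.0, 1.0],
        "fuel_cost_per_mmbtu": [2.0, 4.0, None],
    }
)

# Drop deliveries without a price so they do not dilute the average.
priced = deliveries.dropna(subset=["fuel_cost_per_mmbtu"]).copy()

# Total heat content and total cost of each delivery.
priced["fuel_mmbtu"] = priced["fuel_received_units"] * priced["fuel_mmbtu_per_unit"]
priced["fuel_cost"] = priced["fuel_mmbtu"] * priced["fuel_cost_per_mmbtu"]

# Sum per plant, month, and fuel, then divide to get the weighted price.
monthly = priced.groupby(
    ["plant_id_eia", "report_date", "fuel_type_code_pudl"], as_index=False
)[["fuel_mmbtu", "fuel_cost"]].sum()
monthly["fuel_cost_per_mmbtu"] = monthly["fuel_cost"] / monthly["fuel_mmbtu"]
print(monthly)
```

The result describes fuel purchased in the month, which, as noted above, is not necessarily the fuel burned in that month.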

Under some circumstances utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It’s still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices they publish. From 2009 to 2021, about one third of all prices are redacted. The missing data is not randomly distributed. Deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.
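
One way to see how uneven the missing prices are is to compute the share of deliveries with a null `fuel_cost_per_mmbtu` per fuel group. This sketch assumes that a null price can be read as a redaction, which may not hold for every record, and it uses toy rows with illustrative fuel group labels.

```python
import pandas as pd

# Toy deliveries; the fuel group labels and prices are illustrative only.
deliveries = pd.DataFrame(
    {
        "fuel_group_code": ["natural_gas", "natural_gas", "natural_gas", "coal", "coal"],
        "fuel_cost_per_mmbtu": [None, None, 3.0, 2.0, 2.5],
    }
)

# The mean of a boolean mask is the fraction of True values, i.e. the
# fraction of deliveries in each group that have no price.
redacted_share = (
    deliveries["fuel_cost_per_mmbtu"]
    .isna()
    .groupby(deliveries["fuel_group_code"])
    .mean()
)
print(redacted_share)
```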

Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
chlorine_content_ppm	number	For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.
contract_expiration_date	date	Date contract expires. Format: MMYY.
contract_type_code	string	Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_group_code	string	Fuel groups used in the Electric Power Monthly
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_received_units	number	Quantity of fuel received in tons, barrels, or Mcf.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
mercury_content_ppm	number	Mercury content in parts per million (ppm) to the nearest 0.001 ppm.
mine_id_pudl	integer	Dynamically assigned PUDL mine identifier.
moisture_content_pct	number	For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.
natural_gas_delivery_contract_type_code	string	Contract type for natural gas delivery service.
natural_gas_transport_code	string	Contract type for natural gas transportation service.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
primary_transportation_mode_code	string	Transportation mode for the longest distance transported.
report_date	date	Date reported.
secondary_transportation_mode_code	string	Transportation mode for the second longest distance transported.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.
supplier_name	string	Company that sold the fuel to the plant or, in the case of Natural Gas, pipeline owner.

core_eia923__monthly_boiler_fuel#

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
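
Annual-only reporters can distort monthly analyses, so it may help to flag plant-years whose only nonzero consumption falls in December. The sketch below is a heuristic on toy data (illustrative values); it would also flag a plant that genuinely ran only in December, so the result is a list of candidates rather than a definitive classification.

```python
import pandas as pd

# Toy records: plant 1 reports through the year, plant 2 only in December.
boiler_fuel = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-06-01", "2020-12-01", "2020-12-01"]
        ),
        "fuel_consumed_units": [10.0, 12.0, 11.0, 140.0],
    }
)

# Keep only months with nonzero reported consumption.
nonzero = boiler_fuel[boiler_fuel["fuel_consumed_units"].fillna(0) > 0]

# For each plant-year, find the earliest reporting month and the number
# of distinct months with data.
summary = (
    nonzero.assign(
        year=nonzero["report_date"].dt.year,
        month=nonzero["report_date"].dt.month,
    )
    .groupby(["plant_id_eia", "year"])["month"]
    .agg(["min", "nunique"])
)

# A single reporting month that is December suggests annual-only reporting.
december_only = summary[(summary["min"] == 12) & (summary["nunique"] == 1)]
print(december_only)
```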

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
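
As a rough illustration of how the reported quantities combine, total heat input is the fuel quantity multiplied by its heat content per unit, and a heat rate is heat input divided by net generation. The sketch below sidesteps the boiler-generator association problem by aggregating to the plant level; the net generation figure is a hypothetical value supplied by hand, and `fuel_consumed_mmbtu` is a derived column introduced here.

```python
import pandas as pd

# Toy boiler records for one plant-month (illustrative values only).
boiler_fuel = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "boiler_id": ["B1", "B2"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "fuel_consumed_units": [1000.0, 500.0],
        "fuel_mmbtu_per_unit": [20.0, 22.0],
    }
)

# Heat input per boiler: physical units times heat content per unit.
boiler_fuel["fuel_consumed_mmbtu"] = (
    boiler_fuel["fuel_consumed_units"] * boiler_fuel["fuel_mmbtu_per_unit"]
)

# Aggregate to the plant-month so no boiler-generator mapping is needed.
plant_mmbtu = boiler_fuel.groupby(["plant_id_eia", "report_date"])[
    "fuel_consumed_mmbtu"
].sum()

# Hypothetical plant-level net generation for the same month, in MWh.
net_generation_mwh = 3100.0

# Heat rate in MMBtu per MWh.
heat_rate = plant_mmbtu.iloc[0] / net_generation_mwh
print(heat_rate)
```

For combined heat and power plants this overstates the electrical heat rate, because the fuel total includes fuel used for process steam.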

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet"
)

Columns#

Field Name	Type	Description
ash_content_pct	number	Ash content percentage by weight to the nearest 0.1 percent.
boiler_id	string	Alphanumeric boiler ID.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_units	number	Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.
sulfur_content_pct	number	Sulfur content percentage by weight to the nearest 0.01 percent.

core_eia923__monthly_energy_storage#

Monthly time series of energy consumed and discharged (“generated”) by energy storage units.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

Date column arbitrarily uses the first of the month.

Additional Details#

The total MWh discharged from the energy storage unit during the reporting period is the gross generation and the difference between gross generation and consumption is the net generation.
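
The relationship above (net generation equals gross generation minus consumption) can be used as a consistency check. The sketch below assumes that the energy consumed by the storage unit is reported in MWh in `fuel_consumed_units`; that mapping is an assumption to verify against `fuel_units` in the real data, and the rows are toy values.

```python
import pandas as pd

# Toy storage records; fuel_consumed_units is ASSUMED to be charging energy in MWh.
storage = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "gross_generation_mwh": [80.0, 50.0],
        "fuel_consumed_units": [100.0, 60.0],
        "net_generation_mwh": [-20.0, -5.0],
    }
)

# Expected net generation: energy discharged minus energy consumed.
storage["net_expected_mwh"] = (
    storage["gross_generation_mwh"] - storage["fuel_consumed_units"]
)

# Flag records where the reported net generation matches the expectation.
storage["consistent"] = (
    storage["net_generation_mwh"] - storage["net_expected_mwh"]
).abs() < 1e-6
print(storage[["plant_id_eia", "net_expected_mwh", "consistent"]])
```

Net generation is typically negative for storage, since round-trip losses mean more energy is consumed than discharged.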

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical units.
fuel_consumed_units	number	Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_units	string	Reported unit of measure for fuel.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.

core_eia923__monthly_generation#

Monthly time series of net electricity generated by power plant generators.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, generator_id, report_date

Usage Warnings#

Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.
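
To see how much of the plant-level generation is covered by generator-level records, the two tables can be aggregated to a common plant-month grain and compared. This is a minimal sketch on toy frames standing in for the two tables (illustrative values); it assumes both expose `plant_id_eia`, `report_date`, and `net_generation_mwh`.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-01"] * 3)

# Toy generator-level records (stand-in for core_eia923__monthly_generation).
gen = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["1", "2", "A"],
        "report_date": dates,
        "net_generation_mwh": [100.0, 50.0, 30.0],
    }
)

# Toy plant-level records (stand-in for core_eia923__monthly_generation_fuel).
gen_fuel = pd.DataFrame(
    {
        "plant_id_eia": [1, 2, 3],
        "report_date": dates,
        "net_generation_mwh": [150.0, 60.0, 90.0],
    }
)

keys = ["plant_id_eia", "report_date"]
by_gen = gen.groupby(keys, as_index=False)["net_generation_mwh"].sum()
by_fuel = gen_fuel.groupby(keys, as_index=False)["net_generation_mwh"].sum()

# Left join keeps plant-months that have no generator-level data at all.
coverage = by_fuel.merge(by_gen, on=keys, how="left", suffixes=("_gf", "_gen"))
coverage["net_generation_mwh_gen"] = coverage["net_generation_mwh_gen"].fillna(0.0)

# Share of plant-level generation that also appears at the generator level.
overall = (
    coverage["net_generation_mwh_gen"].sum() / coverage["net_generation_mwh_gf"].sum()
)
print(overall)
```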

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.

core_eia923__monthly_generation_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption by fuel and prime mover.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
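
Because annual-only respondents place the whole year's fuel consumption in December, monthly comparisons across plants can mislead. A minimal sketch of aggregating to annual totals before comparing is shown here. The data is made up, and how the other months appear for annual reporters (null here) is an assumption.

```python
import pandas as pd

# Made-up data: plant 1 reports monthly, plant 2 reports only an annual
# total that lands in December.
monthly = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 2, 2, 2],
    "report_date": pd.to_datetime(
        ["2022-10-01", "2022-11-01", "2022-12-01"] * 2
    ),
    "fuel_consumed_mmbtu": [100.0, 110.0, 120.0, None, None, 1200.0],
})

# report_date is the first of the month, so the year is enough to group on.
annual = (
    monthly.assign(report_year=monthly["report_date"].dt.year)
    .groupby(["plant_id_eia", "report_year"], as_index=False)[
        "fuel_consumed_mmbtu"
    ]
    .sum()
)
print(annual)
```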

Additional Details#

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. When fuel consumption and generation aren’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, we attribute them using the primary fuels and prime movers reported for each generator, along with generator capacities, in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.
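
The distinction matters when computing heat rates: dividing total fuel by net generation would overstate the heat rate of a CHP unit. The sketch below uses made-up rows and assumes the usual definition of heat rate as fuel energy used for electricity per unit of net generation. The derived column names are hypothetical, and this is not necessarily how PUDL computes heat rates.

```python
import pandas as pd

# Made-up rows shaped like this table; the first row behaves like a CHP unit.
gf = pd.DataFrame({
    "plant_id_eia": [10, 10],
    "report_date": pd.to_datetime(["2021-03-01", "2021-03-01"]),
    "prime_mover_code": ["ST", "CT"],
    "energy_source_code": ["NG", "NG"],
    "fuel_consumed_mmbtu": [12000.0, 5000.0],
    "fuel_consumed_for_electricity_mmbtu": [9000.0, 5000.0],
    "net_generation_mwh": [1000.0, 500.0],
})

# Heat rate based only on fuel burned to make electricity.
gf["heat_rate_mmbtu_per_mwh"] = (
    gf["fuel_consumed_for_electricity_mmbtu"] / gf["net_generation_mwh"]
)

# Fuel that went to something other than electricity (e.g. process heat).
gf["non_electric_fuel_mmbtu"] = (
    gf["fuel_consumed_mmbtu"] - gf["fuel_consumed_for_electricity_mmbtu"]
)
print(gf[["prime_mover_code", "heat_rate_mmbtu_per_mwh", "non_electric_fuel_mmbtu"]])
```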

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, for the specified period.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical unit.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu for the specified period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_agg	string	A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.

core_eia923__monthly_generation_fuel_nuclear#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption of nuclear generation units by fuel and prime mover.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 3)
Primary key:: plant_id_eia, report_date, nuclear_unit_id, energy_source_code, prime_mover_code

Usage Warnings#

Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.

Additional Details#

This data is originally reported alongside similar information for fossil fuel plants, but the nuclear data is reported by (nuclear) generation unit rather than fuel type and prime mover, and so has a different primary key.
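
To combine nuclear records with core_eia923__monthly_generation_fuel, one approach is to sum over nuclear_unit_id so that the rows share that table's primary key. This is a sketch on made-up values, not a description of how PUDL itself combines the tables.

```python
import pandas as pd

# Made-up rows shaped like this table: two units at one plant in one month.
nuc = pd.DataFrame({
    "plant_id_eia": [200, 200],
    "report_date": pd.to_datetime(["2023-06-01", "2023-06-01"]),
    "nuclear_unit_id": ["1", "2"],
    "energy_source_code": ["NUC", "NUC"],
    "prime_mover_code": ["ST", "ST"],
    "net_generation_mwh": [700000.0, 650000.0],
    "fuel_consumed_mmbtu": [7300000.0, 6800000.0],
})

# Primary key of core_eia923__monthly_generation_fuel.
gf_key = ["plant_id_eia", "report_date", "prime_mover_code", "energy_source_code"]

# Collapse the unit dimension by summing the additive columns.
nuc_by_fuel = nuc.groupby(gf_key, as_index=False)[
    ["net_generation_mwh", "fuel_consumed_mmbtu"]
].sum()
print(nuc_by_fuel)
```

Only additive quantities should be summed this way; a ratio such as fuel_mmbtu_per_unit would need a weighted average instead.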

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
energy_source_code	string	A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.
fuel_consumed_for_electricity_mmbtu	number	Total consumption of fuel to produce electricity, in MMBtu, for the specified period.
fuel_consumed_for_electricity_units	number	Consumption for electric generation of the fuel type in physical unit.
fuel_consumed_mmbtu	number	Total consumption of fuel in MMBtu for the specified period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_consumed_units	number	Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_agg	string	A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
nuclear_unit_id	string	For nuclear plants only, the unit number. One-digit numeric. Nuclear plants are the only type of plants for which data are shown explicitly at the generating unit level.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
prime_mover_code	string	Code for the type of prime mover (e.g. CT, CG)
report_date	date	Date reported.

core_eia930__hourly_interchange#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority interchange.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia, balancing_authority_code_adjacent_eia

Additional Details#

Interchange is the net metered tie line flow from one BA to another directly interconnected BA. Total net interchange is the net sum of all interchange occurring between a BA and its directly interconnected neighboring BAs. Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.
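
As an illustration of the sign convention, the sketch below sums pairwise interchange into a total net interchange per BA and hour. The BA codes (AAA, BBB, CCC), the values, and the direction column are made up for the example.

```python
import pandas as pd

# Made-up rows shaped like this table, for a single hour.
ix = pd.DataFrame({
    "datetime_utc": pd.to_datetime(["2024-01-01 01:00"] * 3),
    "balancing_authority_code_eia": ["AAA", "AAA", "BBB"],
    "balancing_authority_code_adjacent_eia": ["BBB", "CCC", "AAA"],
    "interchange_reported_mwh": [-40.0, 15.0, 40.0],
})

# Total net interchange: sum over all adjacent BAs for each BA and hour.
total = ix.groupby(
    ["datetime_utc", "balancing_authority_code_eia"], as_index=False
)["interchange_reported_mwh"].sum()


def describe_flow(mwh: float) -> str:
    """Label a net interchange value using the table's sign convention."""
    if mwh < 0:
        return "net inflow"
    if mwh > 0:
        return "net outflow"
    return "balanced"


total["direction"] = total["interchange_reported_mwh"].map(describe_flow)
print(total)
```

Here AAA imports 40 MWh from BBB and exports 15 MWh to CCC, so its total net interchange is -25 MWh, a net inflow.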

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_adjacent_eia	string	EIA short code for the other adjacent balancing authority, with which interchange is occurring. Includes Canadian and Mexican BAs.
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
datetime_utc	datetime	Timestamp at the end of the hour for which the data is reported.
interchange_reported_mwh	number	Original reported energy interchange between adjacent balancing authorities.

core_eia930__hourly_net_generation_by_energy_source#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority net generation by energy source.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia, generation_energy_source

Usage Warnings#

Contains known calculation discrepancies.

Additional Details#

BAs report generation from dual-fuel (switching from one fuel to another) and multiple-fuel (using multiple fuels simultaneously) generators under the actual energy source used, if known, and under the generator’s primary energy source, if not known.

To maintain generator confidentiality, generation may sometimes be reported in the Other category if too few generators are reported for a particular energy source category.

In theory, the sum of net generation across all energy sources should equal the total net generation reported in the balancing authority operations table. In practice, these values diverge significantly in many cases, and the discrepancies require further investigation.
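
A check along these lines can be sketched as follows, assuming the operations table in question is core_eia930__hourly_operations. The BA code, the energy source values, and the numbers are made up, and discrepancy_mwh is a hypothetical column name.

```python
import pandas as pd

hour = pd.Timestamp("2024-01-01 01:00")

# Made-up rows shaped like this table.
by_source = pd.DataFrame({
    "datetime_utc": [hour] * 3,
    "balancing_authority_code_eia": ["AAA"] * 3,
    "generation_energy_source": ["coal", "gas", "wind"],
    "net_generation_reported_mwh": [100.0, 200.0, 50.0],
})

# Made-up row shaped like the hourly operations table.
ops = pd.DataFrame({
    "datetime_utc": [hour],
    "balancing_authority_code_eia": ["AAA"],
    "net_generation_reported_mwh": [360.0],
})

keys = ["datetime_utc", "balancing_authority_code_eia"]
summed = (
    by_source.groupby(keys, as_index=False)["net_generation_reported_mwh"]
    .sum()
    .rename(columns={"net_generation_reported_mwh": "net_generation_by_source_mwh"})
)

# One row per BA-hour on both sides, so the merge should be one-to-one.
check = ops.merge(summed, on=keys, how="left", validate="one_to_one")
check["discrepancy_mwh"] = (
    check["net_generation_reported_mwh"] - check["net_generation_by_source_mwh"]
)
print(check)
```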

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
datetime_utc	datetime	Timestamp at the end of the hour for which the data is reported.
generation_energy_source	string	High level energy source used to produce electricity.
net_generation_adjusted_mwh	number	Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
net_generation_imputed_eia_mwh	number	Reported net generation with outlying values removed and missing values imputed by EIA.
net_generation_reported_mwh	number	Unaltered originally reported net generation for the specified period.

core_eia930__hourly_operations#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority net generation, interchange, and demand.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia

Additional Details#

Net generation represents the metered output of electric generating units in a BA’s electric system. This generation only includes generating units that are managed by a BA or whose operations are visible to a BA.

Generators on the distribution system—both large-scale resources and small-scale distributed resources, such as rooftop solar photovoltaic (PV) systems—are typically not included.

In some electricity publications, EIA reports generation from all utility-scale generating units in the United States. BAs meter only a subset of all utility-scale generating units. As a result, when hourly generation from the EIA-930 is aggregated to monthly or annual levels, the results will be lower than monthly and annual aggregations in other EIA electricity publications.

Interchange is the net metered tie line flow from one BA to another directly interconnected BA. This table includes the net sum of all interchange occurring between a BA and all of its directly interconnected neighboring BAs. For a detailed breakdown of interchange between each pair of adjacent BAs see core_eia930__hourly_interchange.

Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.

Demand is a calculated value representing the amount of electricity load within a BA’s electric system. A BA derives its demand value by taking the total metered net electricity generation within its electric system and subtracting the total metered net electricity interchange occurring between the BA and its neighboring BAs.
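
The relationship described above (demand equals net generation minus net interchange) can be checked against the reported columns. This is a sketch on made-up values with hypothetical BA codes; demand_derived_mwh and demand_gap_mwh are illustrative names, not PUDL columns.

```python
import pandas as pd

# Made-up rows shaped like this table, for a single hour.
ops = pd.DataFrame({
    "datetime_utc": pd.to_datetime(["2024-01-01 01:00"] * 2),
    "balancing_authority_code_eia": ["AAA", "BBB"],
    "net_generation_reported_mwh": [500.0, 300.0],
    "interchange_reported_mwh": [-50.0, 20.0],  # negative means net inflow
    "demand_reported_mwh": [550.0, 275.0],
})

# Demand implied by generation and interchange.
ops["demand_derived_mwh"] = (
    ops["net_generation_reported_mwh"] - ops["interchange_reported_mwh"]
)

# Difference between what was reported and what the identity implies.
ops["demand_gap_mwh"] = ops["demand_reported_mwh"] - ops["demand_derived_mwh"]
print(ops[["balancing_authority_code_eia", "demand_derived_mwh", "demand_gap_mwh"]])
```

A net inflow (negative interchange) raises derived demand above net generation, as for AAA here.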

Each BA produces a day-ahead electricity demand forecast for every hour of the next day. These forecasts help BAs plan for and coordinate the reliable operation of their electric system.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
datetime_utc	datetime	Timestamp at the end of the hour for which the data is reported.
demand_adjusted_mwh	number	Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
demand_forecast_mwh	number	Day ahead demand forecast.
demand_imputed_eia_mwh	number	Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA.
demand_reported_mwh	number	Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.
interchange_adjusted_mwh	number	Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
interchange_imputed_eia_mwh	number	Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA.
interchange_reported_mwh	number	Original reported energy interchange between adjacent balancing authorities.
net_generation_adjusted_mwh	number	Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.
net_generation_imputed_eia_mwh	number	Reported net generation with outlying values removed and missing values imputed by EIA.
net_generation_reported_mwh	number	Unaltered originally reported net generation for the specified period.

core_eia930__hourly_subregion_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority subregion demand.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
Primary key:: datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia

Additional Details#

For select BAs, balancing authority subregion demand provides a further level of geographic disaggregation (for example, load zones, weather zones, operating company areas, local BAs, etc.) within a BA’s electric system.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_subregion_code_eia	string	Code identifying subregions of larger balancing authorities.
datetime_utc	datetime	Timestamp at the end of the hour for which the data is reported.
demand_reported_mwh	number	Originally reported electricity demand for the balancing area subregion. Note that different BAs have different methods of calculating and allocating subregion demand.

core_eia__codes_averaging_periods#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the averaging period specified by emissions statutes and regulations for the EIA 860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case
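
Code tables with this code, label, description layout can be joined onto a data table to replace short codes with readable labels. The sketch below shows the general pattern with made-up codes and a hypothetical data table; the column name some_code and the values X1 and X2 are placeholders, not real EIA codes.

```python
import pandas as pd

# Made-up code table with the same three columns as this table.
codes = pd.DataFrame({
    "code": ["X1", "X2"],
    "label": ["example_one", "example_two"],
    "description": ["First made-up category.", "Second made-up category."],
})

# Hypothetical data table that stores the short code.
records = pd.DataFrame({
    "record_id": [1, 2, 3],
    "some_code": ["X2", "X1", "X2"],
})

# code is the primary key of the code table, so the join is many-to-one.
decoded = records.merge(
    codes,
    left_on="some_code",
    right_on="code",
    how="left",
    validate="many_to_one",
).drop(columns="code")
print(decoded)
```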

core_eia__codes_balancing_authorities#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for balancing authorities in EIA 860, EIA 923, and EIA 930.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_region_code_eia	string	EIA balancing authority region code.
balancing_authority_region_name_eia	string	Human-readable name of the EIA balancing region.
balancing_authority_retirement_date	date	Date on which the balancing authority ceased independent operation.
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
interconnect_code_eia	string	EIA interconnect code.
is_generation_only	boolean	Indicates whether the balancing authority is generation-only, meaning it does not serve retail customers and thus reports only net generation and interchange, but not demand.
label	string	Longer human-readable code using snake_case
report_timezone	string	Timezone used by the reporting entity. For use in localizing UTC times.
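
As a sketch of how report_timezone might be used, the example below joins it onto hourly records and converts datetime_utc to local wall-clock time. It assumes the timezone values are IANA names such as America/New_York. The BA codes and the datetime_local column are made up.

```python
import pandas as pd

# Made-up rows shaped like this code table (only the columns needed here).
ba_codes = pd.DataFrame({
    "code": ["AAA", "BBB"],
    "report_timezone": ["America/New_York", "America/Chicago"],
})

# Made-up hourly records keyed by BA code, with naive UTC timestamps.
hourly = pd.DataFrame({
    "balancing_authority_code_eia": ["AAA", "BBB"],
    "datetime_utc": pd.to_datetime(["2024-07-01 05:00", "2024-07-01 05:00"]),
})

merged = hourly.merge(
    ba_codes,
    left_on="balancing_authority_code_eia",
    right_on="code",
    how="left",
    validate="many_to_one",
)

# Each row can have a different timezone, so convert row by row and drop
# the tz info to get a plain local wall-clock timestamp.
merged["datetime_local"] = [
    ts.tz_localize("UTC").tz_convert(tz).tz_localize(None)
    for ts, tz in zip(merged["datetime_utc"], merged["report_timezone"])
]
print(merged[["balancing_authority_code_eia", "datetime_utc", "datetime_local"]])
```

The row-wise conversion is simple but slow on large tables; grouping by timezone and converting each group at once would scale better.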

core_eia__codes_balancing_authority_subregions#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for balancing authority subregions in EIA 930.

Most-recent data:: 2026half1
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: balancing_authority_code_eia, balancing_authority_subregion_code_eia

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet"
)

Columns#

Field Name	Type	Description
balancing_authority_code_eia	string	EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.
balancing_authority_subregion_code_eia	string	Code identifying subregions of larger balancing authorities.
balancing_authority_subregion_name_eia	string	Name of the balancing authority subregion.

core_eia__codes_boiler_generator_assn_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler-generator associations in the EIA 860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_boiler_status#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler status in the EIA 860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_boiler_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler regulatory types in the EIA 860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_coalmine_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for coalmines reported as fuel sources in the EIA-923.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_contract_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for fuel supply contracts reported in EIA-923.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_cooling_system_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling systems.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_cooling_tower_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling towers.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_cooling_water_sources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling water sources in EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_cooling_water_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling water.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_emission_control_equipment_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for emissions control equipment installed on a boiler.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_energy_sources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy sources reported to EIA.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Additional Details#

Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
fuel_derived_from	string	Original fuel from which this refined fuel was derived.
fuel_group_eia	string	High-level fuel group defined in the 2021-2023 EIA Form 860 instructions, Table 28.
fuel_phase	string	Physical phase of matter of the fuel.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL.
fuel_units	string	Reported unit of measure for fuel.
label	string	Longer human-readable code using snake_case.
max_fuel_mmbtu_per_unit	number	Maximum heat content per physical unit of fuel in MMBTU.
min_fuel_mmbtu_per_unit	number	Minimum heat content per physical unit of fuel in MMBTU.
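
The `min_fuel_mmbtu_per_unit` and `max_fuel_mmbtu_per_unit` columns lend themselves to a plausibility check on reported heat content. The sketch below shows one way such a check could look. The rows, the codes `F1` and `F2`, the numeric bounds, and the function name `heat_content_in_range` are all hypothetical and chosen only for illustration; they are not values from the real table.

```python
# Hypothetical energy-source rows; codes and bounds are made-up placeholders.
energy_sources = {
    "F1": {"min_fuel_mmbtu_per_unit": 10.0, "max_fuel_mmbtu_per_unit": 20.0},
    "F2": {"min_fuel_mmbtu_per_unit": None, "max_fuel_mmbtu_per_unit": None},
}


def heat_content_in_range(row, reported_mmbtu_per_unit):
    """Check a reported heat content against the row's min/max bounds.

    Returns True or False when both bounds are present, and None when
    either bound is missing, so "unknown" is not confused with "invalid".
    """
    low = row.get("min_fuel_mmbtu_per_unit")
    high = row.get("max_fuel_mmbtu_per_unit")
    if low is None or high is None:
        return None
    return low <= reported_mmbtu_per_unit <= high


print(heat_content_in_range(energy_sources["F1"], 15.0))  # True
print(heat_content_in_range(energy_sources["F1"], 25.0))  # False
print(heat_content_in_range(energy_sources["F2"], 15.0))  # None
```

The comparison only makes sense when the reported value uses the same physical unit as the row's `fuel_units`; the sketch assumes that has already been confirmed.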

core_eia__codes_environmental_equipment_manufacturers#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for manufacturers of boilers and environmental control equipment in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_firing_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler firing types in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_fuel_transportation_modes#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for fuel transportation modes reported in the EIA-860 and EIA-923.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_fuel_types_agg#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for aggregated fuel types used in the Annual Energy Review or Monthly Energy Review.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Additional Details#

See EIA-923 Fuel Code table for additional information.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
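
Because `code` is listed as the primary key, every row should carry a non-null, unique code. A quick sanity check along those lines might look like the sketch below. The function name `primary_key_violations` and the sample rows are hypothetical; real rows could be obtained from the Parquet file with, for example, `df.to_dict("records")` in pandas.

```python
from collections import Counter


def primary_key_violations(rows, key="code"):
    """Return (duplicated key values, whether any key is missing)."""
    counts = Counter(row.get(key) for row in rows)
    # A key value seen more than once violates uniqueness.
    duplicates = sorted(k for k, n in counts.items() if k is not None and n > 1)
    # A missing (None) key violates the non-null requirement.
    has_missing = counts.get(None, 0) > 0
    return duplicates, has_missing


# Hypothetical rows: "AA" is repeated and one row has no code.
sample_rows = [
    {"code": "AA", "description": "Example A."},
    {"code": "BB", "description": "Example B."},
    {"code": "AA", "description": "Example A again."},
    {"code": None, "description": "Missing code."},
]

print(primary_key_violations(sample_rows))  # (['AA'], True)
```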

core_eia__codes_mercury_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control mercury in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_momentary_interruptions#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for utility definitions of momentary service interruptions.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_nox_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control nitrogen oxide in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_nox_control_status#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the operational status of nitrogen oxide control units associated with boilers in the EIA-860 data.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_nox_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for nitrogen oxide in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_operational_status#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for operational status reported to EIA.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Additional Details#

Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.
operational_status	string	The operating status of the asset. For generators, this is based on which tab of EIA 860 the generator was listed in.
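
Since several detailed codes can share one `operational_status`, it can be handy to see which codes roll up into each status. The following sketch groups codes that way. The rows, the codes `X1` to `X3`, the status values `status_a` and `status_b`, and the function name `codes_by_status` are illustrative placeholders and do not reflect the real table's contents.

```python
from collections import defaultdict


def codes_by_status(rows):
    """Group short codes under the operational_status they map to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["operational_status"]].append(row["code"])
    return dict(grouped)


# Hypothetical rows; codes and statuses are placeholders only.
status_rows = [
    {"code": "X1", "operational_status": "status_a"},
    {"code": "X2", "operational_status": "status_b"},
    {"code": "X3", "operational_status": "status_a"},
]

print(codes_by_status(status_rows))
# {'status_a': ['X1', 'X3'], 'status_b': ['X2']}
```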

core_eia__codes_particulate_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control particulate matter in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_particulate_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for particulate matter in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_prime_movers#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for prime movers reported in the EIA-860 and EIA-923.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_regulations#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for levels of statutes and codes under which boilers operate in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_reporting_frequencies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the reporting frequencies used by plants in the EIA-923.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_sector_consolidated#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for EIA consolidated NAICS sectors.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Additional Details#

Codes and descriptions taken from the EIA-923 File Layout spreadsheet.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_so2_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control sulfur dioxide in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_so2_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for sulfur dioxide in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_eia__codes_sorbent_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for flue gas desulfurization sorbent types used in EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_steam_plant_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for steam plants in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_storage_enclosure_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy storage enclosures.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_storage_technology_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy storage technologies.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_wet_dry_bottom#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler bottoms in the EIA-860.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case

core_eia__codes_wind_quality_class#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for wind quality classes.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
extreme_fifty_year_gust_ms	number	The extreme 50-year wind gusts at this generator in meters per second.
label	string	Longer human-readable code using snake_case
turbulence_intensity_a	number	The upper bounds of the turbulence intensity at the wind site (ratio of standard deviation of fluctuating wind velocity to the mean wind speed).
turbulence_intensity_b	number	The lower bounds of the turbulence intensity at the wind site (ratio of standard deviation of fluctuating wind velocity to the mean wind speed).
wind_speed_avg_ms	number	Average wind speed in meters per second.

core_eia__entity_boilers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about boilers compiled from the EIA-860 and EIA-923.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: plant_id_eia, boiler_id

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for boilers are set. It contains values which are expected to remain fixed, while core_eia860__scd_boilers contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
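
The selection rule described above can be illustrated with a short sketch. This is not PUDL's implementation: the function name `harvest_value` is hypothetical, and excluding null reports from the 70% calculation is an assumption made here for the example.

```python
from collections import Counter


def harvest_value(reported, threshold=0.7):
    """Pick a canonical value from a list of reported values.

    Returns the most common non-null value if it makes up at least
    `threshold` of the non-null reports, otherwise None.
    Assumption: null (None) reports are ignored when computing the share.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None


# Hypothetical manufacturer names reported for one boiler across several tables.
consistent = harvest_value(["Acme", "Acme", "Acme", "Other"])  # 3 of 4 agree
inconsistent = harvest_value(["Acme", "Acme", "Other", "Other"])  # only 2 of 4 agree
```

Here `consistent` resolves to `"Acme"` because 75% of the reports agree, while `inconsistent` is `None`, which corresponds to the null value mentioned above.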

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet"
)

Columns#

Field Name	Type	Description
boiler_id	string	Alphanumeric boiler ID.
boiler_manufacturer	string	Name of boiler manufacturer.
boiler_manufacturer_code	string	EIA short code for boiler manufacturer.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

core_eia__entity_generators#

Noticed any irregularities with this table? File a report.

Entity table containing static information about generators compiled from across the EIA-860 and EIA-923.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: plant_id_eia, generator_id

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for generators are set. It contains values which are expected to remain fixed, while core_eia860__scd_generators contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet"
)

Columns#

Field Name	Type	Description
associated_combined_heat_power	boolean	Indicates whether the generator is associated with a combined heat and power system
bypass_heat_recovery	boolean	Can this generator operate while bypassing the heat recovery steam generator?
can_switch_when_operating	boolean	Indicates whether a fuel switching generator can switch fuels while operating.
duct_burners	boolean	Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas
fluidized_bed_tech	boolean	Indicates whether the generator uses fluidized bed technology
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_operating_date	date	Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.
original_planned_generator_operating_date	date	The date the generator was originally scheduled to be operational
other_combustion_tech	boolean	Indicates whether the generator uses other combustion technologies
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
previously_canceled	boolean	Indicates whether the generator was previously reported as indefinitely postponed or canceled
pulverized_coal_tech	boolean	Indicates whether the generator uses pulverized coal technology
rto_iso_lmp_node_id	string	The designation used to identify the price node in RTO/ISO Locational Marginal Price reports
rto_iso_location_wholesale_reporting_id	string	The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report
solid_fuel_gasification	boolean	Indicates whether the generator is part of a solid fuel gasification system
stoker_tech	boolean	Indicates whether the generator uses stoker technology
subcritical_tech	boolean	Indicates whether the generator uses subcritical technology
supercritical_tech	boolean	Indicates whether the generator uses supercritical technology
topping_bottoming_code	string	If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle
ultrasupercritical_tech	boolean	Indicates whether the generator uses ultra-supercritical technology
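
The warning on `generator_id` is easier to see with a small example. The IDs below are invented; the sketch only shows what can go wrong if generator IDs are coerced to integers instead of being kept as strings.

```python
# Hypothetical generator IDs: some purely numeric, some with letters.
reported_ids = ["0001", "1", "GT1", "2A"]


def as_int_or_none(generator_id):
    """Try to read an ID as an integer; return None if that is impossible."""
    try:
        return int(generator_id)
    except ValueError:
        return None


coerced = [as_int_or_none(g) for g in reported_ids]

# As strings, all four IDs are distinct.
distinct_as_str = len(set(reported_ids))

# As integers, "0001" and "1" collapse into one value and the
# alphanumeric IDs cannot be represented at all.
distinct_as_int = len({c for c in coerced if c is not None})
```

Because `generator_id` is part of the primary key together with `plant_id_eia`, collapsing or dropping IDs this way would break joins against other generator tables.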

core_eia__entity_plants#

Noticed any irregularities with this table? File a report.

Entity table containing static information about plants, compiled from across all EIA-860 and EIA-923 data.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: plant_id_eia

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for plants are set. It contains values which are expected to remain fixed, while core_eia860__scd_plants contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet"
)

Columns#

Field Name	Type	Description
city	string	Name of the city.
county	string	County name.
latitude	number	Latitude of the plant’s location, in degrees.
longitude	number	Longitude of the plant’s location, in degrees.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_name_eia	string	Plant name.
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
timezone	string	IANA timezone name
zip_code	string	Five digit US Zip Code.

core_eia__entity_utilities#

Noticed any irregularities with this table? File a report.

Entity table containing static information about utilities, compiled from all EIA data.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: utility_id_eia

Usage Warnings#

Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for utilities are set. It contains values which are expected to remain fixed, while core_eia860__scd_utilities contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet"
)

Columns#

Field Name	Type	Description
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.

core_eia__yearly_fuel_receipts_costs_aggs#

Noticed any irregularities with this table? File a report.

Annual time series of aggregated fuel receipts and costs from the EIA API.

Most-recent data:: 2026
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA – Mix of multiple EIA Forms
Primary key:: fuel_agg, geo_agg, sector_agg, temporal_agg, report_date

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet"
)

Columns#

Field Name	Type	Description
fuel_agg	string	Category of fuel aggregation in EIA bulk electricity data.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_received_mmbtu	number	Aggregated fuel receipts, in MMBTU, in EIA bulk electricity data.
geo_agg	string	Category of geographic aggregation in EIA bulk electricity data.
report_date	date	Date reported.
sector_agg	string	Category of sectoral aggregation in EIA bulk electricity data.
temporal_agg	string	Category of temporal aggregation in EIA bulk electricity data.
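
This table has a five-column primary key, so a row is identified only by the combination of all five values. A minimal sketch of checking that property on rows already loaded into memory is shown below. The helper `duplicate_keys` and all of the category values are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Primary key columns as listed for this table.
PRIMARY_KEY = ("fuel_agg", "geo_agg", "sector_agg", "temporal_agg", "report_date")

# Hypothetical rows; the real aggregation category names may differ.
rows = [
    {"fuel_agg": "coal", "geo_agg": "US", "sector_agg": "all_sectors",
     "temporal_agg": "annual", "report_date": "2020-01-01", "fuel_received_mmbtu": 1.0},
    {"fuel_agg": "coal", "geo_agg": "US", "sector_agg": "all_sectors",
     "temporal_agg": "annual", "report_date": "2021-01-01", "fuel_received_mmbtu": 2.0},
    # Deliberate duplicate of the first key, to show what a violation looks like.
    {"fuel_agg": "coal", "geo_agg": "US", "sector_agg": "all_sectors",
     "temporal_agg": "annual", "report_date": "2020-01-01", "fuel_received_mmbtu": 3.0},
]


def duplicate_keys(rows, key_cols):
    """Return every key tuple that appears in more than one row."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


violations = duplicate_keys(rows, PRIMARY_KEY)
```

An empty `violations` list means the key columns uniquely identify each row, which is what the published table is expected to satisfy.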

core_eiaaeo__yearly_projected_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of projected electricity sales by region and customer class.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Annual Energy Outlook (AEO)
Primary key:: report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, customer_class

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet"
)

Columns#

Field Name	Type	Description
customer_class	string	High level categorization of customer type (e.g., commercial, residential).
electricity_market_module_region_eiaaeo	string	AEO projection region.
model_case_eiaaeo	string	Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.
projection_year	integer	The year of the projected value.
report_year	integer	Four-digit year in which the data was reported.
sales_mwh	number	Quantity of electricity sold in MWh.
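
Because each `report_year` carries its own set of projections, the same `projection_year` can appear several times, once per report and model case. The sketch below shows one way to pick a single series: the latest report and a single case. All values are invented, the case labels `"reference"` and `"high_growth"` are assumptions, and the region and customer class columns are omitted for brevity.

```python
# Hypothetical rows with a subset of this table's columns.
rows = [
    {"report_year": 2023, "projection_year": 2030,
     "model_case_eiaaeo": "reference", "sales_mwh": 100.0},
    {"report_year": 2025, "projection_year": 2030,
     "model_case_eiaaeo": "reference", "sales_mwh": 110.0},
    {"report_year": 2025, "projection_year": 2030,
     "model_case_eiaaeo": "high_growth", "sales_mwh": 125.0},
    {"report_year": 2025, "projection_year": 2040,
     "model_case_eiaaeo": "reference", "sales_mwh": 130.0},
]

# Keep only the most recent report and one model case.
latest_report = max(r["report_year"] for r in rows)
selected = [
    r for r in rows
    if r["report_year"] == latest_report
    and r["model_case_eiaaeo"] == "reference"  # assumed case label
]

# One projected value per projection year remains.
sales_by_projection_year = {r["projection_year"]: r["sales_mwh"] for r in selected}
```

Without both filters, summing `sales_mwh` over a projection year would mix values from different reports and scenarios.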

core_eiaaeo__yearly_projected_energy_use_by_sector_and_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected energy use for commercial, electric power, industrial, residential, and transportation sectors, drawn from AEO Table 2.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Annual Energy Outlook (AEO)
Primary key:: report_year, region_name_eiaaeo, model_case_eiaaeo, projection_year, energy_use_sector, energy_use_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

The series in Table 2 which track energy use by sector do not always define each type of use the same way across sectors. There is detailed information about what is included or excluded in each use type for each sector in the footnotes of the EIA’s online AEO data browser:

https://www.eia.gov/outlooks/aeo/data/browser/#/?id=2-AEO2023

Use caution when aggregating across use types! Energy Use has a tricky system of subtotals, and summing all types within a sector will result in double-counting. Consult the EIA’s data browser for visibility into which use types are subtotals, and what they contain: subtotal series are displayed indented, and include all lines above them which are one level out, up to the next indented line. Delivered Energy and Total are special cases which include those plus all subtotals above. In this way, “Delivered Energy” includes purchased electricity, renewable energy, and an array of fuels based on sector, and explicitly excludes electricity-related losses.

AEO Energy Use figures are variously referred to as delivered energy, energy consumption, energy use, and energy demand, depending on which use types are being discussed and on which organization and document is describing them. In PUDL we say energy use or energy consumption.
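
The double-counting risk described above can be shown with a toy example. The use-type labels and numbers below are invented and do not match the actual `energy_use_type` values; which series are subtotals must be determined from the EIA data browser.

```python
# Hypothetical rows for one sector, region, case, and projection year.
rows = [
    {"energy_use_type": "fuel_a", "energy_use_mmbtu": 40.0},
    {"energy_use_type": "fuel_b", "energy_use_mmbtu": 10.0},
    {"energy_use_type": "purchased_electricity", "energy_use_mmbtu": 30.0},
    # Subtotal of the three series above.
    {"energy_use_type": "delivered_energy", "energy_use_mmbtu": 80.0},
    {"energy_use_type": "electricity_related_losses", "energy_use_mmbtu": 20.0},
    # Grand total: delivered energy plus losses.
    {"energy_use_type": "total", "energy_use_mmbtu": 100.0},
]

# Assumed set of subtotal labels for this toy data.
SUBTOTAL_TYPES = {"delivered_energy", "total"}

# Summing every row counts the components again inside each subtotal.
naive_sum = sum(r["energy_use_mmbtu"] for r in rows)

# Dropping subtotal rows before summing recovers the reported total.
component_sum = sum(
    r["energy_use_mmbtu"] for r in rows
    if r["energy_use_type"] not in SUBTOTAL_TYPES
)
```

In this toy data the naive sum is 280.0, nearly three times the true total of 100.0, while the component sum matches the `total` row.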

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet"
)

Columns#

Field Name	Type	Description
energy_use_mmbtu	number	Energy use, in MMBTU; also referred to as energy consumption, energy demand, or delivered energy, depending on type.
energy_use_sector	string	Sector for energy use figures in AEO Table 2. Similar to customer class, but with some missing and some extra values.
energy_use_type	string	Type of energy use, indicating the name of the series from AEO Table 2. Includes fuels, electricity, losses, and various subtotals; consult table documentation for aggregation guidelines.
model_case_eiaaeo	string	Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.
projection_year	integer	The year of the projected value.
region_name_eiaaeo	string	EIA AEO region for energy consumption. Includes US Census Divisions plus United States.
region_type_eiaaeo	string	Region type for EIA AEO energy consumption, indicating whether region_name_eiaaeo is a US Census Division or country (United States)
report_year	integer	Four-digit year in which the data was reported.

core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected fuel prices for the electric power sector, including electricity-only and combined-heat-and-power plants that have a regulatory status.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Annual Energy Outlook (AEO)
Primary key:: report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet"
)

Columns#

Field Name	Type	Description
electricity_market_module_region_eiaaeo	string	AEO projection region.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_real_per_mmbtu_eiaaeo	number	Average fuel cost per MMBTU of heat content in real USD, standardized to the value of a USD in the year defined by `real_cost_basis_year`.
fuel_type_eiaaeo	string	Fuel type reported for AEO end-use sector generation data.
model_case_eiaaeo	string	Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.
projection_year	integer	The year of the projected value.
real_cost_basis_year	integer	Four-digit year which is the basis for any ‘real cost’ monetary values (as opposed to nominal values).
report_year	integer	Four-digit year in which the data was reported.

core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology#

Noticed any irregularities with this table? File a report.

Annual time series of projected generation capacity and total generation in the electric sector, broken out by technology.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Annual Energy Outlook (AEO)
Primary key:: report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, technology_description_eiaaeo

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet"
)

Columns#

Field Name	Type	Description
electricity_market_module_region_eiaaeo	string	AEO projection region.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
model_case_eiaaeo	string	Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.
projection_year	integer	The year of the projected value.
report_year	integer	Four-digit year in which the data was reported.
summer_capacity_mw	number	The net summer capacity.
summer_capacity_planned_additions_mw	number	The total planned additions to net summer generating capacity.
summer_capacity_retirements_mw	number	The total retirements from net summer generating capacity.
summer_capacity_unplanned_additions_mw	number	The total unplanned additions to net summer generating capacity.
technology_description_eiaaeo	string	Generation technology reported for AEO.
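
One common way to combine the capacity and generation columns above is an implied capacity factor: projected generation divided by what the projected capacity could produce if it ran all year. The following is a minimal sketch, assuming rows have already been loaded as dictionaries keyed by the column names above. The helper name `implied_capacity_factor`, the 8,760-hour year, and the sample row are illustrative assumptions, not part of PUDL or AEO.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def implied_capacity_factor(row):
    """Return generation / (capacity * hours), or None if it cannot be computed."""
    capacity_mw = row.get("summer_capacity_mw")
    generation_mwh = row.get("gross_generation_mwh")
    if capacity_mw is None or generation_mwh is None or capacity_mw <= 0:
        return None
    return generation_mwh / (capacity_mw * HOURS_PER_YEAR)


# Made-up row for illustration only; not real AEO data.
example_row = {
    "projection_year": 2030,
    "summer_capacity_mw": 100.0,
    "gross_generation_mwh": 438_000.0,
}
example_cf = implied_capacity_factor(example_row)
print(example_cf)
```

Returning `None` for missing or zero capacity keeps those rows from turning into division errors or misleading zeros.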

core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected generation capacity and total generation in the end-use sector, broken out by fuel type.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EIA Annual Energy Outlook (AEO)
Primary key:: report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo

Usage Warnings#

The solar photovoltaic capacity portion of Renewable Sources is reported in megawatts direct current; other technologies are reported in megawatts of alternating current.

Additional Details#

Includes combined-heat-and-power plants and electricity-only plants in the commercial and industrial sectors; and small on-site generating systems in the residential, commercial, and industrial sectors used primarily for own-use generation, but which may also sell some power to the grid.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet"
)

Columns#

Field Name	Type	Description
electricity_market_module_region_eiaaeo	string	AEO projection region.
fuel_type_eiaaeo	string	Fuel type reported for AEO end-use sector generation data.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
model_case_eiaaeo	string	Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.
projection_year	integer	The year of the projected value.
report_year	integer	Four-digit year in which the data was reported.
summer_capacity_mw	number	The net summer capacity.

core_epa__assn_eia_epacamd#

Noticed any irregularities with this table? File a report.

Association table providing connections between EPA units and EIA plants, boilers, and generators.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EPA – Mix of multiple EPA sources
Primary key:: This table has no primary key.

Additional Details#

This crosswalk table comes from the PUDL fork of the EPA camd-eia-crosswalk Github repo: https://github.com/catalyst-cooperative/camd-eia-crosswalk-latest.

The camd-eia-crosswalk README and our Data Source documentation page on EPA Hourly Continuous Emission Monitoring System (CEMS) depict the complicated relationship between EIA and EPA data, specifically the nature of EPA vs. EIA “units” and the level of granularity at which the two sources can be connected.

The original EPA crosswalk runs on 2018 EIA data. We adapted the crosswalk code to run on each new year of EIA data, capturing changes in plant information over time.

Our version of the crosswalk clarifies some of the column names and removes unmatched rows. The docstring of the pudl.etl.glue_assets.core_epa__assn_eia_epacamd() function explains what changes are made from the EPA’s version.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet"
)

Columns#

Field Name	Type	Description
boiler_id	string	Alphanumeric boiler ID.
emissions_unit_id_epa	string	Emissions (smokestack) unit monitored by EPA CEMS.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
generator_id_epa	string	Generator ID used by the EPA.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_epa	integer	The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.
report_year	integer	Four-digit year in which the data was reported.
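
Because one emissions unit can map to several generators and vice versa, it can help to turn the crosswalk into a lookup before joining it to other data. The following is a minimal sketch using made-up rows with the column names above. The function name `generators_by_emissions_unit` is hypothetical, and the choice of `(plant_id_epa, emissions_unit_id_epa)` as the lookup key is an assumption based on the column descriptions.

```python
from collections import defaultdict

# Made-up crosswalk rows for illustration only.
crosswalk = [
    {"plant_id_epa": 1, "emissions_unit_id_epa": "A", "plant_id_eia": 1, "generator_id": "1"},
    {"plant_id_epa": 1, "emissions_unit_id_epa": "A", "plant_id_eia": 1, "generator_id": "GT2"},
    {"plant_id_epa": 1, "emissions_unit_id_epa": "B", "plant_id_eia": 1, "generator_id": "GT2"},
]


def generators_by_emissions_unit(rows):
    """Map (plant_id_epa, emissions_unit_id_epa) to sorted (plant_id_eia, generator_id) pairs."""
    lookup = defaultdict(set)
    for row in rows:
        key = (row["plant_id_epa"], row["emissions_unit_id_epa"])
        # generator_id is kept as a string: IDs such as "GT2" are not numeric.
        lookup[key].add((row["plant_id_eia"], row["generator_id"]))
    return {key: sorted(values) for key, values in lookup.items()}


unit_to_generators = generators_by_emissions_unit(crosswalk)
print(unit_to_generators[(1, "A")])
```

Here unit A feeds two generators and generator GT2 is fed by two units, so any value attributed through this lookup has to be allocated rather than simply copied.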

core_epa__assn_eia_epacamd_subplant_ids#

Noticed any irregularities with this table? File a report.

Association table providing connections between EPA units and EIA units/generators, at the subplant level.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EPA – Mix of multiple EPA sources
Primary key:: This table has no primary key. The primary keys would have been: plant_id_eia, generator_id, subplant_id and emissions_unit_id_epa, but there are some null records in the generator_id column. ~2 percent of all EPA CAMD records are not successfully mapped to EIA generators.

Additional Details#

This table is an augmented version of the core_epa__assn_eia_epacamd crosswalk table which initially comes from the EPA’s Github repo camd-eia-crosswalk: https://github.com/USEPA/camd-eia-crosswalk.

This table identifies subplants within plant_ids, which are the smallest coherent units for aggregation. A plant_id refers to a legal entity that often contains multiple distinct power plants, even of different technology or fuel types.

EPA CEMS data combines information from several parts of a power plant:

emissions from smokestacks
fuel use from combustors
electricity production from generators

However, smokestacks, combustors, and generators can be connected in complex, many-to-many relationships. This complexity makes attribution difficult when, for example, allocating pollution to energy producers. Furthermore, heterogeneity within plant_ids makes aggregation to the parent entity difficult or inappropriate.

By analyzing the relationships between combustors and generators, as provided in the core_epa__assn_eia_epacamd crosswalk, we can identify distinct power plants. These are the smallest coherent units of aggregation.

This table draws on the EPA’s crosswalk, the IDs in the EPA CAMD core_epacems__hourly_emissions table itself, the core_eia860__assn_boiler_generator table, and the core_eia860__scd_generators table. The core_epa__assn_eia_epacamd table is the backbone of this table. Adding the EPA CAMD IDs ensures complete coverage of EPA CAMD reporting units, and adding the EIA 860 tables ensures complete coverage of the units reported there as well.

For more information about how this subplant_id is made, see the documentation for pudl.etl.glue_assets.make_subplant_ids() and pudl.etl.glue_assets.update_subplant_ids().
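
The idea of grouping connected emissions units and generators into "smallest coherent units" can be illustrated as a connected-components problem. The sketch below uses a small union-find over made-up (emissions unit, generator) pairs for a single plant. It is an illustration of the concept only, not the implementation in pudl.etl.glue_assets, and the function name `assign_component_ids` is hypothetical.

```python
def assign_component_ids(edges):
    """Label each (unit, generator) pair with the ID of its connected component.

    `edges` is a list of (emissions_unit_id, generator_id) pairs for one plant.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for unit, generator in edges:
        # Prefix node names so a unit and a generator sharing an ID stay distinct.
        union(("unit", unit), ("gen", generator))

    component_ids = {}
    labels = []
    for unit, _generator in edges:
        root = find(("unit", unit))
        labels.append(component_ids.setdefault(root, len(component_ids)))
    return labels


# Made-up plant: units A and B share generator G1; unit C feeds only G3.
example_edges = [("A", "G1"), ("B", "G1"), ("B", "G2"), ("C", "G3")]
example_labels = assign_component_ids(example_edges)
print(example_labels)
```

In this toy plant, A, B, G1, and G2 end up in one component and C with G3 in another, so emissions and generation can be compared within each component without splitting any many-to-many link.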

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet"
)

Columns#

Field Name	Type	Description
emissions_unit_id_epa	string	Emissions (smokestack) unit monitored by EPA CEMS.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_epa	integer	The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.
subplant_id	integer	Sub-plant ID links EPA CEMS emissions units to EIA units.
unit_id_pudl	integer	Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
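
Since this table has no enforced primary key, it can be worth checking the would-be key columns before joining on them. The following is a minimal sketch over made-up rows, assuming nulls arrive as `None`. The function name `check_would_be_key` is hypothetical.

```python
from collections import Counter

KEY_COLUMNS = ("plant_id_eia", "generator_id", "subplant_id", "emissions_unit_id_epa")


def check_would_be_key(rows, key_columns=KEY_COLUMNS):
    """Return (share of rows with a null key column, list of duplicated complete keys)."""
    complete_keys = []
    n_incomplete = 0
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if any(value is None for value in key):
            n_incomplete += 1
        else:
            complete_keys.append(key)
    duplicates = [key for key, count in Counter(complete_keys).items() if count > 1]
    null_share = n_incomplete / len(rows) if rows else 0.0
    return null_share, duplicates


# Made-up rows for illustration only.
example_rows = [
    {"plant_id_eia": 1, "generator_id": "1", "subplant_id": 0, "emissions_unit_id_epa": "A"},
    {"plant_id_eia": 1, "generator_id": "2", "subplant_id": 0, "emissions_unit_id_epa": "A"},
    {"plant_id_eia": 2, "generator_id": None, "subplant_id": 0, "emissions_unit_id_epa": "B"},
    {"plant_id_eia": 3, "generator_id": "1", "subplant_id": 1, "emissions_unit_id_epa": "C"},
]
example_null_share, example_duplicates = check_would_be_key(example_rows)
print(example_null_share, example_duplicates)
```

Rows with a null generator_id are counted separately rather than dropped silently, so the unmapped share stays visible.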

core_epacems__hourly_emissions#

Noticed any irregularities with this table? File a report.

Hourly time series of emissions and plant operational data as required by 40 CFR Part 75.

Most-recent data:: 2025
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: EPA Hourly Continuous Emission Monitoring System (CEMS)
Primary key:: plant_id_epa, emissions_unit_id_epa, operating_datetime_utc

Usage Warnings#

Large table; do not attempt to open with Excel.
Not all IDs are present.
Contains information from multiple raw inputs.

Additional Details#

Continuous Emissions Monitoring Systems (CEMS) are used to determine the rate of gas or particulate matter exiting a point source of emissions. The EPA Clean Air Markets Division (CAMD) has collected data on power plant emissions from CEMS units stretching back to 1995. The CEMS dataset includes hourly gross load, SO2, CO2, and NOx emissions associated with a given emissions-unit during startup, shutdown, and instances of malfunction. An EPA CEMS emissions-unit or smokestack unit is not the same as an EIA unit. See core_epa__assn_eia_epacamd for details about how to connect CEMS data to corresponding EIA units.

Only fossil-combustion units over 25 MW are required to install and use CEMS, so some units that report in EIA-860 or EIA-923 have no CEMS data.

Access methods#

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.

polars

import polars as pl
df = (
    # Use scan_parquet (lazy evaluation) to plan the query before execution
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    # Select only a subset of columns to reduce network and memory usage
    .select(
        [
            "state",
            "plant_id_eia",
            "plant_id_epa",
            "operating_datetime_utc",
            "operating_time_hours",
            "emissions_unit_id_epa",
            "gross_load_mw",
            "heat_content_mmbtu",
            "co2_mass_tons",
        ]
    )
    .filter(
        # Select data for only a single plant to reduce network and memory usage
        (pl.col("plant_id_eia") == 470)
        # The row groups in this Parquet file are partitioned by state and year,
        # so indicating the state allows the query to skip irrelevant groups.
        & (pl.col("state") == "CO")
    )
    # Now execute the planned query
    .collect()
)

SQL (DuckDB)

SELECT
    -- Select only a subset of columns to reduce network and memory usage
    state,
    plant_id_eia,
    plant_id_epa,
    operating_datetime_utc,
    operating_time_hours,
    emissions_unit_id_epa,
    gross_load_mw,
    heat_content_mmbtu,
    co2_mass_tons
FROM 's3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet'
WHERE
    -- Select data for only a single plant to reduce network and memory usage
    plant_id_eia = 470
    -- The row groups in this Parquet file are partitioned by state and year,
    -- so indicating the state allows the query to skip irrelevant groups.
    AND state = 'CO';

Columns#

Field Name	Type	Description
co2_mass_measurement_code	string	Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.
co2_mass_tons	number	Carbon dioxide emissions in short tons.
emissions_unit_id_epa	string	Emissions (smokestack) unit monitored by EPA CEMS.
gross_load_mw	number	Average power in megawatts delivered during time interval measured.
heat_content_mmbtu	number	The energy contained in fuel burned, measured in million BTU.
nox_mass_lbs	number	NOx emissions in pounds.
nox_mass_measurement_code	string	Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.
operating_datetime_utc	datetime	Date and time measurement began (UTC).
operating_time_hours	number	Length of time interval measured.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_epa	integer	The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.
so2_mass_lbs	number	Sulfur dioxide emissions in pounds.
so2_mass_measurement_code	string	Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.
state	string	Two letter US state abbreviation.
steam_load_lbs	number	Total steam pressure produced by a unit during the reported hour.
year	integer	Year the data was reported in, used for partitioning EPA CEMS.
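
Going by the column descriptions above, gross_load_mw is an average power over the measured interval and operating_time_hours is the length of that interval, so their product gives energy in MWh. The following is a minimal sketch that aggregates hourly rows into a CO2 rate per emissions unit. The rows are made up, the function name `co2_rate_by_unit` is hypothetical, and the MW times hours interpretation is an assumption drawn from those descriptions.

```python
from collections import defaultdict


def co2_rate_by_unit(rows):
    """Return {(plant_id_epa, emissions_unit_id_epa): short tons CO2 per MWh of gross load}."""
    tons = defaultdict(float)
    mwh = defaultdict(float)
    for row in rows:
        load = row.get("gross_load_mw")
        hours = row.get("operating_time_hours")
        co2 = row.get("co2_mass_tons")
        if load is None or hours is None or co2 is None:
            continue  # skip incomplete hours rather than treating them as zero
        key = (row["plant_id_epa"], row["emissions_unit_id_epa"])
        tons[key] += co2
        mwh[key] += load * hours
    return {key: tons[key] / mwh[key] for key in tons if mwh[key] > 0}


# Made-up hourly rows for illustration only.
example_hours = [
    {"plant_id_epa": 1, "emissions_unit_id_epa": "1", "gross_load_mw": 100.0,
     "operating_time_hours": 1.0, "co2_mass_tons": 90.0},
    {"plant_id_epa": 1, "emissions_unit_id_epa": "1", "gross_load_mw": 50.0,
     "operating_time_hours": 0.5, "co2_mass_tons": 22.5},
    {"plant_id_epa": 1, "emissions_unit_id_epa": "1", "gross_load_mw": None,
     "operating_time_hours": 0.0, "co2_mass_tons": None},
]
example_rates = co2_rate_by_unit(example_hours)
print(example_rates)
```

Summing tons and MWh first and dividing once gives a load-weighted rate. Averaging hourly rates instead would overweight partial or low-load hours.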

core_ferc1__codes_power_purchase_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for electric power purchase types.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities
Primary key:: code

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.
label	string	Longer human-readable code using snake_case.

core_ferc1__yearly_balance_sheet_assets_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility assets and other debits.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
Primary key:: utility_id_ferc1, report_year, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
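
The subtotal warning above matters most when summing ending_balance: rows flagged as calculated would be counted again alongside their components. The following is a minimal sketch of excluding them, using made-up rows. The label `"calculated_value"` and the asset_type values are assumptions for illustration, so check the distinct values of row_type_xbrl in the real table before relying on them. The function name is hypothetical.

```python
# Assumption: the label that marks calculated (subtotal or total) rows.
CALCULATED_LABEL = "calculated_value"


def sum_reported_ending_balance(rows, calculated_label=CALCULATED_LABEL):
    """Sum ending_balance over rows that are not flagged as calculated."""
    total = 0.0
    for row in rows:
        if row.get("row_type_xbrl") == calculated_label:
            continue  # skip subtotals so components are not double counted
        if row.get("ending_balance") is None:
            continue
        total += row["ending_balance"]
    return total


# Made-up rows for one utility-year; asset_type values are placeholders.
example_assets = [
    {"asset_type": "component_a", "row_type_xbrl": "reported_value", "ending_balance": 100.0},
    {"asset_type": "component_b", "row_type_xbrl": "reported_value", "ending_balance": 50.0},
    {"asset_type": "total", "row_type_xbrl": "calculated_value", "ending_balance": 150.0},
]
naive_total = sum(row["ending_balance"] for row in example_assets)
reported_total = sum_reported_ending_balance(example_assets)
print(naive_total, reported_total)
```

In this toy example the naive sum is double the reported-only sum. Real filings can nest subtotals several levels deep, so this filter is a starting point rather than a guarantee.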

core_ferc1__yearly_balance_sheet_liabilities_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility liabilities and other credits.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
Primary key:: utility_id_ferc1, report_year, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
liability_type	string	Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_cash_flows_sched120#

Noticed any irregularities with this table? File a report.

Annual time series of utility cash flow.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)
Primary key:: utility_id_ferc1, report_year, amount_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet"
)

Columns#

Field Name	Type	Description
amount	number	Reported amount of dollars. This could be a balance or a change in value.
amount_type	string	Label describing the type of amount being reported. This could be a balance or a change in value.
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
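
The record_id template above can be split back into its parts. Because table_name itself contains underscores, the sketch below splits from the right. It assumes the ID follows the documented template exactly and that none of the five trailing fields contains an underscore, so IDs built differently will raise an error or be misparsed. The function name and the example ID are hypothetical.

```python
RECORD_ID_FIELDS = ("report_year", "report_prd", "respondent_id", "spplmnt_num", "row_number")


def parse_record_id(record_id):
    """Split a record_id into its table name and trailing fields (all kept as strings)."""
    parts = record_id.rsplit("_", len(RECORD_ID_FIELDS))
    if len(parts) != len(RECORD_ID_FIELDS) + 1:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    table_name, *values = parts
    parsed = {"table_name": table_name}
    parsed.update(zip(RECORD_ID_FIELDS, values))
    return parsed


# Hypothetical ID built from the documented template; not taken from the data.
example_id = "f1_some_table_2020_12_123_0_7"
example_parsed = parse_record_id(example_id)
print(example_parsed)
```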

core_ferc1__yearly_depreciation_by_function_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)
Primary key:: utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
depreciation_type	string	Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.
ending_balance	number	Account balance at end of year.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_depreciation_changes_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated provision for depreciation of electric utility plant.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)
Primary key:: utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
depreciation_type	string	Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_depreciation_factors_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of factors used in estimating depreciation charges.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)
Primary key:: This table has no primary key. This table is too unstructured to have a primary key.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains columns which may appear categorical, but are actually free text.
The rate and percentage (pct) columns are reported either as values between 0-1 or 0-100. The scale is often consistent within a given utility-year, but use caution when comparing values across different utilities, or across filings by the same utility in different years.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains details at a variety of levels of granularity. It has many free-form text fields, so respondents fill it out very differently from each other and from year to year. We recommend using this table carefully, one utility-year at a time, and not attempting analysis across long time series or across utilities without considerable caution and cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
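
Because the scale of the pct columns tends to be consistent within a utility-year, one possible cleaning step is to pick a scale per utility-year rather than per value. The sketch below applies a simple heuristic: if any absolute value in a utility-year exceeds 1, the whole group is treated as 0-100 and divided by 100. This heuristic is an assumption for illustration, not PUDL's method, and it will misclassify a 0-100 filing whose values all happen to be 1 or less. The function name and rows are made up.

```python
from collections import defaultdict


def normalize_pct(rows, column="net_salvage_pct"):
    """Return copies of rows with `column` rescaled to a 0-1 fraction per utility-year.

    Output is grouped by utility-year, so it may not match the input order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["utility_id_ferc1"], row["report_year"])].append(row)

    normalized = []
    for group in groups.values():
        values = [abs(r[column]) for r in group if r.get(column) is not None]
        # Heuristic: any magnitude above 1 suggests the group is on a 0-100 scale.
        scale = 100.0 if values and max(values) > 1 else 1.0
        for row in group:
            new_row = dict(row)
            if new_row.get(column) is not None:
                new_row[column] = new_row[column] / scale
            normalized.append(new_row)
    return normalized


# Made-up rows for illustration only.
example_rows = [
    {"utility_id_ferc1": 1, "report_year": 2021, "net_salvage_pct": -5.0},
    {"utility_id_ferc1": 2, "report_year": 2021, "net_salvage_pct": 0.05},
    {"utility_id_ferc1": 1, "report_year": 2021, "net_salvage_pct": 10.0},
    {"utility_id_ferc1": 2, "report_year": 2021, "net_salvage_pct": None},
]
example_normalized = normalize_pct(example_rows)
print([row["net_salvage_pct"] for row in example_normalized])
```

Absolute values are used because net salvage can be negative. Nulls are passed through unchanged so missing data is not mistaken for zero.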

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet"
)

Columns#

Field Name	Type	Description
account_num	string	Account number(s) in connection with factors used in estimating depreciation charges.
depreciable_plant_base	number	Depreciable plant balance (depreciable base) to which rates are applied.
depreciation_factors	string	Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers indicating the beginning of a new section, such as a plant or asset type header followed by sub-components like FERC account IDs or plant names, depending on the section.
depreciation_rate	string	Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.
mortality_curve_type	string	Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives.
net_salvage_pct	number	Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.
order_num	number	This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer, although there are many instances of floating point values which seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
remaining_life_avg	number	Estimated weighted average of remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.
report_year	integer	Four-digit year in which the data was reported.
service_life_avg	number	Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
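
The PnYnMnDTnHnMnS duration format mentioned for remaining_life_avg and service_life_avg can be turned into a number of years along the following lines. This is only an illustrative sketch, not PUDL's actual conversion code: the function name duration_to_years and the conversion factors (12 months and 365.25 days per year) are assumptions made here.

```python
import re
from typing import Optional

# One optional numeric component per designator, in the fixed order
# years, months, days, then (after "T") hours, minutes, seconds.
_NUM = r"\d+(?:\.\d+)?"
_DURATION = re.compile(
    rf"^P(?:(?P<years>{_NUM})Y)?"
    rf"(?:(?P<months>{_NUM})M)?"
    rf"(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?:(?P<hours>{_NUM})H)?"
    rf"(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?)?$"
)


def duration_to_years(text: str) -> Optional[float]:
    """Convert a PnYnMnDTnHnMnS string to years, or None if it does not parse."""
    match = _DURATION.match(text.strip())
    if match is None:
        return None
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    days = (
        parts["days"]
        + parts["hours"] / 24
        + parts["minutes"] / 1440
        + parts["seconds"] / 86400
    )
    # Assumed factors: 12 months per year, 365.25 days per year.
    return parts["years"] + parts["months"] / 12 + days / 365.25


print(duration_to_years("P30Y"))
print(duration_to_years("P12Y6M"))
print(duration_to_years("thirty years"))
```

Returning None for unparseable input keeps free-form entries visible as missing values rather than silently treating them as zero.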

core_ferc1__yearly_depreciation_summary_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of depreciation and amortization of electric plant.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)
Primary key:: utility_id_ferc1, report_year, plant_function, ferc_account_label

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Plant refers to FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of depreciation and amortization charges.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet"
)

Columns#

Field Name	Type	Description
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
ferc_account_label	string	Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.
plant_function	string	Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
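
Because this table mixes calculated and uniquely reported rows, a sum over dollar_value can double count. The sketch below shows one way to use row_type_xbrl before aggregating. The label strings "reported_value" and "calculated_value", like the sample rows, are assumptions made for illustration; list the distinct values in the real column before filtering on them.

```python
import pandas as pd

# Hypothetical rows; labels and amounts are invented for illustration.
summary = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1],
        "report_year": [2022, 2022, 2022],
        "plant_function": ["steam_production", "distribution", "total"],
        "row_type_xbrl": ["reported_value", "reported_value", "calculated_value"],
        "dollar_value": [100.0, 50.0, 150.0],
    }
)

# Inspect which labels are actually present before relying on any of them.
print(summary["row_type_xbrl"].value_counts())

# Assumed label: drop rows flagged as calculated so totals are not added twice.
reported_only = summary[summary["row_type_xbrl"] != "calculated_value"]
total = reported_only["dollar_value"].sum()
print(total)
```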

core_ferc1__yearly_energy_dispositions_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of dispositions of electric energy sold, exchanged, or stored.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
Primary key:: utility_id_ferc1, report_year, energy_disposition_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, dispositions only. Schedule 401a. Electricity utilities delivered to end users, internal losses, etc.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet"
)

Columns#

Field Name	Type	Description
energy_disposition_type	string	Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc.
energy_mwh	number	Sources and uses of energy in MWh.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
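
The record_id layout described above, {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}, can be split from the right, since the table name may itself contain underscores. The following is a minimal sketch under that assumption. The sample identifier and the parse_record_id helper are hypothetical, and identifiers that do not follow the documented layout would need different handling.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    table_name: str
    report_year: int
    report_prd: str
    respondent_id: str
    spplmnt_num: str
    row_number: str


def parse_record_id(record_id: str) -> RecordId:
    """Split a record_id into its six documented components."""
    # Split from the right: the last five fields contain no underscores,
    # while the leading table name may.
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id layout: {record_id!r}")
    table_name, year, prd, respondent, spplmnt, row = parts
    return RecordId(table_name, int(year), prd, respondent, spplmnt, row)


# Hypothetical identifier, made up for illustration.
example = parse_record_id("f1_elctrc_erg_acct_2015_12_101_0_4")
print(example)
```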

core_ferc1__yearly_energy_sources_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
Primary key:: utility_id_ferc1, report_year, energy_source_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet"
)

Columns#

Field Name	Type	Description
energy_mwh	number	Sources and uses of energy in MWh.
energy_source_type	string	Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_hydroelectric_plants_sched406#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for large hydroelectric generating plants.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)
Primary key:: This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

The plant_type and construction_type columns are standardized into categorical values from free-form strings.
FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
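
Since the table has no primary key, it can help to flag records that collide on the approximate key (report_year, utility_id_ferc1, plant_name_ferc1) before aggregating. This is a sketch on made-up rows, not a prescribed workflow. It will not catch partial and total records reported under different plant names.

```python
import pandas as pd

# Hypothetical rows; plant names and capacities are invented.
plants = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020, 2020],
        "utility_id_ferc1": [7, 7, 7, 8],
        "plant_name_ferc1": ["river bend", "river bend", "lake side", "river bend"],
        "capacity_mw": [40.0, 80.0, 25.0, 60.0],
    }
)

approx_key = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]

# keep=False marks every member of a colliding group, not just the repeats.
is_dupe = plants.duplicated(subset=approx_key, keep=False)
needs_review = plants[is_dupe]
unambiguous = plants[~is_dupe]

print(needs_review)
print(unambiguous["capacity_mw"].sum())
```

Records in needs_review may be partial and total rows for the same plant, so they are better resolved by hand than summed.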

Additional Details#

Large plants have an installed nameplate capacity of more than 10 MW.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_equipment	number	Cost of plant: equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_capacity_adverse_conditions_mw	number	Net plant capability under the least favorable operating conditions, in megawatts.
net_capacity_favorable_conditions_mw	number	Net plant capability under the most favorable operating conditions, in megawatts.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_hydraulic	number	Production expenses: hydraulic expenses (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_rents	number	Production expenses: rents (USD).
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_water_for_power	number	Production expenses: water for power (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
project_num	integer	FERC Licensed Project Number.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_identification_certification#

Noticed any irregularities with this table? File a report.

Annual time series of respondent identification and corporate officer certification information.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Part I and II)
Primary key:: utility_id_ferc1, report_year, report_filing_type

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains information identifying a utility’s contact information, addresses and individuals responsible for accounting and certification. Note that the entity_id_gleif field does not conform to the expected format and largely includes the names of respondents.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet"
)

Columns#

Field Name	Type	Description
attestation_date	date	Date that corporate officer’s certification was signed.
attestation_name	string	Name of person signing the corporate officer’s certification.
attestation_signature	string	Signature of corporate officer to certify the report.
attestation_title	string	Title of person signing the corporate officer’s certification.
company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system.
contact_address	string	Street address for contact identified in contact_name.
contact_city	string	City for contact identified in contact_name.
contact_email	string	Email for contact identified in contact_name.
contact_name	string	Name of contact person for the respondent.
contact_phone	string	Phone number for contact identified in contact_name.
contact_state	string	State for contact identified in contact_name.
contact_title	string	Title of contact person for the respondent.
contact_zip	string	Zip code for contact identified in contact_name.
entity_id_gleif	string	Legal Entity Identification Number issued by Global Legal Entity Identifier Foundation (GLEIF). The Legal Entity Identifier (LEI) is a 20-character, alpha-numeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO).
filing_date	date	Date on which the filing was submitted.
filing_software_vendor_name	string	Name of the software company used to create the submitted instance file. Null if the filer created the submission.
name_change_date	date	Date of last name change of the company.
office_city	string	City of principal office at end of period.
office_state	string	State of principal office at end of period.
office_street_address	string	Street address of principal office at end of period.
office_zip	string	Zipcode of principal office at end of period.
prior_utility_name_ferc1	string	Former name of the responding utility, when it changed within the last year.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_filing_type	string	Type of report submitted: O (original) or R (revision).
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

core_ferc1__yearly_income_statements_sched114#

Noticed any irregularities with this table? File a report.

Annual time series of utility income statements.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)
Primary key:: utility_id_ferc1, report_year, utility_type, income_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
income_type	string	Type of income reported in core_ferc1__yearly_income_statements_sched114 table.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_operating_expenses_sched320#

Noticed any irregularities with this table? File a report.

Annual time series of operating and maintenance costs associated with producing electricity.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)
Primary key:: utility_id_ferc1, report_year, expense_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet"
)

Columns#

Field Name	Type	Description
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
expense_type	string	The type of expense.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_operating_revenues_sched300#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric operating revenues.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)
Primary key:: utility_id_ferc1, report_year, revenue_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes only the structured part of Schedule 300. A number of revenue_type values do not have sales_mwh or avg_customers_per_month provided, in which case these columns will be NULL.
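
Derived quantities need to allow for those NULL values. As a hedged illustration on made-up rows (the revenue_type labels and the usd_per_mwh column are hypothetical), revenue per MWh can be computed only where sales_mwh is present and positive:

```python
import pandas as pd

# Hypothetical rows; labels and amounts are invented for illustration.
revenues = pd.DataFrame(
    {
        "revenue_type": ["residential_sales", "rent_from_electric_property"],
        "dollar_value": [1_000_000.0, 25_000.0],
        "sales_mwh": [8_000.0, None],
    }
)

# Only divide where a positive sales volume was reported.
has_sales = revenues["sales_mwh"].notna() & (revenues["sales_mwh"] > 0)
revenues["usd_per_mwh"] = (
    revenues["dollar_value"] / revenues["sales_mwh"]
).where(has_sales)

print(revenues)
```

Rows without sales keep a missing usd_per_mwh rather than a misleading zero or infinity.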

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet"
)

Columns#

Field Name	Type	Description
avg_customers_per_month	number	Average number of customers per month.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
revenue_type	string	Label describing types of revenues.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
sales_mwh	number	Quantity of electricity sold in MWh.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_other_regulatory_assets_sched232#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory assets, including rate order docket number.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)
Primary key:: This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)

Columns#

Field Name	Type	Description
account_detail	string	Description of the account number credited from making debit adjustment to other regulatory liabilities.
additional_description	string	Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.
credits_written_off_recovered	number	Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies.
debits	number	The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies.
description	string	Free-form description and purpose of other regulatory assets.
ending_balance	number	Account balance at end of year.
report_year	integer	Four-digit year in which the data was reported.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_other_regulatory_liabilities_sched278#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory liabilities, including rate order docket number.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)
Primary key:: This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)

Columns#

Field Name	Type	Description
account_detail	string	Description of the account number credited from making debit adjustment to other regulatory liabilities.
additional_description	string	Additional free-form description column. In most instances the values in this column are exactly the same as those in the description column, but they differ often enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.
decrease_in_other_regulatory_liabilities	number	The decrease during the reporting period of other regulatory liabilities.
description	string	Long human-readable description of the meaning of a code/label.
ending_balance	number	Account balance at end of year.
increase_in_other_regulatory_liabilities	number	The increase during the reporting period of other regulatory liabilities.
report_year	integer	Four-digit year in which the data was reported.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_plant_in_service_sched204#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)
Primary key:: utility_id_ferc1, report_year, ferc_account_label

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101 (see e.g. https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
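
The balance identity described above can be checked row by row. The following is a minimal sketch rather than anything PUDL ships: the helper name balance_residual and the sample rows are made up for illustration. It assumes that retirements carry whatever sign makes the identity hold as stated (negative for reductions) and that null components can be treated as zero.

```python
import pandas as pd

# Columns that describe changes to an account during the year.
COMPONENTS = ["additions", "retirements", "adjustments", "transfers"]


def balance_residual(df: pd.DataFrame) -> pd.Series:
    """Return ending_balance minus (starting_balance + sum of changes)."""
    # Assumption: a missing component means "no change", so fill with zero.
    changes = df[COMPONENTS].fillna(0).sum(axis=1)
    return df["ending_balance"] - (df["starting_balance"] + changes)


# Made-up rows for illustration only; real rows come from the parquet file.
sample = pd.DataFrame(
    {
        "starting_balance": [1000.0, 500.0],
        "additions": [200.0, 50.0],
        "retirements": [-100.0, 0.0],
        "adjustments": [0.0, None],
        "transfers": [0.0, 25.0],
        "ending_balance": [1100.0, 600.0],
    }
)
sample["residual"] = balance_residual(sample)

# Rows whose residual exceeds a small (arbitrary) tolerance deserve a closer look.
inconsistent = sample[sample["residual"].abs() > 0.5]
```

If the residuals in the real data turn out to be systematically twice the retirements value, that would suggest the sign assumption is wrong and retirements should be subtracted instead.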

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments	number	Cost of adjustments to the account.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
ferc_account_label	string	Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.
plant_status	string	Utility plant financial status (in service, future, leased, total).
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
retirements	number	Cost of disposal of items classified within the account.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
transfers	number	Cost of transfers into (out of) the account.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_pumped_storage_plants_sched408#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)
Primary key:: This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
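
Before aggregating, it can help to see which rows collide on the approximate key (report_year, utility_id_ferc1, plant_name_ferc1). The sketch below is illustrative only: the helper name flag_key_collisions and the sample rows are hypothetical. Note that it only catches exact repeats of the plant name. It will not detect a "unit 1, unit 2 and total" pattern in which each row has a different name.

```python
import pandas as pd

# The approximate key suggested for this table; it is not guaranteed unique.
APPROX_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def flag_key_collisions(df: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose approximate key appears more than once."""
    mask = df.duplicated(subset=APPROX_KEY, keep=False)
    return df[mask].sort_values(APPROX_KEY)


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020, 2021],
        "utility_id_ferc1": [1, 1, 1, 1],
        "plant_name_ferc1": [
            "example plant",
            "example plant",
            "other plant",
            "example plant",
        ],
        "capacity_mw": [100.0, 200.0, 50.0, 100.0],
    }
)
collisions = flag_key_collisions(sample)
```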

Additional Details#

As reported in Schedule 408 of FERC Form 1.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_equipment_electric	number	Cost of plant: accessory electric equipment (USD).
capex_equipment_misc	number	Cost of plant: miscellaneous power plant equipment (USD).
capex_facilities	number	Cost of plant: reservoirs, dams, and waterways (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_roads	number	Cost of plant: roads, railroads, and bridges (USD).
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
capex_wheels_turbines_generators	number	Cost of plant: water wheels, turbines, and generators (USD).
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
energy_used_for_pumping_mwh	number	Energy used for pumping, in megawatt-hours.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
net_load_mwh	number	Net output for load (net generation - energy used for pumping) in megawatt-hours.
opex_dams	number	Production expenses: maintenance of reservoirs, dams, and waterways (USD).
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_generation_misc	number	Production expenses: miscellaneous power generation expenses (USD).
opex_misc_plant	number	Production expenses: maintenance of miscellaneous hydraulic plant (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plant	number	Production expenses: maintenance of electric plant (USD).
opex_production_before_pumping	number	Total production expenses before pumping (USD).
opex_pumped_storage	number	Production expenses: pumped storage (USD).
opex_pumping	number	Production expenses: pumping expenses (USD).
opex_rents	number	Production expenses: rents (USD).
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_total	number	Total production expenses, excluding fuel (USD).
opex_water_for_power	number	Production expenses: water for power (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
project_num	integer	FERC Licensed Project Number.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
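
The net_load_mwh description above implies a simple relationship between three columns. As a hedged sketch (the helper name net_load_gap and the sample values are made up), the reported net load can be compared against net generation minus pumping energy:

```python
import pandas as pd


def net_load_gap(df: pd.DataFrame) -> pd.Series:
    """Return reported net_load_mwh minus the value implied by its description."""
    implied = df["net_generation_mwh"] - df["energy_used_for_pumping_mwh"]
    return df["net_load_mwh"] - implied


# Made-up rows for illustration only. Net load can be negative when a plant
# uses more energy for pumping than it generates.
sample = pd.DataFrame(
    {
        "net_generation_mwh": [1000.0, 800.0],
        "energy_used_for_pumping_mwh": [1300.0, 1000.0],
        "net_load_mwh": [-300.0, -150.0],
    }
)
sample["net_load_gap"] = net_load_gap(sample)
```

A non-zero gap does not say which of the three reported values is wrong, only that they disagree.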

core_ferc1__yearly_purchased_power_and_exchanges_sched326#

Noticed any irregularities with this table? File a report.

Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)
Primary key:: This table has no primary key. The primary key for this table would be record_id, however a small number of these values are duplicated in 2021, 2022, and 2024.
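
To see how many record_id values repeat in each report year, something like the following sketch could be used. The helper name duplicated_record_ids_by_year is hypothetical, and the record_id strings in the sample are placeholders rather than real identifiers.

```python
import pandas as pd


def duplicated_record_ids_by_year(df: pd.DataFrame) -> pd.Series:
    """Count distinct record_id values that occur more than once, per year."""
    dupes = df[df.duplicated(subset="record_id", keep=False)]
    return dupes.groupby("report_year")["record_id"].nunique()


# Placeholder rows for illustration only.
sample = pd.DataFrame(
    {
        "record_id": ["a_2020_1", "a_2021_1", "a_2021_1", "a_2022_1", "a_2022_1", "a_2022_2"],
        "report_year": [2020, 2021, 2021, 2022, 2022, 2022],
    }
)
counts = duplicated_record_ids_by_year(sample)
```

Years with no repeated record_id simply do not appear in the result.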

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table has data about inter-utility power purchases. This includes how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately the field describing which other utility the power was being bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.

Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
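
Because seller_name is free text, a light normalization pass can reduce trivial variation (case, punctuation, spacing) before any grouping or matching. This is only a sketch of one possible approach, not a PUDL function: normalize_seller_name is a hypothetical helper, the sample names are invented, and real matching against other datasets would need considerably more than this.

```python
import re

import pandas as pd


def normalize_seller_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    lowered = name.lower()
    no_punct = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


# Invented names for illustration only.
sample = pd.Series(
    ["Example Power Co.", "EXAMPLE POWER CO", " example  power, co ", None],
    dtype="object",
)
# na_action="ignore" leaves missing names untouched instead of raising.
normalized = sample.map(normalize_seller_name, na_action="ignore")
```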

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)

Columns#

Field Name	Type	Description
billing_demand_mw	number	Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts.
coincident_peak_demand_mw	number	Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts.
delivered_mwh	number	Gross megawatt-hours delivered in power exchanges and used as the basis for settlement.
demand_charges	number	Demand charges (USD).
energy_charges	number	Energy charges (USD).
non_coincident_peak_demand_mw	number	Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts.
other_charges	number	Other charges, including out-of-period adjustments (USD).
purchase_type_code	string	Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction.
purchased_mwh	number	Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021.
purchased_other_than_storage_mwh	number	Number of megawatt hours purchased during the period for other than energy storage.
purchased_storage_mwh	number	Number of megawatt hours purchased during the period for energy storage.
received_mwh	number	Gross megawatt-hours received in power exchanges and used as the basis for settlement.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
seller_name	string	Name of the seller, or the other party in an exchange transaction.
tariff	string	FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.)
total_settlement	number	Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
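
The total_settlement description above suggests a consistency check for rows that are not power exchanges. The sketch below is an illustration under stated assumptions: settlement_gap is a hypothetical helper, the sample rows are made up, and null charges are treated as zero. Exchange rows are excluded because their settlement reflects the net receipt of energy rather than a sum of charges.

```python
import pandas as pd

CHARGE_COLUMNS = ["demand_charges", "energy_charges", "other_charges"]


def settlement_gap(df: pd.DataFrame) -> pd.Series:
    """Return total_settlement minus summed charges for non-exchange rows."""
    purchases = df[df["purchase_type_code"] != "electricity_exchange"]
    # Assumption: a missing charge means no charge of that kind was billed.
    charges = purchases[CHARGE_COLUMNS].fillna(0).sum(axis=1)
    return purchases["total_settlement"] - charges


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "purchase_type_code": ["requirements", "short_firm", "electricity_exchange"],
        "demand_charges": [100.0, 0.0, None],
        "energy_charges": [250.0, 40.0, None],
        "other_charges": [10.0, None, None],
        "total_settlement": [360.0, 45.0, -20.0],
    }
)
gaps = settlement_gap(sample)
```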

core_ferc1__yearly_retained_earnings_sched118#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ statements of retained earnings.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)
Primary key:: utility_id_ferc1, report_year, earnings_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet"
)

Columns#

Field Name	Type	Description
balance	string	Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.
earnings_type	string	Label describing types of earnings.
ending_balance	number	Account balance at end of year.
ferc_account	string	Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
starting_balance	number	Account balance at beginning of year.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_sales_by_rate_schedules_sched304#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)
Primary key:: This table has no primary key.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
Values in rate_schedule_description are free-form strings.
Data prior to 2021 does not include information in columns: rate_schedule_type and billing_status.
Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

The pre-2021 data in this table (extracted from FoxProDB rather than XBRL) is extremely unstructured. The post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total), whereas the pre-2021 data stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse the pre-2021 rate_schedule_description column en masse because it’s just too messy. Its contents often include numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded in the rate_schedule_description column. To all who dare, beware.

This table is a combination of one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type, plus totals), hence the increase in data clarity post-2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data, as they can be reliably parsed from each of the tables and incorporated into columns. The rate_schedule_description is supposed to contain sub-rate_schedule_type names for charges (e.g. Residential 1, Residential 2, etc.). However, the pre-2021 data contains a little bit of everything (or nothing) and the post-2020 data has some totals or wonky data thrown in. That’s to say, even when working with post-2020 data, be wary of aggregating the data. That’s what the “total” rows are for.

The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for data post-2020 (otherwise you could try keyword searching for “total” in rate_schedule_description). The totals table contains two types of totals: the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”) and each of the utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).
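
The marking scheme described above can be turned into simple filters. The following is a minimal sketch with a hypothetical helper (split_totals) and invented rows. It relies only on the exact string "total" and therefore only separates totals reliably for post-2020 data. Pre-2021 rows, which have no rate_schedule_type, all fall into the detail group.

```python
import pandas as pd


def split_totals(df: pd.DataFrame):
    """Split rows into utility-wide totals, per-type totals, and detail rows."""
    is_total_row = df["rate_schedule_description"] == "total"
    is_total_type = df["rate_schedule_type"] == "total"
    has_type = df["rate_schedule_type"].notna()

    utility_totals = df[is_total_row & is_total_type]
    type_totals = df[is_total_row & ~is_total_type & has_type]
    detail = df[~is_total_row]
    return utility_totals, type_totals, detail


# Invented rows for illustration only.
sample = pd.DataFrame(
    {
        "report_year": [2022, 2022, 2022, 2022, 2019],
        "rate_schedule_description": ["total", "total", "Residential 1", "Residential 2", "misc"],
        "rate_schedule_type": ["total", "residential", "residential", "residential", None],
        "sales_mwh": [300.0, 300.0, 100.0, 200.0, 50.0],
    }
)
utility_totals, type_totals, detail = split_totals(sample)
```

Keeping the three groups separate avoids summing a total together with the rows it already covers.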

The rate schedule based XBRL tables only report billed values whereas the total tables report billed, unbilled, and total values. (See the column description for more info on the difference between billed and unbilled). This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.

Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot guarantee which is which and have not put any cleaning mechanisms in place to account for this.
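
One way to look for the two peaks mentioned above is to bin the base-10 logarithm of revenue_per_kwh, since values that differ by a factor of 100 sit two log units apart. This is a rough sketch only: log10_histogram is a hypothetical helper, the bin width is arbitrary, and the sample values are invented. It makes no attempt to decide which peak is dollars and which is cents.

```python
import numpy as np
import pandas as pd


def log10_histogram(values: pd.Series, bin_width: float = 0.5) -> pd.Series:
    """Count positive values per log10 bin, indexed by each bin's lower edge."""
    positive = values.dropna()
    positive = positive[positive > 0]  # log10 is undefined for zero/negatives
    logs = np.log10(positive.to_numpy(dtype=float))
    if logs.size == 0:
        return pd.Series(dtype="int64")
    lower = np.floor(logs.min())
    upper = np.floor(logs.max()) + 1
    edges = np.arange(lower, upper + bin_width / 2, bin_width)
    counts, edges = np.histogram(logs, bins=edges)
    return pd.Series(counts, index=pd.Index(edges[:-1], name="log10_lower_edge"))


# Invented values for illustration only: two clusters a factor of 100 apart.
sample = pd.Series([0.08, 0.11, 0.12, 8.0, 11.0, 12.0, None, 0.0])
hist = log10_histogram(sample)
occupied_bins = hist[hist > 0].index.tolist()
```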

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)

Columns#

Field Name	Type	Description
avg_customers_per_month	number	Average number of customers per month.
billing_status	string	Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers.
dollar_value	number	Dollar value of reported income, expense, asset, or liability.
kwh_per_customer	number	kWh per customer.
rate_schedule_description	string	Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type.
rate_schedule_type	string	Categorization of rate schedule type.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
revenue_per_kwh	number	The amount of revenue per kWh by rate schedule acquired in the given report year.
sales_mwh	number	Quantity of electricity sold in MWh.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_small_plants_sched410#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for internal combustion plants, gas-turbine plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity and steam plants with less than 25 MW installed nameplate capacity.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)
Primary key:: This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported on FERC Form 1 Schedule 410 (pages 410-411) and extracted from the FERC Visual FoxPro and XBRL sources. See our pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.

The raw version of this table is more like a digitized PDF than an actual data table. The rows contain lots of information in addition to what the columns might suggest. For instance, a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the rest of the values data in the column. We employ a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet"
)

Columns#

Field Name	Type	Description
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_total	number	Total cost of plant (USD).
construction_year	integer	Year the plant’s oldest still operational unit was built.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_type	string	Type of fuel.
license_id_ferc1	integer	FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_fuel	number	Production expenses: fuel (USD).
opex_maintenance	number	Production expenses: Maintenance (USD).
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_steam_plants_fuel_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
Primary key:: This table has no primary key.

Usage Warnings#

The fuel_type_code_pudl is inferred from a free-form string field.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is a subset of the steam plant table reported on page 402 of FERC Form 1.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)

Columns#

Field Name	Type	Description
fuel_consumed_units	number	Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.
fuel_cost_per_mmbtu	number	Average fuel cost per MMBTU of heat content in nominal USD.
fuel_cost_per_unit_burned	number	Average cost of fuel consumed in the report year per reported fuel unit (USD).
fuel_cost_per_unit_delivered	number	Average cost of fuel delivered in the report year per reported fuel unit (USD).
fuel_mmbtu_per_unit	number	Heat content of the fuel in millions of Btus per physical unit.
fuel_type_code_pudl	string	Simplified fuel type code used in PUDL
fuel_units	string	Reported unit of measure for fuel.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_steam_plants_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
Primary key:: This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
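
The aggregation caution above can be made concrete by checking for repeated approximate keys before summing anything. The following is a minimal sketch in plain Python, not part of PUDL: the rows, plant names, and the helper `find_repeated_keys` are hypothetical, and only the column names come from this table.

```python
from collections import Counter

# Hypothetical rows; only the column names come from this table.
rows = [
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "alpha", "capacity_mw": 100.0},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "alpha", "capacity_mw": 100.0},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "beta unit 1", "capacity_mw": 50.0},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "beta total", "capacity_mw": 120.0},
]

# The "best approximation" of a primary key described for this table.
APPROX_KEY = ("report_year", "utility_id_ferc1", "plant_name_ferc1")


def find_repeated_keys(records, key_cols=APPROX_KEY):
    """Return approximate-key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(r[c] for c in key_cols) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


repeated = find_repeated_keys(rows)
print(repeated)
```

A check like this only catches exact repeats of the approximate key. Unit-plus-total reporting (the hypothetical "beta unit 1" and "beta total" rows) produces distinct names and still needs manual review before aggregating.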

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet"
)

Columns#

Field Name	Type	Description
asset_retirement_cost	number	Asset retirement cost (USD).
avg_num_employees	number	The average number of employees assigned to each plant.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
capex_equipment	number	Cost of plant: equipment (USD).
capex_land	number	Cost of plant: land and land rights (USD).
capex_per_mw	number	Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.
capex_structures	number	Cost of plant: structures and improvements (USD).
capex_total	number	Total cost of plant (USD).
construction_type	string	Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.
construction_year	integer	Year the plant’s oldest still operational unit was built.
installation_year	integer	Year the plant’s most recently built unit was installed.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
not_water_limited_capacity_mw	number	Plant capacity in MW when not limited by condenser water.
opex_allowances	number	Allowances.
opex_boiler	number	Maintenance of boiler (or reactor) plant.
opex_coolants	number	Cost of coolants and water (nuclear plants only)
opex_electric	number	Production expenses: electric expenses (USD).
opex_engineering	number	Production expenses: maintenance, supervision, and engineering (USD).
opex_fuel	number	Production expenses: fuel (USD).
opex_misc_power	number	Miscellaneous steam (or nuclear) expenses.
opex_misc_steam	number	Maintenance of miscellaneous steam (or nuclear) plant.
opex_operations	number	Production expenses: operations, supervision, and engineering (USD).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
opex_plants	number	Maintenance of electrical plant.
opex_production_total	number	Total operating expenses.
opex_rents	number	Production expenses: rents (USD).
opex_steam	number	Steam expenses.
opex_steam_other	number	Steam from other sources.
opex_structures	number	Production expenses: maintenance of structures (USD).
opex_transfer	number	Steam transferred (Credit).
peak_demand_mw	number	Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.
plant_capability_mw	number	Net plant capability in megawatts.
plant_hours_connected_while_generating	number	Hours the plant was connected to load while generating in the report year.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
plant_type	string	Type of plant.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
water_limited_capacity_mw	number	Plant capacity in MW when limited by condenser water.

core_ferc1__yearly_transmission_lines_sched422#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about transmission lines.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)
Primary key:: This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.

Usage Warnings#

Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines having a nominal voltage of 132 kilovolts or greater. Transmission lines below this voltage are required to be reported only as group totals for each voltage.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet"
)

Columns#

Field Name	Type	Description
capex_land	number	Cost of Land and land rights for the transmission line.
capex_other	number	Construction and other costs for the transmission line.
capex_total	number	Total costs for the transmission line.
conductor_size_and_material	string	Size of transmission conductor and material of the transmission line.
designed_voltage_kv	number	Manufactured (designed) voltage, expressed in kilovolts, for three-phase 60 cycle alternating current transmission lines.
end_point	string	The end point of a transmission line.
num_transmission_circuits	integer	Number of circuits in a transmission line.
operating_voltage_kv	number	The operating voltage, expressed in kilovolts, for three-phase 60 cycle alternating current transmission lines.
opex_maintenance	number	Maintenance expenses for the transmission line.
opex_operations	number	Operating expenses for the transmission line.
opex_rents	number	Rent expenses for the transmission line.
opex_total	number	Overall expenses for the transmission line.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
start_point	string	The starting point of a transmission line.
supporting_structure_type	string	Supporting structure of the transmission line.
transmission_line_and_structures_length_miles	number	Length (in pole miles, or circuit miles if transmission lines are underground) for lines that are aggregated with other lines / structures (whose costs are aggregated and combined with other structures).
transmission_line_length_miles	number	Length (in pole miles, or circuit miles if transmission lines are underground) for lines that are stand-alone structures (whose costs are reported on a stand-alone basis).
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_utility_plant_summary_sched200#

Noticed any irregularities with this table? File a report.

Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)
Primary key:: utility_id_ferc1, report_year, utility_type, utility_plant_asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
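
One way to act on the subtotal warning is to exclude calculated rows before summing `ending_balance`. The sketch below uses plain Python and made-up rows. The `row_type_xbrl` labels `"reported_value"` and `"calculated_value"`, the asset type names, and the helper `sum_reported` are all assumptions for illustration; check the actual values in the `row_type_xbrl` column before relying on them.

```python
# Hypothetical rows; the row_type_xbrl labels and asset types are assumed.
rows = [
    {"utility_plant_asset_type": "asset_a", "row_type_xbrl": "reported_value", "ending_balance": 80.0},
    {"utility_plant_asset_type": "asset_b", "row_type_xbrl": "reported_value", "ending_balance": 20.0},
    {"utility_plant_asset_type": "total", "row_type_xbrl": "calculated_value", "ending_balance": 100.0},
    {"utility_plant_asset_type": "asset_c", "row_type_xbrl": "reported_value", "ending_balance": None},
]


def sum_reported(records, calculated_label="calculated_value"):
    """Sum ending_balance over rows that are not flagged as calculated."""
    return sum(
        r["ending_balance"]
        for r in records
        if r["row_type_xbrl"] != calculated_label and r["ending_balance"] is not None
    )


total = sum_reported(rows)
print(total)
```

Summing every row in this example would double count the balance, because the calculated total already includes the two reported components.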

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet"
)

Columns#

Field Name	Type	Description
ending_balance	number	Account balance at end of year.
record_id	string	Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.
report_year	integer	Four-digit year in which the data was reported.
row_type_xbrl	string	Indicates whether the value reported in the row is calculated, or uniquely reported within the table.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_plant_asset_type	string	Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation.
utility_type	string	Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
utility_type_other	string	Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

core_ferc714__hourly_planning_area_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of electricity demand by planning area.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)
Primary key:: respondent_id_ferc714, datetime_utc

Usage Warnings#

The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See below for additional details.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source.

This table includes three respondent ID columns: one from the CSV raw source, one from the XBRL raw source, and a PUDL-derived one that links the two source IDs together. Source IDs have been filled in for all records, so you can select the full time series for a given respondent using any of these three IDs.

An important caveat is that the datetime_utc timestamps have been cleaned. Form 714 gives respondents sparse documentation on how to interpret timestamps: the form asks respondents to provide 24 instances of hourly demand for each day, labeled hour 1 through 24, with no indication of whether hour 1 begins at midnight.

The XBRL data contained several formats of timestamps. Most records corresponding to hour 1 of the Form have a timestamp with hour 1 as T1. About two thirds of the records in the hour 24 location of the form have a timestamp with an hour reported as T24 while the remaining third report this as T00 of the next day. T24 is not a valid format for the hour of a datetime, so we convert these T24 hours into T00 of the next day. A smaller subset of the respondents reports the 24th hour as the last second of the day - we also convert these records to the T00 of the next day.
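
The end-of-day conversion described above can be sketched as follows. This is an illustration only, not PUDL's implementation: it assumes timestamps arrive as strings shaped like `YYYY-MM-DDTHH:MM:SS`, and the function name `normalize_hour_24` is hypothetical.

```python
from datetime import datetime, time, timedelta


def normalize_hour_24(timestamp: str) -> datetime:
    """Map end-of-day timestamps onto T00 of the following day."""
    date_part, time_part = timestamp.split("T")
    day = datetime.strptime(date_part, "%Y-%m-%d")
    if time_part.startswith("24:"):
        # T24 is not a valid hour, so roll over to midnight of the next day.
        return day + timedelta(days=1)
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    if parsed.time() == time(23, 59, 59):
        # The last second of the day is treated as the 24th hour as well.
        return day + timedelta(days=1)
    return parsed


for raw in ["2021-12-31T24:00:00", "2021-03-01T23:59:59", "2021-03-01T01:00:00"]:
    print(raw, "->", normalize_hour_24(raw).isoformat())
```

Handling `T24` before calling `strptime` matters because the `%H` directive only accepts hours 00 through 23, so a `T24` string would otherwise raise a `ValueError`.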

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet"
)

Columns#

Field Name	Type	Description
datetime_utc	datetime	Date and time converted to Coordinated Universal Time (UTC).
demand_mwh	number	Electricity demand (energy) within a given timeframe.
report_date	date	Date reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_id_ferc714_csv	integer	FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
respondent_id_ferc714_xbrl	string	FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
timezone	string	IANA timezone name

core_ferc714__respondent_id#

Noticed any irregularities with this table? File a report.

Respondent identification.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part I, Schedule 1)
Primary key:: respondent_id_ferc714

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet"
)

Columns#

Field Name	Type	Description
eia_code	integer	EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_id_ferc714_csv	integer	FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
respondent_id_ferc714_xbrl	string	FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.
respondent_name_ferc714	string	Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.

core_ferc714__yearly_planning_area_demand_forecast#

Noticed any irregularities with this table? File a report.

Annual time series of 10-year forecasted summer and winter peak demand and annual net energy per planning area.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2b)
Primary key:: respondent_id_ferc714, report_year, forecast_year

Usage Warnings#

This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source. We created the respondent_id_ferc714 field to blend disparate IDs from the CSV and XBRL data over time. See the core_ferc714__respondent_id table for links to the original source IDs.
A small handful of respondents (~11) report more than 10 years, and an even smaller handful (~9) report fewer than 10.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains forecasted net demand (MWh) as well as summer and winter peak demand (MW) for the next ten years after the report_year.
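
Because some respondents report more or fewer than ten forecast years, it can help to count the forecast horizon per respondent and report year before assuming a fixed length. The sketch below uses plain Python with made-up records. The helpers `make_rows` and `forecast_horizons` are hypothetical; only the column names come from this table.

```python
from collections import defaultdict


def make_rows(respondent_id, report_year, n_years):
    """Build hypothetical forecast rows for the n_years after report_year."""
    return [
        {
            "respondent_id_ferc714": respondent_id,
            "report_year": report_year,
            "forecast_year": report_year + k,
        }
        for k in range(1, n_years + 1)
    ]


def forecast_horizons(records):
    """Count distinct forecast years per (respondent, report_year)."""
    years = defaultdict(set)
    for r in records:
        years[(r["respondent_id_ferc714"], r["report_year"])].add(r["forecast_year"])
    return {key: len(vals) for key, vals in years.items()}


rows = make_rows(1, 2023, 10) + make_rows(2, 2023, 12) + make_rows(3, 2023, 8)
horizons = forecast_horizons(rows)

# Keep only the respondent-years that deviate from the expected ten forecasts.
unusual = {key: n for key, n in horizons.items() if n != 10}
print(unusual)
```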

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet"
)

Columns#

Field Name	Type	Description
forecast_year	integer	Four-digit year that applies to a particular forecasted value.
net_demand_forecast_mwh	number	Net forecasted electricity demand for the specific period in megawatt-hours (MWh).
report_year	integer	Four-digit year in which the data was reported.
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
summer_peak_demand_forecast_mw	number	The maximum forecasted hourly summer load (for the months of June through September).
winter_peak_demand_forecast_mw	number	The maximum forecasted hourly winter load (for the months of January through March).

core_ferc__codes_accounts#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for account numbers from the FERC Uniform System of Accounts for Electric Plant.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC – Mix of multiple FERC Forms
Primary key:: ferc_account_id

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

These codes are defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101. (See e.g. https://www.law.cornell.edu/cfr/text/18/part-101).

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet"
)

Columns#

Field Name	Type	Description
ferc_account_description	string	Description of the FERC account.
ferc_account_id	string	Account identifier from FERC’s Uniform System of Accounts for Electric Plant. Includes higher level labeled categories.

core_ferc__entity_companies#

Noticed any irregularities with this table? File a report.

Entity table containing static information about the companies that submit required filings to FERC.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC – Mix of multiple FERC Forms
Primary key:: company_id_ferc

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet"
)

Columns#

Field Name	Type	Description
address_2	string	Second line of the address.
city	string	Name of the city.
company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system.
company_name	string	The name of the FERC-reporting organization or company.
company_website	string	The website URL of the company, which can provide additional information about the organization.
program	string	The specific program or initiative associated with the FERC organization, which can provide context for the company’s activities.
state	string	Two letter US state abbreviation.
street_address	string	Physical street address.
zip_code	string	Five digit US Zip Code.
zip_code_4	string	Four digit US Zip Code suffix.

core_ferceqr__contracts#

Noticed any irregularities with this table? File a report.

Contains information about contracts between companies selling and buying electricity market products.

Most-recent data:: 2026q1
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 920 – Electric Quarterly Report (EQR)
Primary key:: This table has no primary key. The FERC EQR contracts table has no natural primary key.

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

Browse and query this table online

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

pandas

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
    dtype_backend="pyarrow",
)

polars

import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/*.parquet';

R

library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__contracts")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name	Type	Description
actual_termination_date	date	The date the contract actually terminates.
begin_date	datetime	First date and time for the sale of the product at the rate specified.
class_name	string	F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply.
commencement_date_of_contract_term	date	The date the terms of the contract reported in fields 18, 23 and 25 through 44 (as defined in the data dictionary) became effective. If those terms became effective on multiple dates (i.e., due to one or more amendments), the date to be reported in this field is the date the most recent amendment became effective. If the contract or the most recent reported amendment does not have an effective date, the date when service began pursuant to the contract or most recent reported amendment may be used. If the terms reported in fields 18, 23 and 25 through 44 have not been amended since January 1, 2009, the initial date the contract became effective (or absent an effective date the initial date when service began) may be used.
contract_affiliate	boolean	If True, this field indicates the customer is an affiliate of the seller. The customer is an affiliate if it controls, is controlled by, or is under common control with the seller. This includes a division that operates as a functional unit. A customer of a seller who is an Exempt Wholesale Generator may be defined as an affiliate under the Public Utility Holding Company Act and the FPA.
contract_execution_date	date	The date the contract was signed. If the parties signed on different dates, use the most recent date signed.
contract_service_agreement_id	string	Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system.
contract_termination_date	date	The date that the contract expires.
contract_unique_id	string	An identifier beginning with the letter “C” and followed by a number (e.g., ‘C1’, ‘C2’) used to designate a record containing contract information. Note that these contract IDs may only be unique within the context of a particular seller, seller-buyer pair, or timeframe. FERC documentation of the field is limited.
customer_company_name	string	The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear under different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller.
end_date	datetime	Last date and time for the sale of the product at the rate specified.
extension_provision_description	string	Description of terms that provide for the continuation of the contract.
ferc_tariff_reference	string	The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference.
increment_name	string	5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment.
increment_peaking_name	string	FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract.
point_of_delivery_balancing_authority	string	The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used.
point_of_delivery_specific_location	string	The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used.
point_of_receipt_balancing_authority	string	The registered Balancing Authority (formerly called NERC Control Area) where service begins for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If receipt occurs at a trading hub, the term ‘Hub’ should be used.
point_of_receipt_specific_location	string	The specific location at which the product is received if designated in the contract. If receipt occurs at a trading hub, a standardized hub name must be used. If more points of receipt are listed in the contract than can fit into the 50-character space, a description of the collection of points may be used. ‘Various’ alone is unacceptable unless the contract itself uses that terminology.
product_name	string	Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__transactions. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. DIRECT ASSIGNMENT FACILITIES CHARGE: Charges for facilities or portions of facilities that are constructed or used for the sole use/benefit of a particular customer. EMERGENCY ENERGY: Contractual provisions to supply energy or capacity to another entity during critical situations. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. INTERCONNECTION AGREEMENT: Contract that provides the terms and conditions for a generator, distribution system owner, transmission owner, transmission provider, or transmission system to physically connect to a transmission system or distribution system. MEMBERSHIP AGREEMENT: Agreement to participate and be subject to rules of a system operator. MUST RUN AGREEMENT: An agreement that requires a unit to run. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). NETWORK: Transmission service under contract providing network service. NETWORK OPERATING AGREEMENT: An executed agreement that contains the terms and conditions under which a network customer operates its facilities and the technical and operational matters associated with the implementation of network integration transmission service. OTHER: Product name not otherwise included. POINT-TO-POINT AGREEMENT: Transmission service under contract between specified Points of Receipt and Delivery. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REASSIGNMENT AGREEMENT: Transmission capacity reassignment agreement. 
REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. 
For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SYSTEM OPERATING AGREEMENTS: An executed agreement that contains the terms and conditions under which a system or network customer shall operate its facilities and the technical and operational matters associated with the implementation of network. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. TRANSMISSION OWNERS AGREEMENT: The agreement that establishes the terms and conditions under which a transmission owner transfers operational control over designated transmission facilities. UPLIFT: A make-whole payment by an RTO/ISO to a utility.
product_type_name	string	CB: Energy, capacity or ancillary services sold under a FERC-approved cost-based rate tariff. CR: An agreement under which a transmission provider sells, assigns or transfers all or a portion of its rights to an eligible customer. CR-AD: Transmission capacity reassignments reported in Atlantic Daylight time. CR-AP: Transmission capacity reassignments reported in Atlantic Prevailing time. CR-AS: Transmission capacity reassignments reported in Atlantic Standard time. CR-CD: Transmission capacity reassignments reported in Central Daylight time. CR-CP: Transmission capacity reassignments reported in Central Prevailing time. CR-CS: Transmission capacity reassignments reported in Central Standard time. CR-ED: Transmission capacity reassignments reported in Eastern Daylight time. CR-EP: Transmission capacity reassignments reported in Eastern Prevailing time. CR-ES: Transmission capacity reassignments reported in Eastern Standard time. CR-MD: Transmission capacity reassignments reported in Mountain Daylight time. CR-MP: Transmission capacity reassignments reported in Mountain Prevailing time. CR-MS: Transmission capacity reassignments reported in Mountain Standard time. CR-PD: Transmission capacity reassignments reported in Pacific Daylight time. CR-PP: Transmission capacity reassignments reported in Pacific Prevailing time. CR-PS: Transmission capacity reassignments reported in Pacific Standard time. MB: Energy, capacity or ancillary services sold under the seller’s FERC-approved market-based rate tariff. T: The product is sold under a FERC-approved transmission tariff. NPU: The product is sold by a non-public utility that is required to file the EQR under section 220 of the Federal Power Act. OTHER: The product cannot be characterized by the other product type names.
quantity	number	Quantity for the contract product identified.
rate	number	The charge for the product per unit as stated in the contract.
rate_description	string	Text description of rate. If the rate is currently available on the FERC website, a citation of the FERC Accession Number and the relevant FERC tariff including page number or section may be included instead of providing the entire rate algorithm. If the rate is not available on the FERC website, include the rate algorithm, if rate is calculated. If the algorithm would exceed the 300 character field limit, it may be provided in a descriptive summary (including bases and methods of calculations) with a detailed citation of the relevant FERC tariff including page number and section.
rate_maximum	number	Maximum rate to be charged per the contract, if a range is specified.
rate_minimum	number	Minimum rate to be charged per the contract, if a range is specified.
rate_units	string	Measure stated in the contract for the product sold.
seller_company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company.
seller_company_name	string	The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the company_name reported for the same company in the core_ferceqr__quarterly_identity table. A handful of records violate this requirement, so joins between tables should rely on the FERC company ID (seller_company_id_ferc in this table, company_id_ferc in the identity table), not on company names.
term_name	string	Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST).
units	string	Measure stated in the contract for the product sold.
year_quarter	string	Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
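
The increment_name thresholds described above can be read as a simple lookup on the length of the increment. The following is a minimal sketch of that reading, not FERC's or PUDL's own logic. The function name increment_code is hypothetical, and treating "one year" as 365 days is an assumption, since the description does not define a year in hours.

```python
from datetime import timedelta
from typing import Optional

# Assumption: "one year" is taken to be 365 days; the data dictionary
# does not say how a year should be measured.
ONE_YEAR = timedelta(days=365)

# (inclusive upper bound, code), in ascending order, per the increment_name text.
_BOUNDS = [
    (timedelta(minutes=5), "5"),
    (timedelta(minutes=15), "15"),
    (timedelta(hours=6), "H"),
    (timedelta(hours=60), "D"),
    (timedelta(hours=168), "W"),
]


def increment_code(duration: Optional[timedelta]) -> str:
    """Map an increment length to an increment_name code (hypothetical helper)."""
    if duration is None:
        # The contract does not specify an increment.
        return "N/A"
    if duration <= timedelta(0):
        raise ValueError("increment duration must be positive")
    for upper, code in _BOUNDS:
        if duration <= upper:
            return code
    # Longer than 168 hours: monthly unless it reaches a full year.
    return "M" if duration < ONE_YEAR else "Y"


print(increment_code(timedelta(hours=24)))  # D
```

Note that the upper bound of each bucket is inclusive, so exactly 6 hours maps to H and exactly 168 hours maps to W.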

core_ferceqr__quarterly_identity#

Noticed any irregularities with this table? File a report.

Quarterly time series of individuals who filed FERC EQR for a company in a given quarter.

Most-recent data:: 2026q1
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 920 – Electric Quarterly Report (EQR)
Primary key:: This table has no primary key. The primary key ought to be [‘year_quarter’, ‘company_id_ferc’, ‘filer_unique_id’], where filer_unique_id is an employee-level ID. However, a handful of companies have erroneously reported the same filer_unique_id for multiple employees, resulting in duplicate records. In other cases, there appear to be multiple filings in a given quarter for the same company and filer, resulting in additional duplicates. Thus, there is no reliable natural primary key for the identity table.
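
One way to see where the candidate key breaks down is to list every row whose key appears more than once. The sketch below does that with pandas on a small in-memory frame. The helper name find_key_duplicates and all of the toy values (company IDs, contact names) are made up for illustration and are not taken from the real table.

```python
import pandas as pd

# The key that "ought to be" unique, as described above.
CANDIDATE_KEY = ["year_quarter", "company_id_ferc", "filer_unique_id"]


def find_key_duplicates(identity: pd.DataFrame) -> pd.DataFrame:
    """Return all rows whose candidate key occurs more than once."""
    # keep=False marks every member of a duplicated group, not just the repeats.
    mask = identity.duplicated(subset=CANDIDATE_KEY, keep=False)
    return identity.loc[mask].sort_values(CANDIDATE_KEY)


# Toy data (invented): two contacts share the same key in 2020q1.
toy = pd.DataFrame(
    {
        "year_quarter": ["2020q1", "2020q1", "2020q1", "2020q2"],
        "company_id_ferc": ["C000001", "C000001", "C000002", "C000001"],
        "filer_unique_id": ["FS1", "FS1", "FS1", "FS1"],
        "contact_name": ["A. Example", "B. Example", "C. Example", "A. Example"],
    }
)
dupes = find_key_duplicates(toy)
print(dupes)
```

How to resolve the duplicates (keeping one row, or keeping all of them) depends on the analysis; this sketch only surfaces them.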

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

Browse and query this table online

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

pandas

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
    dtype_backend="pyarrow",
)

polars

import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/*.parquet';

R

library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name	Type	Description
company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system.
company_name	string	Name of the reporting company.
contact_address	string	Street address for contact identified in contact_name.
contact_city	string	City for contact identified in contact_name.
contact_country_name	string	Country (USA, Canada, Mexico, or United Kingdom) for contact identified in contact_name.
contact_email	string	Email for contact identified in contact_name.
contact_name	string	(Seller) – The name of the contact for the company authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. (Agent) – Name of the person who prepared the filing.
contact_phone	string	Phone number for contact identified in contact_name.
contact_state	string	State for contact identified in contact_name.
contact_title	string	Title of the contact identified in contact_name.
contact_zip	string	Zip code for contact identified in contact_name.
filer_unique_id	string	(Seller) An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter.
transactions_reported_to_index_price_publishers	boolean	Sellers should indicate whether they have reported their sales transactions to index price publisher(s). If they have, Sellers should indicate specifically which index publisher(s) in Field Number 73.
year_quarter	string	Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
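
Because year_quarter is stored as a string, it often needs to be split into numeric parts or turned into a date before filtering or plotting. The helpers below are a minimal sketch using only the standard library. The names parse_year_quarter and quarter_start are hypothetical, and accepting only a lowercase "q" is an assumption based on the values shown on this page (for example 2026q1).

```python
import re
from datetime import date

# Matches the documented format YYYYq{1-4}, e.g. "2013q3".
_YEAR_QUARTER = re.compile(r"(\d{4})q([1-4])")


def parse_year_quarter(value: str) -> tuple[int, int]:
    """Split a year_quarter string into (year, quarter)."""
    match = _YEAR_QUARTER.fullmatch(value)
    if match is None:
        raise ValueError(f"not a YYYYq{{1-4}} string: {value!r}")
    return int(match.group(1)), int(match.group(2))


def quarter_start(value: str) -> date:
    """Return the first calendar day of the quarter."""
    year, quarter = parse_year_quarter(value)
    # Quarters 1-4 start in months 1, 4, 7 and 10.
    return date(year, 3 * (quarter - 1) + 1, 1)


print(parse_year_quarter("2013q3"))  # (2013, 3)
print(quarter_start("2013q3"))  # 2013-07-01
```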

core_ferceqr__quarterly_index_pub#

Noticed any irregularities with this table? File a report.

Quarterly time series of electricity market price indices that individual EQR filers reported transactions to.

Most-recent data:: 2026q1
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 920 – Electric Quarterly Report (EQR)
Primary key:: year_quarter, company_id_ferc, filer_unique_id

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

Browse and query this table online

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

pandas

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
    dtype_backend="pyarrow",
)

polars

import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/*.parquet';

R

library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name	Type	Description
company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system.
filer_unique_id	string	(Seller) An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter.
index_price_publisher_name	string	Name of index price publisher, which can be one of the following: AM - Argus Media, EIG - Energy Intelligence Group, Inc., IP - Intelligence Press, P - Platts, B - Bloomberg, PDX - Powerdex, SNL - SNL Energy
seller_company_name	string	The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the company_name reported for the same company in the core_ferceqr__quarterly_identity table. A handful of records violate this requirement, so joins between tables should rely on company_id_ferc, not on company names.
transactions_reported	string	Description of the types of transactions reported to the index publisher identified in this record.
year_quarter	string	Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
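
Since company names are not a reliable join key, a join from this table to core_ferceqr__quarterly_identity might look like the pandas sketch below. It joins on the ID columns and then flags rows where the two name fields disagree. The helper name attach_identity and all toy values are invented for illustration. Dropping duplicate identity keys before merging is a simplifying assumption, made because the identity table has no reliable primary key.

```python
import pandas as pd

JOIN_KEYS = ["year_quarter", "company_id_ferc", "filer_unique_id"]


def attach_identity(index_pub: pd.DataFrame, identity: pd.DataFrame) -> pd.DataFrame:
    """Left-join identity company names onto index_pub using IDs, not names."""
    # Assumption: keep the first identity row per key so the merge cannot fan out.
    ident = identity.drop_duplicates(subset=JOIN_KEYS)[JOIN_KEYS + ["company_name"]]
    merged = index_pub.merge(ident, on=JOIN_KEYS, how="left", validate="many_to_one")
    # Flag matched rows whose reported names differ between the two tables.
    merged["name_mismatch"] = merged["company_name"].notna() & (
        merged["seller_company_name"] != merged["company_name"]
    )
    return merged


# Toy data (invented values).
index_pub = pd.DataFrame(
    {
        "year_quarter": ["2020q1", "2020q1"],
        "company_id_ferc": ["C000001", "C000002"],
        "filer_unique_id": ["FS1", "FS1"],
        "seller_company_name": ["Example Power LLC", "Sample Energy Inc"],
        "index_price_publisher_name": ["P", "AM"],
    }
)
identity = pd.DataFrame(
    {
        "year_quarter": ["2020q1", "2020q1", "2020q1"],
        "company_id_ferc": ["C000001", "C000002", "C000002"],
        "filer_unique_id": ["FS1", "FS1", "FS1"],
        "company_name": ["Example Power LLC", "Sample Energy, Inc.", "Sample Energy, Inc."],
    }
)
merged = attach_identity(index_pub, identity)
print(merged[["company_id_ferc", "seller_company_name", "company_name", "name_mismatch"]])
```

The validate="many_to_one" argument makes pandas raise an error if the right-hand side still has repeated keys, which guards against silently duplicated rows.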

core_ferceqr__transactions#

Noticed any irregularities with this table? File a report.

Contains information about individual electricity market transactions that took place during a given reporting quarter. Reported by the seller.

Most-recent data:: 2026q1
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: FERC Form 920 – Electric Quarterly Report (EQR)
Primary key:: year_quarter, seller_company_id_ferc, transaction_unique_id

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

Browse and query this table online

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and DuckDB than in pandas, so we only provide examples for these.

polars

import polars as pl
# Select some data relevant to tracking solar PPA prices
df = (
    # Limit query to data from a single quarterly Parquet file
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    .select(
        [
            "seller_company_name",
            "product_name",
            "price",
            "standardized_price",
            "transaction_quantity",
            "standardized_quantity",
            "total_transmission_charge",
            "total_transaction_charge",
            "rate_units",
            "trade_date"
        ]
    )
    .filter(
        pl.col("seller_company_name").str.contains("(?i)solar")
        & pl.col("standardized_price").is_not_null()
        & (pl.col("product_name") == "ENERGY")
        & (pl.col("rate_units") == "$/MWH")
        # Keep only trades dated 1970-01-01 or later
        & (pl.col("trade_date") >= pl.date(1970, 1, 1))
    )
    .collect()
)

SQL (DuckDB)

-- Select some data relevant to tracking solar PPA prices
SELECT
    -- Select a subset of columns to reduce network and memory usage
    seller_company_name,
    product_name,
    price,
    standardized_price,
    transaction_quantity,
    standardized_quantity,
    total_transmission_charge,
    total_transaction_charge,
    rate_units,
    trade_date
-- Limit to 2013Q3 data by querying only a single Parquet file
FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet'
WHERE seller_company_name ILIKE '%solar%'
    AND standardized_price IS NOT NULL
    AND product_name = 'ENERGY'
    AND rate_units = '$/MWH'
    AND trade_date >= '1970-01-01'::DATE;

Columns#

Field Name	Type	Description
class_name	string	F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply.
contract_service_agreement_id	string	Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system.
customer_company_name	string	The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear under different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller.
exchange_brokerage_service	string	If a broker service is used to consummate or effectuate a transaction, the term “Broker” shall be provided. If an exchange is used, the specific exchange that is used shall be selected from the Commission-provided list. Allowed values include BROKER, ICE, NODAL, and NYMEX.
ferc_tariff_reference	string	The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference.
increment_name	string	5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment.
increment_peaking_name	string	FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract.
point_of_delivery_balancing_authority	string	The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used.
point_of_delivery_specific_location	string	The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used.
price	number	Actual price charged for the product per unit. The price reported cannot be averaged or otherwise aggregated.
product_name	string	Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__contracts. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). BOOKED OUT POWER: Energy or capacity contractually committed bilaterally for delivery but not actually delivered due to some offsetting or countervailing trade (Transaction only). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). OTHER: Product name not otherwise included. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). 
SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. UPLIFT: A make-whole payment by an RTO/ISO to a utility.
rate_units	string	Measure stated in the contract for the product sold. FERC EQR.
seller_company_id_ferc	string	The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company.
seller_company_name	string	The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the name provided as seller_company_name in the core_ferceqr__quarterly_identity table. There are a handful of cases in which this requirement is violated, so any joins between tables should rely on company_id_ferc, not the company names.
seller_transaction_id	string	Unique reference number assigned by the Seller for each transaction. May only be unique in the context of the seller’s internal record keeping. This is an unrestricted text field.
standardized_price	number	For product names energy, capacity, and booked out power only. Specify the price in $/MWh if the product is energy or booked out power and specify the price in $/MW-month if the product is capacity or booked out power.
standardized_quantity	number	For product names energy, capacity, and booked out power only. Specify the quantity in MWh if the product is energy or booked out power and specify the quantity in MW-month if the product is capacity or booked out power.
term_name	string	Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST).
timezone	string	A code representing the so-called ‘time zone’ in which the sale was made. However, these codes do not actually correspond to time zones. Rather, they indicate an offset from UTC, which changes according to daylight saving vs. standard time. The codes are as follows: AD: Atlantic Daylight Time (UTC-3), AP: Atlantic Prevailing Time, AS: Atlantic Standard Time (UTC-4), CD: Central Daylight Time (UTC-5), CP: Central Prevailing Time, CS: Central Standard Time (UTC-6), ED: Eastern Daylight Time (UTC-4), EP: Eastern Prevailing Time, ES: Eastern Standard Time (UTC-5), MD: Mountain Daylight Time (UTC-6), MP: Mountain Prevailing Time, MS: Mountain Standard Time (UTC-7), PD: Pacific Daylight Time (UTC-7), PP: Pacific Prevailing Time, PS: Pacific Standard Time (UTC-8), UT: Coordinated Universal Time (UTC+0)
total_transaction_charge	number	transaction_quantity * price + total_transmission_charge.
total_transmission_charge	number	Payments received for transmission services when explicitly identified.
trade_date	date	The date upon which the parties made the legally binding agreement on the price of a transaction.
transaction_begin_date	datetime	First date and time the product is sold during the quarter.
transaction_end_date	datetime	Last date and time the product is sold during the quarter.
transaction_quantity	number	The quantity of the product in this transaction record.
transaction_unique_id	string	An identifier beginning with the letter “T” and followed by a number (e.g., “T1”, “T2”) used to designate a record containing transaction information. One record for each transaction record must be included in an EQR for a given quarter. A new transaction record must be used every time a price changes in a sale. Note that these IDs are only unique for a single company_identifier and year_quarter.
type_of_rate	string	FIXED: A fixed charge per unit of consumption. No variables are used to determine this rate. FORMULA: A calculation of a rate based upon a formula that does not contain an electric index component. ELECTRIC INDEX: A calculation of a rate based upon an index or a formula that contains an electric index component. An electric index includes an index published by an index publisher such as those required to be listed in Field Number 73 or a price published by an RTO/ISO (e.g., PJM West or Illinois Hub). RTO/ISO: If the price is the result of an RTO/ISO market or the sale is made to the RTO/ISO.
year_quarter	string	Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
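
The fixed-offset codes listed in the timezone description can be used to convert a local timestamp to UTC. The following is a minimal Python sketch, not part of PUDL: the names EQR_UTC_OFFSET_HOURS and to_utc are hypothetical, and the prevailing-time codes (AP, CP, EP, MP, PP) are deliberately left unresolved because the description gives no single offset for them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# UTC offsets in hours, copied from the timezone column description.
# Prevailing codes have no single offset there, so they map to None
# (an assumption of this sketch, not a PUDL convention).
EQR_UTC_OFFSET_HOURS = {
    "AD": -3, "AS": -4,
    "CD": -5, "CS": -6,
    "ED": -4, "ES": -5,
    "MD": -6, "MS": -7,
    "PD": -7, "PS": -8,
    "UT": 0,
    "AP": None, "CP": None, "EP": None, "MP": None, "PP": None,
}


def to_utc(local_naive: datetime, code: str) -> Optional[datetime]:
    """Convert a naive local timestamp to UTC using an EQR timezone code.

    Returns None for prevailing-time codes, which this sketch does not resolve.
    """
    offset = EQR_UTC_OFFSET_HOURS[code]
    if offset is None:
        return None
    # Attach the fixed offset, then shift to UTC.
    local = local_naive.replace(tzinfo=timezone(timedelta(hours=offset)))
    return local.astimezone(timezone.utc)


# Noon Eastern Standard Time (UTC-5) is 17:00 UTC.
noon_es_in_utc = to_utc(datetime(2024, 1, 15, 12, 0), "ES")
```

Resolving a prevailing-time code would require knowing whether daylight saving time was in effect at the timestamp, which is outside the scope of this sketch.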

core_gridpathratoolkit__assn_generator_aggregation_group#

Noticed any irregularities with this table? File a report.

Association table providing connections between individual generator IDs and their corresponding aggregation group when creating aggregated profiles.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: GridPath Resource Adequacy Toolkit Data
Primary key:: plant_id_eia, generator_id

Usage Warnings#

The original data is no longer being collected or reported in this way.

Additional Details#

This table defines which individual generator profiles are combined when creating aggregated capacity factors / generation profiles. Generator capacity is used to weight the contribution of each generator in the resulting aggregated profiles, and is made available in this table for convenience and legibility. The resulting aggregated profiles are stored in out_gridpathratoolkit__hourly_available_capacity_factor.
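
The capacity-weighted aggregation described above can be sketched as follows. This is an illustrative Python example rather than the code PUDL runs: the function name aggregate_profiles, the "profile" field, and the toy rows are all hypothetical, and only aggregation_group, capacity_mw, and include_generator come from this table.

```python
def aggregate_profiles(generators):
    """Capacity-weighted average of hourly capacity factors, per group.

    generators: iterable of dicts with keys aggregation_group, capacity_mw,
    include_generator, and profile (a list of hourly capacity factors;
    "profile" is a hypothetical field used only for this illustration).
    """
    weighted_sums = {}   # group -> per-hour sum of capacity * capacity factor
    total_capacity = {}  # group -> total capacity of included generators
    for gen in generators:
        if not gen["include_generator"]:
            continue  # excluded generators do not contribute to the aggregate
        group = gen["aggregation_group"]
        cap = gen["capacity_mw"]
        sums = weighted_sums.setdefault(group, [0.0] * len(gen["profile"]))
        for hour, cf in enumerate(gen["profile"]):
            sums[hour] += cap * cf
        total_capacity[group] = total_capacity.get(group, 0.0) + cap
    return {
        group: [s / total_capacity[group] for s in sums]
        for group, sums in weighted_sums.items()
    }


# Toy rows: two included generators and one excluded generator.
rows = [
    {"aggregation_group": "wind_a", "capacity_mw": 100.0,
     "include_generator": True, "profile": [0.2, 0.4]},
    {"aggregation_group": "wind_a", "capacity_mw": 300.0,
     "include_generator": True, "profile": [0.6, 0.8]},
    {"aggregation_group": "wind_a", "capacity_mw": 50.0,
     "include_generator": False, "profile": [1.0, 1.0]},
]
aggregated = aggregate_profiles(rows)
```

With these toy rows, the 300 MW generator carries three times the weight of the 100 MW generator, so the aggregate for wind_a is approximately 0.5 in the first hour and 0.7 in the second. The excluded 50 MW generator has no effect.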

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet"
)

Columns#

Field Name	Type	Description
aggregation_group	string	A label identifying a group of aggregated generator capacity factors.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
generator_id	string	Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!
include_generator	boolean	Every row in the aggregation table describes a single generator. Groups of rows with the same aggregation_group are combined using a capacity-weighted average to produce an aggregate generation profile. A few generators are not included in that aggregation process. This column indicates whether a generator is included.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

core_nrelatb__yearly_projected_cost_performance#

Noticed any irregularities with this table? File a report.

Annual time series of projections of costs and performance for various technologies.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: NREL Annual Technology Baseline (ATB) for Electricity
Primary key:: This table has no primary key. The fields technology_description_detail_1 and technology_description_detail_2 occasionally contain nulls, preventing this table from having a primary key.

Additional Details#

This table includes NREL’s projections for capital and operational costs as well as operational attributes for a variety of technologies for each of the NREL ATB scenarios. Each new year of ATB data includes projections for the next 25 years.

All of the NREL tables are related. If you want to understand the financial assumptions of a given scenario in this table, see core_nrelatb__yearly_projected_financial_cases. The other two core_nrelatb tables include attributes about technologies in these projections.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet"
)

Columns#

Field Name	Type	Description
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * duration of that time period), with the duration expressed in hours when energy is in MWh and capacity is in MW.
capex_construction_finance_factor	number	Portion of all-in capital cost associated with construction period financing. This factor is applied to an overnight capital cost to represent the financing costs incurred during the construction period.
capex_grid_connection_per_kw	number	Overnight capital cost includes a nominal-distance spur line (<1 mi) for all technologies, and for offshore wind, it includes export cable and construction period transit costs for a 30-km distance from shore. Project-specific costs lines that are based on distance to existing transmission are not included. This only applies to offshore wind.
capex_overnight_additional_per_kw	number	Capital expenditure (capex) for retrofits if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation.
capex_overnight_per_kw	number	Capital expenditure (capex) if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation.
capex_per_kw	number	Capital cost (USD). Expenditures required to achieve commercial operation of the generation plant.
cost_recovery_period_years	integer	The period over which the initial capital investment to build a plant is recovered.
fuel_cost_per_mwh	number	Fuel costs in USD/MWh. NREL-derived values using heat rates.
heat_rate_mmbtu_per_mwh	number	Fuel content per unit of electricity generated.
heat_rate_penalty	number	Heat rate penalty for retrofitting. This column is populated only for retrofit technologies. Values appear to be rates between 0.09 and 0.35.
levelized_cost_of_energy_per_mwh	number	Levelized cost of energy (LCOE) is a summary metric that combines the primary technology cost and performance parameters: capital expenditures, operations expenditures, and capacity factor.
model_case_nrelatb	string	NREL’s financial assumption cases. There are two cases which affect project financial assumptions: R&D Only Case and Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods
model_tax_credit_case_nrelatb	string	NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods
net_output_penalty	number	Penalty to net output for retrofitting. This column is populated only for retrofit technologies. Values appear to be rates between -0.25 and -0.08.
opex_fixed_per_kw	number	Fixed operation and maintenance expenses. Annual expenditures to operate and maintain equipment that are not incurred on a per-unit-energy basis.
opex_variable_per_mwh	number	Operation and maintenance costs incurred on a per-unit-energy basis.
projection_year	integer	The year of the projected value.
report_year	integer	Four-digit year in which the data was reported.
scenario_atb	string	Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios
technology_description	string	High level description of the technology used by the generator to produce electricity.
technology_description_detail_1	string	Technology details indicate resource levels and specific technology subcategories.
technology_description_detail_2	string	Technology details indicate resource levels and specific technology subcategories.
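
The capacity_factor definition above is a ratio of actual energy to the maximum energy the nameplate capacity could deliver over the same period. A minimal Python sketch of that arithmetic follows. The function name, argument names, and example numbers are hypothetical and are not taken from the NREL ATB data.

```python
HOURS_PER_YEAR = 8760  # hours in a non-leap year


def capacity_factor(net_generation_mwh: float, capacity_mw: float, hours: float) -> float:
    """Energy generated divided by the maximum possible energy in the period."""
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return net_generation_mwh / (capacity_mw * hours)


# A hypothetical 100 MW plant producing 438,000 MWh in a year runs at
# half of its theoretical maximum output of 876,000 MWh.
example_cf = capacity_factor(438_000.0, 100.0, HOURS_PER_YEAR)
```

Grouping the denominator as (capacity * hours) matters: dividing by capacity and then multiplying by hours would give a value with the wrong units.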

core_nrelatb__yearly_projected_financial_cases#

Noticed any irregularities with this table? File a report.

Annual time series of financial assumptions for each model case (model_case_nrelatb), and sub-type of technology (technology_description).

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: NREL Annual Technology Baseline (ATB) for Electricity
Primary key:: report_year, model_case_nrelatb, projection_year, technology_description

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet"
)

Columns#

Field Name	Type	Description
inflation_rate	number	Rate of inflation. All dollar values are given in 2021 USD, using the Consumer Price Index for All Urban Consumers for dollar year conversions where the source year dollars do not match 2021.
interest_rate_calculated_real	number	Calculated real interest rate.
interest_rate_during_construction_nominal	number	Also referred to as construction finance cost. Portion of all-in capital cost associated with construction period financing. It is a function of construction duration, capital fraction during construction, and interest during construction.
interest_rate_nominal	number	Nominal interest rate.
model_case_nrelatb	string	NREL’s financial assumption cases. There are two cases which affect project financial assumptions: R&D Only Case and Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods
projection_year	integer	The year of the projected value.
rate_of_return_on_equity_calculated_real	number	Calculated real rate of return on equity.
rate_of_return_on_equity_nominal	number	Nominal rate of return on equity.
report_year	integer	Four-digit year in which the data was reported.
tax_rate_federal_state	number	Combined federal and state tax rate. The R&D model_case_nrelatb holds tax and inflation rates constant at assumed long-term values: 21 percent federal tax rate, 6 percent state tax rate (though actual state tax rates vary), and 2.5 percent inflation rate. It excludes the effects of tax credits. The Market + Policies model_case_nrelatb applies federal tax credits and expires them consistent with existing law and guidelines.
technology_description	string	High level description of the technology used by the generator to produce electricity.
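
The table pairs nominal rates with "calculated real" counterparts and an inflation rate. One common way to relate the three is the Fisher relation, sketched below in Python. Whether NREL derives the calculated real columns in exactly this way is an assumption of this example, and the function name real_rate is hypothetical.

```python
def real_rate(nominal_rate: float, inflation_rate: float) -> float:
    """Fisher relation: deflate a nominal rate by the inflation rate."""
    return (1.0 + nominal_rate) / (1.0 + inflation_rate) - 1.0


# Hypothetical inputs: a 7 percent nominal rate and the 2.5 percent
# long-term inflation rate mentioned in the tax_rate_federal_state description.
example_real = real_rate(0.07, 0.025)
```

The result is slightly lower than the simple difference between the nominal rate and inflation (0.045 here), because the relation divides rather than subtracts.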

core_nrelatb__yearly_projected_financial_cases_by_scenario#

Noticed any irregularities with this table? File a report.

Annual time series of additional financial assumptions for NREL ATB projections that also vary by technology innovation scenario (scenario_atb), tax credit case (model_tax_credit_case_nrelatb), and cost recovery period (cost_recovery_period_years).

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: NREL Annual Technology Baseline (ATB) for Electricity
Primary key:: This table has no primary key. There are a small number of records which have nulls in the cost_recovery_period_years column. Based on NREL’s documentation, this seems to indicate that those records apply to any relevant cost_recovery_period_years. If those records were non-null, the primary keys of this table would be: [‘report_year’, ‘model_case_nrelatb’, ‘model_tax_credit_case_nrelatb’, ‘projection_year’, ‘technology_description’, ‘scenario_atb’, ‘cost_recovery_period_years’]
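
If null cost_recovery_period_years values are read as "applies to any period", as the note above tentatively suggests, selecting the records relevant to one period means keeping both exact matches and nulls. The Python sketch below illustrates that reading. The function name and toy records are hypothetical, and None stands in for a null value.

```python
def rows_for_period(records, period_years):
    """Keep records whose cost_recovery_period_years matches or is null."""
    return [
        r for r in records
        if r["cost_recovery_period_years"] is None
        or r["cost_recovery_period_years"] == period_years
    ]


# Toy records; the values are illustrative only.
toy_records = [
    {"technology_description": "tech_a", "cost_recovery_period_years": 20},
    {"technology_description": "tech_a", "cost_recovery_period_years": 30},
    {"technology_description": "tech_b", "cost_recovery_period_years": None},
]
selected = rows_for_period(toy_records, 30)
```

Here the selection for a 30-year period keeps the 30-year tech_a record and the null tech_b record, and drops the 20-year record.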

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet"
)

Columns#

Field Name	Type	Description
capital_recovery_factor	number	Ratio of a constant annuity to the present value of receiving that annuity for a given length of time.
cost_recovery_period_years	integer	The period over which the initial capital investment to build a plant is recovered.
debt_fraction	number	Fraction of capital financed with debt; the remainder (1 - debt fraction) is assumed to be financed with equity. Also referred to as the leverage ratio.
fixed_charge_rate	number	Amount of revenue per dollar of investment required that must be collected annually from customers to pay the carrying charges on that investment.
model_case_nrelatb	string	NREL’s financial assumption cases. There are two cases which affect project financial assumptions: R&D Only Case and Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods
model_tax_credit_case_nrelatb	string	NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods
projection_year	integer	The year of the projected value.
report_year	integer	Four-digit year in which the data was reported.
scenario_atb	string	Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios
technology_description	string	High level description of the technology used by the generator to produce electricity.
wacc_nominal	number	Nominal weighted average cost of capital - average expected rate that is paid to finance assets.
wacc_real	number	Real weighted average cost of capital - average expected rate that is paid to finance assets.
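
The capital_recovery_factor description (the ratio of a constant annuity to its present value) corresponds to the standard annuity formula. The Python sketch below computes it for a generic discount rate and checks it against a direct present-value sum. Which rate and period NREL uses as inputs is not stated in this table, so the rate argument here is an assumption, and both function names are hypothetical.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """Annuity payment per unit of present value: r(1+r)^n / ((1+r)^n - 1)."""
    if years <= 0:
        raise ValueError("years must be positive")
    if rate == 0:
        return 1.0 / years  # with no discounting, repayment is spread evenly
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


def present_value_of_annuity(payment: float, rate: float, years: int) -> float:
    """Discounted sum of a constant end-of-year payment."""
    return sum(payment / (1.0 + rate) ** t for t in range(1, years + 1))


# Hypothetical inputs: a 5 percent rate over a 30-year recovery period.
crf = capital_recovery_factor(0.05, 30)
# Paying crf every year for 30 years is worth about 1.0 today.
pv_check = present_value_of_annuity(crf, 0.05, 30)
```

The present-value check is a useful sanity test: an annuity equal to the capital recovery factor should recover one unit of initial investment.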

core_nrelatb__yearly_technology_status#

Noticed any irregularities with this table? File a report.

Annual time series of technology statuses, indicating whether technologies in the ATB scenarios are mature, and whether they are the default technologies.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: NREL Annual Technology Baseline (ATB) for Electricity
Primary key:: This table has no primary key.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet"
)

Columns#

Field Name	Type	Description
is_default	boolean	Indicator of whether the technology is default.
is_technology_mature	boolean	Indicator of whether the technology is mature. Technologies are defined as mature if a representative plant is operating or under construction in the United States in the Base Year.
report_year	integer	Four-digit year in which the data was reported.
technology_description	string	High level description of the technology used by the generator to produce electricity.
technology_description_detail_1	string	Technology details indicate resource levels and specific technology subcategories.
technology_description_detail_2	string	Technology details indicate resource levels and specific technology subcategories.

core_phmsagas__yearly_distribution_operators#

Noticed any irregularities with this table? File a report.

Annual time series of distribution operator information.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part A)
Primary key:: report_id, report_date, operator_id_phmsa

Additional Details#

This table contains operator-level information including office and headquarter location.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet"
)

Columns#

Field Name	Type	Description
additional_information	string	Any additional information which will assist in clarifying or classifying the reported data.
headquarters_city	string	City where an operator’s headquarters are located.
headquarters_county	string	County where an operator’s headquarters are located.
headquarters_state	string	State where an operator’s headquarters are located.
headquarters_street_address	string	Street address for an operator’s headquarters.
headquarters_zip	string	ZIP code where an operator’s headquarters are located.
office_city	string	City where an operator’s office is located.
office_county	string	County where an operator’s office is located.
office_state	string	State where an operator’s office is located.
office_street_address	string	Street address of an operator’s office.
office_zip	string	ZIP code where an operator’s office is located.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
operator_name_phmsa	string	PHMSA operator name.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.

core_pudl__assn_eia_pudl_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA plant IDs and manually assigned PUDL plant IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: plant_id_eia

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet"
)

Columns#

Field Name	Type	Description
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_eia	string	Plant name.

core_pudl__assn_eia_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA utility IDs and manually assigned PUDL utility IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_eia

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet"
)

Columns#

Field Name	Type	Description
utility_id_eia	integer	The EIA Utility Identification number.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_eia	string	The name of the utility.

core_pudl__assn_ferc1_dbf_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 DBF utility IDs originally reported as respondent_id.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_ferc1_dbf

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet"
)

Columns#

Field Name	Type	Description
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_dbf	integer	FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

core_pudl__assn_ferc1_pudl_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between FERC 1 plants and their manually assigned PUDL plant IDs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_ferc1, plant_name_ferc1

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

FERC does not assign IDs to plants, so each FERC 1 plant is identified by a utility_id_ferc1 and a plant_name_ferc1.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet"
)

Columns#

Field Name	Type	Description
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_ferc1	string	Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_pudl__assn_ferc1_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL utility IDs and PUDL-assigned FERC1 utility IDs.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_ferc1

Usage Warnings#

FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table maps two manually assigned utility IDs: a PUDL ID and a FERC1 ID. The PUDL IDs link EIA and FERC1 utilities. The PUDL FERC1 IDs link records from older DBF respondent IDs and new XBRL entity IDs via core_pudl__assn_ferc1_dbf_pudl_utilities and core_pudl__assn_ferc1_xbrl_pudl_utilities respectively. This table is generated from a table stored in the PUDL repository: src/package_data/glue/utility_id_pudl.csv
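
The chain of association tables described above can be followed in two hops: from a native DBF or XBRL ID to utility_id_ferc1, and from there to utility_id_pudl. The Python sketch below uses small in-memory dictionaries in place of the real tables. All IDs, including the format of the XBRL entity IDs, are hypothetical values chosen for illustration, and the function names are not part of PUDL.

```python
# Hypothetical toy rows standing in for the three association tables.
dbf_to_ferc1 = {101: 1, 102: 2}                 # utility_id_ferc1_dbf -> utility_id_ferc1
xbrl_to_ferc1 = {"C000101": 1, "C000777": 3}    # utility_id_ferc1_xbrl -> utility_id_ferc1
ferc1_to_pudl = {1: 5001, 2: 5002, 3: 5003}     # utility_id_ferc1 -> utility_id_pudl


def pudl_id_from_dbf(respondent_id):
    """Follow DBF respondent ID -> FERC1 utility ID -> PUDL utility ID."""
    ferc1_id = dbf_to_ferc1.get(respondent_id)
    return ferc1_to_pudl.get(ferc1_id) if ferc1_id is not None else None


def pudl_id_from_xbrl(entity_id):
    """Follow XBRL entity ID -> FERC1 utility ID -> PUDL utility ID."""
    ferc1_id = xbrl_to_ferc1.get(entity_id)
    return ferc1_to_pudl.get(ferc1_id) if ferc1_id is not None else None


# The same utility reached from either reporting era resolves to one PUDL ID.
from_dbf = pudl_id_from_dbf(101)
from_xbrl = pudl_id_from_xbrl("C000101")
```

In this toy data both lookups return the same utility_id_pudl, which is the point of routing both ID systems through utility_id_ferc1. An ID missing from either hop yields None rather than raising an error.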

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet"
)

Columns#

Field Name	Type	Description
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_ferc1	string	Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

core_pudl__assn_ferc1_xbrl_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 XBRL utility IDs originally reported as entity_id.

Most-recent data:: 2024
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_ferc1_xbrl

Usage Warnings#

FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet"
)

Columns#

Field Name	Type	Description
utility_id_ferc1	integer	PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
utility_id_ferc1_xbrl	string	FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

core_pudl__assn_ferc714_csv_pudl_respondents#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 CSV respondent IDs originally reported as respondent_id.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: respondent_id_ferc714, respondent_id_ferc714_csv

Usage Warnings#

Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet"
)

Columns#

Field Name	Type	Description
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_id_ferc714_csv	integer	FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

core_pudl__assn_ferc714_pudl_respondents#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-derived FERC 714 respondent IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: respondent_id_ferc714

Usage Warnings#

Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

These IDs connect older CSV data to newer XBRL data, since the CSV data uses respondent IDs that differ from the XBRL entity IDs. They are manually assigned when new FERC 714 data is integrated, and any newly found utilities are added to the list with a new ID. This table is read in from a file stored in the PUDL repository: src/pudl/package_data/glue/respondent_id_ferc714.xlsx
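
As a rough illustration of that linkage, the sketch below joins toy versions of the CSV and XBRL association tables on the PUDL-assigned ID. Column names come from the Columns listings of core_pudl__assn_ferc714_csv_pudl_respondents and core_pudl__assn_ferc714_xbrl_pudl_respondents; all ID values are made up.

```python
import pandas as pd

# Toy stand-ins for the two association tables (all values are made up).
csv_ids = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 2, 3],
        "respondent_id_ferc714_csv": [101, 102, 103],
    }
)
xbrl_ids = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 2, 4],
        "respondent_id_ferc714_xbrl": ["C001", "C002", "C004"],
    }
)

# An outer join on the PUDL-assigned ID shows which respondents appear in
# the CSV years only, the XBRL years only, or both. indicator=True adds a
# `_merge` column with the values "left_only", "right_only" or "both".
crosswalk = csv_ids.merge(
    xbrl_ids, on="respondent_id_ferc714", how="outer", indicator=True
)
both_eras = sorted(
    crosswalk.loc[crosswalk["_merge"] == "both", "respondent_id_ferc714"]
)
```

Respondents flagged as left_only or right_only are the ones whose time series will stop or start at the CSV to XBRL transition.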

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet"
)

Columns#

Field Name	Type	Description
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

core_pudl__assn_ferc714_xbrl_pudl_respondents#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 XBRL respondent IDs originally reported as entity_id.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: respondent_id_ferc714, respondent_id_ferc714_xbrl

Usage Warnings#

Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly; it often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet"
)

Columns#

Field Name	Type	Description
respondent_id_ferc714	integer	PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.
respondent_id_ferc714_xbrl	string	FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

core_pudl__assn_utilities_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL utility IDs and PUDL plant IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_pudl, plant_id_pudl

Additional Details#

This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx
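
Because the primary key is the pair (utility_id_pudl, plant_id_pudl), neither column is unique on its own. The sketch below uses made-up rows to show how the association might be joined to toy versions of core_pudl__entity_utilities_pudl and core_pudl__entity_plants_pudl to attach readable names; none of the values are real PUDL records.

```python
import pandas as pd

# Toy association rows: utility 1 is linked to two plants, and plant 11 is
# linked to two utilities (all values are made up).
assn = pd.DataFrame({"utility_id_pudl": [1, 1, 2], "plant_id_pudl": [10, 11, 11]})

# Toy entity tables supplying human-readable names.
utilities = pd.DataFrame(
    {"utility_id_pudl": [1, 2], "utility_name_pudl": ["Utility A", "Utility B"]}
)
plants = pd.DataFrame(
    {"plant_id_pudl": [10, 11], "plant_name_pudl": ["Plant X", "Plant Y"]}
)

# Each join is many-to-one from the association table to an entity table.
named = assn.merge(utilities, on="utility_id_pudl", validate="many_to_one").merge(
    plants, on="plant_id_pudl", validate="many_to_one"
)

# Number of distinct plants linked to each utility.
plants_per_utility = (
    named.groupby("utility_id_pudl")["plant_id_pudl"].nunique().to_dict()
)
```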

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet"
)

Columns#

Field Name	Type	Description
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.

core_pudl__codes_data_maturities#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for maturity levels of data records. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: code

Usage Warnings#

Contains information from multiple raw inputs.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.

core_pudl__codes_datasources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the data sources that PUDL integrates.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: datasource

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet"
)

Columns#

Field Name	Type	Description
datasource	string	Code identifying a dataset available within PUDL.
doi	string	Unique digital object identifier of Zenodo archive.
partitions	string	The data partitions used to generate this instance of the database.
pudl_version	string	The version of PUDL used to generate this database.

core_pudl__codes_imputation_reasons#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the reasons a value might be flagged for imputation.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: code

Additional Details#

Any time a column is imputed, there should be a corresponding column which contains codes for every value in the column that was imputed.
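
A sketch of how such a code column might be decoded, under stated assumptions: the data table, the column names value and value_imputation_code, and the codes EXAMPLE_A and EXAMPLE_B are all hypothetical and chosen only for illustration. Only the code and description columns come from this table's Columns listing.

```python
import pandas as pd

# Toy stand-in for core_pudl__codes_imputation_reasons (codes are made up).
codes = pd.DataFrame(
    {
        "code": ["EXAMPLE_A", "EXAMPLE_B"],
        "description": ["Hypothetical reason A.", "Hypothetical reason B."],
    }
)

# Hypothetical data table: a row was imputed where `value_imputation_code`
# is not null. Both column names are assumptions for illustration.
data = pd.DataFrame(
    {
        "value": [1.0, 2.5, 3.0],
        "value_imputation_code": [None, "EXAMPLE_A", None],
    }
)

# Left join so that rows which were never imputed are kept, with a null
# description.
described = data.merge(
    codes, left_on="value_imputation_code", right_on="code", how="left"
).drop(columns="code")

n_imputed = int(described["value_imputation_code"].notna().sum())
```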

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.

core_pudl__codes_subdivisions#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for states, provinces, and other sub-national political subdivisions.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: country_code, subdivision_code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet"
)

Columns#

Field Name	Type	Description
country_code	string	Three letter ISO-3166 country code (e.g. USA or CAN).
country_name	string	Full country name (e.g. United States of America).
division_code_us_census	string	Three-letter US Census division code as it appears in the bulk electricity data published by the EIA. Note that EIA splits the Pacific division into distinct contiguous (CA, OR, WA) and non-contiguous (AK, HI) states. For reference see this US Census region and division map: https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf
division_name_us_census	string	Longer human readable name describing the US Census division.
is_epacems_state	boolean	Indicates whether the associated state reports data within the EPA’s Continuous Emissions Monitoring System.
region_name_us_census	string	Human-readable name of a US Census region.
state_id_fips	string	Two digit state FIPS code.
subdivision_code	string	Two-letter ISO-3166 political subdivision code (e.g. US state or Canadian province abbreviations like CA or AB).
subdivision_name	string	Full name of political subdivision (e.g. US state or Canadian province names like California or Alberta).
subdivision_type	string	ISO-3166 political subdivision type. E.g. state, province, outlying_area.
timezone_approx	string	IANA timezone name of the timezone which encompasses the largest portion of the population in the associated geographic area.

core_pudl__entity_plants_pudl#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-assigned plant IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: plant_id_pudl

Additional Details#

These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly identified plants are added to the list with a new ID. Each ID maps to a power plant which is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet"
)

Columns#

Field Name	Type	Description
plant_id_pudl	integer	A manually assigned PUDL plant ID. May not be constant over time.
plant_name_pudl	string	Plant name, chosen arbitrarily from the several possible plant names available in the plant matching process. Included for human readability only.

core_pudl__entity_utilities_pudl#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-assigned utility IDs.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: The Public Utility Data Liberation (PUDL) Project
Primary key:: utility_id_pudl

Additional Details#

These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly found utilities are added to the list with a new ID. Each ID maps to a power plant owning or operating entity which is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet"
)

Columns#

Field Name	Type	Description
utility_id_pudl	integer	A manually assigned PUDL utility ID. May not be stable over time.
utility_name_pudl	string	Utility name, chosen arbitrarily from the several possible utility names available in the utility matching process. Included for human readability only.

core_rus12__entity_borrowers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about active RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply
Primary key:: borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table is where canonical values for borrowers are set. It contains values which are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value regardless of whether it meets this 70% threshold, so that all borrowers will have a name. We chose this because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.
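
The selection rule described above can be sketched in a few lines. This is not PUDL's implementation: the function name pick_canonical is hypothetical, and both the handling of nulls (ignored here) and the tie-breaking rule (first value seen wins) are assumptions that the text does not specify.

```python
from collections import Counter


def pick_canonical(values, threshold=0.7, require_threshold=True):
    """Return the most frequently reported non-null value, or None.

    With require_threshold=True the winner must account for at least
    `threshold` of the non-null reports; otherwise None is returned.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    winner, count = Counter(reported).most_common(1)[0]
    if require_threshold and count / len(reported) < threshold:
        return None
    return winner


# Made-up reports for one borrower across several years and tables.
states = ["TX", "TX", "TX", "OK"]  # 75% agreement, passes the threshold
names = ["Example Coop", "Example Coop", "Example Cooperative", "Example Coop & Power"]

canonical_state = pick_canonical(states)
no_consensus = pick_canonical(names)  # 50% agreement, below the threshold
canonical_name = pick_canonical(names, require_threshold=False)
```

Turning the threshold off for names mirrors the exception described for borrower_name_rus: a 50% winner is still returned so that every borrower ends up with a name.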

This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report to RUS Form 12. Borrowers that reported to RUS have records in the enclosed data tables. However, a small number of these Borrowers did not report for various reasons, and they are not represented in any of the other tables.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
state	string	Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
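
The borrower_id_rus structure given in the table above (two characters followed by four digits) lends itself to a quick format check. The sketch below is one possible validator: the helper name split_borrower_id is hypothetical, the upper-case assumption is not stated in the source, and the sample IDs are made up.

```python
import re
from typing import Optional, Tuple

# Pattern read off the column description: two letters, then four digits.
# Upper-case letters are an assumption; the source does not state the case.
BORROWER_ID_PATTERN = re.compile(r"([A-Z]{2})([0-9]{4})")


def split_borrower_id(borrower_id: str) -> Optional[Tuple[str, str]]:
    """Return (two-letter prefix, four-digit suffix), or None if malformed."""
    match = BORROWER_ID_PATTERN.fullmatch(borrower_id)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up IDs for illustration only.
parsed = split_borrower_id("TX0123")
rejected = split_borrower_id("TX123")  # only three digits
```

Whether the two-letter prefix always equals the value in the state column is not stated here, so treat any such comparison as a data check, not a guarantee.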

core_rus12__monthly_demand_and_energy_at_delivery_points#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy delivered to distribution utilities.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)
Primary key:: report_date, borrower_id_rus, delivery_recipient

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
delivered_demand_mw	number	The amount of demand delivered in MW.
delivered_energy_mwh	number	The amount of energy delivered in MWh.
delivery_recipient	string	The recipient of the delivered energy or demand.
report_date	date	Date reported.

core_rus12__monthly_demand_and_energy_at_power_sources#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy at point of generation.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)
Primary key:: report_date, borrower_id_rus

Additional Details#

There are some peak demand dates that have an hour component and others that don’t. This is because there is inconsistent reporting of peak_demand_hour in the raw data. If there is no hour reported, we record the hour as 0 and only the date will show in the data.
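
One way to work with this convention is to flag timestamps that fall exactly at midnight. The sketch below uses made-up timestamps in place of the peak_demand_date column; the derived column names hour_may_be_missing and peak_hour are hypothetical.

```python
import pandas as pd

# Made-up timestamps standing in for the peak_demand_date column.
peaks = pd.DataFrame(
    {
        "peak_demand_date": pd.to_datetime(
            ["2021-01-15 18:00:00", "2021-02-03 00:00:00", "2021-07-20 16:00:00"]
        )
    }
)

# A timestamp equal to its own midnight carries no usable hour component.
# A peak genuinely reported at hour 0 would look identical, so treat the
# flag as "hour may be missing", not "hour is missing".
at_midnight = peaks["peak_demand_date"] == peaks["peak_demand_date"].dt.normalize()
peaks["hour_may_be_missing"] = at_midnight

# Keep the hour only where it is known; midnight rows become null.
peaks["peak_hour"] = peaks["peak_demand_date"].dt.hour.where(~at_midnight)
```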

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
energy_output_mwh	number	The amount of energy output in MWh.
is_peak_coincident	boolean	Whether the peak demand is a coincident peak (true) or a non-coincident peak (false).
peak_demand_date	datetime	The date of the peak demand.
peak_demand_mw	number	Peak demand in MW in a given timeframe.
report_date	date	Date reported.

core_rus12__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debits from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
Primary key:: report_date, borrower_id_rus, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
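
A minimal sketch of the double-counting risk, using made-up rows shaped like this table's Columns listing. It assumes that the is_total flag marks every total or subtotal row, which the source does not guarantee; the asset_type labels and balances are invented.

```python
import pandas as pd

# Made-up rows: two detail items plus a row that totals them.
assets = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["TX0001"] * 3,
        "asset_type": ["example_item_a", "example_item_b", "example_total"],
        "ending_balance": [100.0, 50.0, 150.0],
        "is_total": [False, False, True],
    }
)

# Summing every row counts the detail items twice.
naive_sum = assets["ending_balance"].sum()

# Dropping flagged rows first gives the sum of the detail items only.
detail_sum = assets.loc[~assets["is_total"], "ending_balance"].sum()
```

Comparing detail_sum with the reported total row is also a cheap consistency check on the data itself.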

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported to the core_rus12__yearly_balance_sheet_assets table.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.

core_rus12__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
Primary key:: report_date, borrower_id_rus, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
liability_type	string	Type of liability being reported to the core_rus12__yearly_balance_sheet_liabilities table.
report_date	date	Date reported.

core_rus12__yearly_depreciation_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated depreciation.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)
Primary key:: report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet"
)

Columns#

Field Name	Type	Description
accruals	number	Value of additions into an asset class, a.k.a. accruals.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
composite_depreciation_rate	number	The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100.
depreciation_and_amortization_group	string	High-level category of depreciation and amortization items.
depreciation_and_amortization_item	string	Category of depreciation and amortization items.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements_less_net_salvage	number	Cost of retirements minus any net salvage value.
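
The borrower_id_rus format described in the column table (two characters for the state followed by four digits) can be checked with a regular expression. This is a minimal sketch, not PUDL code: the function name, the sample IDs, and the assumption that the state characters are uppercase ASCII letters are all illustrative.

```python
import re

# Assumed pattern: two uppercase ASCII letters, then exactly four ASCII digits.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id):
    """Return True if the whole string matches the assumed ID layout."""
    return BORROWER_ID_PATTERN.fullmatch(borrower_id) is not None


# Hypothetical IDs, made up for illustration.
sample_ids = ["AL0042", "AL042", "A10042"]
valid_ids = [value for value in sample_ids if is_valid_borrower_id(value)]
```

A check like this is useful as a quick sanity test before joining RUS tables on borrower_id_rus.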

core_rus12__yearly_depreciation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of accumulated depreciation.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)
Primary key:: report_date, borrower_id_rus, depreciation_and_amortization_item

Additional Details#

This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
depreciation_and_amortization_item	string	Category of depreciation and amortization items.
ending_balance	number	Account balance at end of year.
report_date	date	Date reported.

core_rus12__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
external_financial_risk_ratio	number	(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.
report_date	date	Date reported.
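
As a worked illustration of the ratio in this table (investments and loan guarantee balances relative to total utility plant assets), the sketch below sums the two exposure terms before dividing. The function and argument names are hypothetical, the input values are made up, and whether the published column is stored as a fraction or a percentage is not stated here.

```python
def external_financial_risk_ratio(total_investments, loan_guarantee_balances,
                                  total_utility_plant_assets):
    """Return (investments + loan guarantees) / utility plant assets.

    Returns None when the denominator is zero, since the ratio is undefined.
    """
    if total_utility_plant_assets == 0:
        return None
    exposure = total_investments + loan_guarantee_balances
    return exposure / total_utility_plant_assets


# Made-up dollar amounts for illustration only.
ratio = external_financial_risk_ratio(30.0, 20.0, 200.0)
```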

core_rus12__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F, Sub-section I)
Primary key:: This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part A Section B in table core_rus12__yearly_balance_sheet_assets.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
excluded_investments	number	Excluded investment.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
included_investments	number	Included investment.
income_or_loss	number	Income or loss from investment.
investment_description	string	Description of investment. This is a free-form text field and thus contains a wide variety of values.
investment_type_code	integer	Investment type code.
report_date	date	Date reported.

core_rus12__yearly_lines_stations_labor_materials_cost#

Noticed any irregularities with this table? File a report.

Annual time series of labor and material cost for lines and stations operated by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)
Primary key:: report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
cost	number	Cost value.
labor_or_material	string	Indicates whether the cost reported is for labor or material.
lines_or_stations	string	Indicates whether the cost reported pertains to lines or stations.
operation_or_maintenance	string	Indicates whether the expenditure is for operation or maintenance.
report_date	date	Date reported.

core_rus12__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)
Primary key:: This table has no primary key because some borrowers report multiple loan values from the same entity in a given year.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
is_loan_guarantee	boolean	Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.
loan_balance	number	The amount of money still owed on a loan at the end of the reporting year.
loan_maturity_date	date	The date on which a loan is scheduled to be fully paid.
loan_original_amount	number	The original amount of a loan.
loan_recipient	string	The organization that received a loan.
report_date	date	Date reported.

core_rus12__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long-term debt and debt service requirements for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)
Primary key:: This table has no primary key because some borrowers report multiple debt values from the same entity in a given year.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
debt_description	string	Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.
debt_ending_balance	number	The amount of principal still owed on the debt at the end of the report year.
debt_interest	number	The interest expense on the debt for the report year.
debt_principal	number	The principal paid on the debt during the report year.
debt_total	number	The total amount of debt.
report_date	date	Date reported.

core_rus12__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of annual meeting and board information for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet"
)

Columns#

Field Name	Type	Description
board_members_num	integer	The total number of board members.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
does_manager_have_written_contract	boolean	Whether or not the RUS borrower’s manager has a written contract.
fees_and_expenses_for_board_members	integer	The total amount of fees and expenses for board members.
last_annual_meeting_date	datetime	The date of the last annual meeting.
members_num	integer	The total number of members.
members_present_at_meeting_num	integer	The number of members present at the last annual meeting.
members_voting_by_proxy_or_mail_num	integer	The number of members voting by mail or by proxy.
report_date	date	Date reported.
was_quorum_present	boolean	Whether or not quorum was met.

core_rus12__yearly_non_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in non-utility plant.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)
Primary key:: report_date, borrower_id_rus, non_utility_plant_item

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
ending_balance	number	Account balance at end of year.
non_utility_plant_item	string	Category describing non-utility plant asset items.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.

core_rus12__yearly_plant_costs#

Noticed any irregularities with this table? File a report.

Annual time series of costs of net energy generated by plant.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)
Primary key:: This table has no primary key because there is one plant (Walter Scott) that has duplicate records every year. Based on other RUS-12 tables, it is likely that these duplicate records are the borrower's ownership portion of the plant as well as the total plant, but there is no clear indication in this table.

The primary key of this table otherwise would be: report_date, borrower_id_rus, plant_name_rus, cost_group, cost_type.
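
To see which rows break the would-be primary key, one can count how often each key combination occurs. The sketch below uses plain Python and made-up rows so it runs without dependencies; with pandas, `DataFrame.duplicated(subset=..., keep=False)` gives a comparable row mask. The helper name and all sample values are hypothetical.

```python
from collections import Counter

# Would-be primary key columns, as listed in the table description.
KEY_COLUMNS = ("report_date", "borrower_id_rus", "plant_name_rus",
               "cost_group", "cost_type")


def find_duplicate_keys(records, key_columns=KEY_COLUMNS):
    """Return a dict mapping each repeated key tuple to its row count."""
    counts = Counter(tuple(rec[col] for col in key_columns) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: the first two share a key, the third does not.
base = {"report_date": "2020-12-01", "borrower_id_rus": "ZZ0001",
        "cost_group": "group_a", "cost_type": "type_a"}
rows = [
    {**base, "plant_name_rus": "Plant A", "cost": 100.0},
    {**base, "plant_name_rus": "Plant A", "cost": 400.0},
    {**base, "plant_name_rus": "Plant B", "cost": 50.0},
]
duplicates = find_duplicate_keys(rows)
```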

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
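
One way to avoid double counting when aggregating is sketched below. It assumes that rows flagged by the is_total column hold totals of other rows in the same table, which should be checked against the data before relying on it. The helper name and the sample rows are hypothetical.

```python
def sum_cost_excluding_totals(records):
    """Sum the cost field over rows not flagged as totals, skipping null costs."""
    return sum(
        rec["cost"]
        for rec in records
        if rec.get("is_total") is not True and rec["cost"] is not None
    )


# Made-up rows: two detail rows, one total row, and one row with a null cost.
rows = [
    {"cost_type": "type_a", "cost": 10.0, "is_total": False},
    {"cost_type": "type_b", "cost": 15.0, "is_total": False},
    {"cost_type": "total", "cost": 25.0, "is_total": True},
    {"cost_type": "type_c", "cost": None, "is_total": False},
]
detail_sum = sum_cost_excluding_totals(rows)
```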

Additional Details#

The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns apply only to some cost_type values, and only to some plant_type values, and are thus expected to contain many nulls.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
cost	number	Cost value.
cost_group	string	High-level category of cost type.
cost_per_mmbtu	number	Unit cost of energy production in cost per MMBTU
cost_per_mwh	number	Unit cost of energy production in cost per MWh
cost_type	string	Detailed category of cost type.
is_total	boolean	Whether or not this record represents a total.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.

core_rus12__yearly_plant_factors_and_maximum_demand#

Noticed any irregularities with this table? File a report.

Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)
Primary key:: This table has no primary key. The primary key for this table would be report_date, borrower_id_rus, plant_name_rus, and plant_type but the plant_type column is not reported before 2009 causing lots of duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill the plant_type column in instances where there are different plant_types reported for the same plant in later years.

Additional Details#

The plant_type field was only included in the data starting in 2009. We backfill records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with a null plant_type because there is no reliable way to determine which plant_type is associated with which record back in time. Plants whose names are reported differently before and after 2009 will also not be backfilled.
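
The backfill described above can be sketched roughly as follows. This is one possible reading of the rule, not the PUDL implementation: it assumes that a null plant_type is filled only when the same borrower and plant name report exactly one distinct plant_type elsewhere. The function name and sample rows are hypothetical, and report_date is simplified to a year.

```python
from collections import defaultdict


def backfill_plant_type(records):
    """Fill null plant_type where a borrower/plant pair has one known type."""
    types_by_plant = defaultdict(set)
    for rec in records:
        if rec["plant_type"] is not None:
            key = (rec["borrower_id_rus"], rec["plant_name_rus"])
            types_by_plant[key].add(rec["plant_type"])

    filled = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input rows are not modified
        if new_rec["plant_type"] is None:
            key = (rec["borrower_id_rus"], rec["plant_name_rus"])
            candidates = types_by_plant.get(key, set())
            if len(candidates) == 1:
                new_rec["plant_type"] = next(iter(candidates))
        filled.append(new_rec)
    return filled


def row(year, plant, plant_type):
    """Build a made-up record for a single hypothetical borrower."""
    return {"report_date": year, "borrower_id_rus": "ZZ0001",
            "plant_name_rus": plant, "plant_type": plant_type}


rows = [
    row(2008, "Plant A", None), row(2010, "Plant A", "steam"),
    row(2008, "Plant B", None), row(2010, "Plant B", "steam"),
    row(2011, "Plant B", "hydroelectric"),
    row(2008, "Plant C", None),
]
result = backfill_plant_type(rows)
```

In this sketch Plant A is filled, Plant B stays null because two types are reported, and Plant C stays null because no later record shares its name.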

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period in hours, years, etc.).
capacity_factor_running	number	Fraction of potential generation over the time period a plant was in operation. Energy generated over time period / (nameplate capacity * time period in hours, years, etc.).
load_factor	number	Fraction of consumption vs demand reported for a plant over a given timeframe. Energy consumed over time period / (peak demand * time period in hours, years, etc.).
peak_gross_demand_mw	number	The highest average power output recorded over any single 15 minute interval during the reporting period.
peak_gross_demand_nameplate_mw	number	The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
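
The capacity_factor and load_factor descriptions both divide energy by the product of a power rating and a time period. The sketch below shows that arithmetic with made-up numbers. The function and argument names are hypothetical, and units are assumed to be MWh, MW and hours.

```python
def capacity_factor(generation_mwh, capacity_mw, hours):
    """Energy generated / (nameplate capacity * hours); None if undefined."""
    potential_mwh = capacity_mw * hours
    if potential_mwh <= 0:
        return None
    return generation_mwh / potential_mwh


def load_factor(energy_mwh, peak_demand_mw, hours):
    """Energy consumed / (peak demand * hours); None if undefined."""
    peak_energy_mwh = peak_demand_mw * hours
    if peak_energy_mwh <= 0:
        return None
    return energy_mwh / peak_energy_mwh


# A 1 MW unit generating 4,380 MWh over an 8,760-hour year.
annual_cf = capacity_factor(4380.0, 1.0, 8760.0)
# The same generation measured against 5,000 hours actually in operation.
running_cf = capacity_factor(4380.0, 1.0, 5000.0)
```

Passing hours in operation rather than hours in the period gives a value in the spirit of capacity_factor_running.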

core_rus12__yearly_plant_labor#

Noticed any irregularities with this table? File a report.

Annual time series of labor and payroll information for plants owned by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)
Primary key:: This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Note the lack of plant_type pre-2009 leading to a lack of reliable primary keys.

Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to data from this table, but you must join on the borrower_id_rus column in addition to report_date and plant_name_rus. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
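
The three-column join described above can be sketched as follows. Plain Python is used so the example runs without dependencies; with pandas, `DataFrame.merge(on=[...])` with the same three columns is the usual equivalent. The helper name, the sample rows, and the gross_generation_mwh field on the operations side are illustrative assumptions.

```python
from collections import defaultdict

JOIN_KEYS = ("report_date", "borrower_id_rus", "plant_name_rus")


def join_on_keys(labor_records, operations_records, keys=JOIN_KEYS):
    """Inner-join two lists of row dicts on the given key columns."""
    ops_by_key = defaultdict(list)
    for ops in operations_records:
        ops_by_key[tuple(ops[k] for k in keys)].append(ops)

    joined = []
    for labor in labor_records:
        for ops in ops_by_key.get(tuple(labor[k] for k in keys), []):
            joined.append({**ops, **labor})
    return joined


# Made-up rows: two borrowers report the same plant in the operations data.
labor_rows = [
    {"report_date": "2020-12-01", "borrower_id_rus": "ZZ0001",
     "plant_name_rus": "Plant A", "payroll_operations": 1000.0},
]
operations_rows = [
    {"report_date": "2020-12-01", "borrower_id_rus": "ZZ0001",
     "plant_name_rus": "Plant A", "gross_generation_mwh": 500.0},
    {"report_date": "2020-12-01", "borrower_id_rus": "ZZ0002",
     "plant_name_rus": "Plant A", "gross_generation_mwh": 510.0},
]
joined_rows = join_on_keys(labor_rows, operations_rows)
```

Joining on report_date and plant_name_rus alone would match the labor row to both borrowers' plant records, which is the duplication the RUS instructions warn about.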

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
employee_hours_worked_total	number	Total number of hours worked by employees.
employees_full_time_num	integer	Number of employees hired full-time for normal operations of the system.
employees_part_time_num	integer	Number of employees regularly employed on a part-time basis. Exclude employees hired for short periods of time to complete special jobs.
payroll_maintenance	number	The amount of payroll spent on plant maintenance.
payroll_operations	number	The amount of payroll spent on plant operations.
payroll_other_accounts	number	The amount of plant payroll spent on accounts other than maintenance and operations.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.

core_rus12__yearly_plant_operations_by_borrower#

Noticed any irregularities with this table? File a report.

Annual time series of borrower portion of plant operational data including fuel consumption and operational hours.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
Primary key:: This table has no primary key because there are a handful of plants that have duplicate records. The primary key of this table otherwise would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

Two boolean columns, is_full_ownership_portion and is_partly_owned_by_borrower, distinguish records for the borrower's share from records for the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf, which is included with the newer years in the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion belong in the by-plant table, while all records belong in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.
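
The pre-2009 assignment rule stated above can be sketched as follows. This is an illustration of the stated assumption only, not the PUDL transform: the function name and the sample rows are hypothetical, and the post-2009 logic, which also uses is_partly_owned_by_borrower, is not shown.

```python
def split_pre_2009(records):
    """Return (by_borrower, by_plant) row lists for pre-2009 records.

    Every record goes to the by-borrower list. Records that report True
    for is_full_ownership_portion also go to the by-plant list.
    """
    by_borrower = list(records)
    by_plant = [rec for rec in records
                if rec["is_full_ownership_portion"] is True]
    return by_borrower, by_plant


# Made-up pre-2009 rows for one hypothetical borrower.
rows = [
    {"report_date": 2007, "borrower_id_rus": "ZZ0001",
     "plant_name_rus": "Plant A", "is_full_ownership_portion": True},
    {"report_date": 2007, "borrower_id_rus": "ZZ0001",
     "plant_name_rus": "Plant B", "is_full_ownership_portion": False},
]
by_borrower_rows, by_plant_rows = split_pre_2009(rows)
```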

Also there are two cleaning steps that we performed that alter the original data slightly:

* One record dropped for an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rst documentation file in the rus12 archive. Luckily, this plant has exactly the same records as the other Wisdom steam plant that year with more expected ownership labels.
* Two string values replaced for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to have an integer type for the unit_id_rus column.
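
The unit_id_rus cleanup can be illustrated with a small sketch that keeps only the trailing digits of a value. This is an assumption about one way to do it. The actual PUDL cleaning step may instead map the two known strings explicitly, and the function name is hypothetical.

```python
import re

# Match one or more ASCII digits at the end of the string, ignoring trailing spaces.
TRAILING_DIGITS = re.compile(r"([0-9]+)\s*$")


def clean_unit_id(raw_value):
    """Return the trailing integer in a unit ID, or None if there is none."""
    match = TRAILING_DIGITS.search(str(raw_value))
    return int(match.group(1)) if match else None


cleaned = [clean_unit_id(value) for value in ["WSL GT 12", "WSL ST 10", "3", 7]]
```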

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
fuel_consumption_coal_lbs	number	Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.
fuel_consumption_gas_cubic_feet	number	Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.
fuel_consumption_oil_gallons	number	Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.
fuel_consumption_other	number	Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
is_full_ownership_portion	boolean	Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower.
is_partly_owned_by_borrower	boolean	Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.
operating_hours_in_service	number	Number of operating hours in service.
operating_hours_on_standby	number	Number of operating hours on standby.
operating_hours_out_of_service_scheduled	number	Number of operating hours out of service which were scheduled.
operating_hours_out_of_service_unscheduled	number	Number of operating hours out of service which were unscheduled.
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
times_started	number	Number of times the plant was started. This field is only reported for plant_type values steam and nuclear.
unit_id_rus	integer	RUS-assigned unit identification code.

core_rus12__yearly_plant_operations_by_plant#

Noticed any irregularities with this table? File a report.

Annual time series of whole plant operational data including fuel consumption and operational hours.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
Primary key:: This table has no primary key because there are a handful of plants that have duplicate records. The primary key of this table otherwise would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form reports plant operations data for each plant type in multiple ways: with records corresponding to the portion of plants that borrowers own, as well as records for the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

Two boolean columns delineate which records are associated with the borrowers’ share versus the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means that some records from the original tables end up in both of these PUDL tables.

We also performed two cleaning steps that alter the original data slightly:

Dropped one record with an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Luckily, this plant has exactly the same records as the other Wisdom steam plant record that year, which has more expected ownership labels.

Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these strings, 12 and 10 respectively. This enables us to use an integer type for the unit_id_rus column.

Plant-level data can be matched to the core_rus12__yearly_plant_labor table and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand table, which report Section B and C data for all plants. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
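As a rough illustration of the caution above, the following sketch groups records by plant, unit and year and keeps the groups reported by more than one borrower. The rows are invented toy data that only borrow this table's column names, the helper name shared_plant_records is hypothetical, and the sketch assumes that plant_name_rus is spelled identically across borrowers, which the source does not guarantee.

```python
from collections import defaultdict

# Toy rows using column names from this table; all values are invented.
rows = [
    {"report_date": "2015-01-01", "borrower_id_rus": "AA0001",
     "plant_name_rus": "Example", "plant_type": "steam",
     "unit_id_rus": 1, "gross_generation_mwh": 100.0},
    {"report_date": "2015-01-01", "borrower_id_rus": "BB0002",
     "plant_name_rus": "Example", "plant_type": "steam",
     "unit_id_rus": 1, "gross_generation_mwh": 101.0},
    {"report_date": "2015-01-01", "borrower_id_rus": "AA0001",
     "plant_name_rus": "Solo", "plant_type": "nuclear",
     "unit_id_rus": 1, "gross_generation_mwh": 50.0},
]


def shared_plant_records(rows):
    """Return plant-unit-year groups reported by more than one borrower."""
    groups = defaultdict(list)
    for row in rows:
        # Assumption: these four columns identify the same physical unit-year.
        key = (row["report_date"], row["plant_name_rus"],
               row["plant_type"], row["unit_id_rus"])
        groups[key].append(row)
    return {
        key: members
        for key, members in groups.items()
        if len({m["borrower_id_rus"] for m in members}) > 1
    }


shared = shared_plant_records(rows)
```

Groups returned by this helper are candidates for double counting if whole-plant values are summed across borrowers, and their reported totals may differ from one borrower to the next.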

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
fuel_consumption_coal_lbs	number	Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.
fuel_consumption_gas_cubic_feet	number	Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.
fuel_consumption_oil_gallons	number	Annual gallons of oil consumed for fuel. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.
fuel_consumption_other	number	Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.
gross_generation_mwh	number	Gross electricity generation for the specified period in megawatt-hours (MWh).
is_partly_owned_by_borrower	boolean	Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.
operating_hours_in_service	number	Number of operating hours in service.
operating_hours_on_standby	number	Number of operating hours on standby.
operating_hours_out_of_service_scheduled	number	Number of operating hours out of service which were scheduled.
operating_hours_out_of_service_unscheduled	number	Number of operating hours out of service which were unscheduled.
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
plant_type	string	Type of plant.
report_date	date	Date reported.
times_started	number	Number of times the plant was started. This field is only reported for plant_type’s steam and nuclear.
unit_id_rus	integer	RUS-assigned unit identification code.

core_rus12__yearly_renewable_plants#

Noticed any irregularities with this table? File a report.

Annual time series of renewable energy plant generation information for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)
Primary key:: report_date, borrower_id_rus, plant_name_rus, prime_mover_id

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
capacity_factor	number	Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / nameplate capacity * time period (hours/years/etc.).
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
comments	string	General comments field.
employees_num	integer	Number of employees.
invested	number	The amount of money invested.
net_generation_mwh	number	Net electricity generation for the specified period in megawatt-hours (MWh).
opex_per_mwh	number	Total production expenses (USD per MWh generated).
ownership_pct	number	Percentage of the plant owned by the respondent.
plant_name_rus	string	Name of the plant as reported to RUS.
power_cost_per_mwh	number	The cost of power per mwh.
primary_renewable_fuel_type	string	Primary renewable fuel type used by the plant.
primary_renewable_fuel_type_id	integer	Unique numeric identifier for each renewable fuel type.
prime_mover_id	integer	Unique numeric identifier for each prime mover type used by RUS borrowers.
prime_mover_type	string	Type of prime mover (e.g. Hydro, Internal Combustion).
renewable_fuel_pct	number	Percentage of renewable fuel used.
report_date	date	Date reported.
rus_funding	number	Amount of funding received from the Rural Utilities Service (RUS).

core_rus12__yearly_sources_and_distribution#

Noticed any irregularities with this table? File a report.

Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
Primary key:: report_date, borrower_id_rus, source_of_energy

Additional Details#

See the core_rus12__yearly_sources_and_distribution_by_plant_type table for a breakdown of plant-type-specific cost, capacity, plant_num, and net_energy_received values. Also note that several source_of_energy values don’t have a corresponding cost value.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
cost	number	Cost value.
net_energy_received_mwh	number	The net amount of energy received into the system.
report_date	date	Date reported.
source_of_energy	string	The source of energy (not plant type).

core_rus12__yearly_sources_and_distribution_by_plant_type#

Noticed any irregularities with this table? File a report.

Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
Primary key:: report_date, borrower_id_rus, plant_type

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
capacity_mw	number	Total installed (nameplate) capacity, in megawatts.
cost	number	Cost value.
net_energy_received_mwh	number	The net amount of energy received into the system.
plant_num	integer	Number of plants.
plant_type	string	Type of plant.
report_date	date	Date reported.

core_rus12__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)
Primary key:: report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
is_total	boolean	Whether or not this record represents a total.
opex_group	string	High level section from the statement of operations table. Most of these types have subcomponents broken out in the `opex_type` column.
opex_report_month	number	Amount of operational expense, cost or income during the report month.
opex_type	string	Type of item from the statement of operations.
opex_ytd	number	The year-to-date amount of operational expense, cost or income.
opex_ytd_budget	number	The year-to-date budget for amount of operational expense, cost or income.
report_date	date	Date reported.

core_rus12__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)
Primary key:: report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:

intangible_plant: only has one total utility_plant_item

production_plant: has several sub-component utility_plant_item and a standard total

transmission_plant: has several sub-component utility_plant_item and a standard total

distribution_plant: has several sub-component utility_plant_item and a standard total

electric_plant_in_service: has two sub-components and a total which includes those sub-components and the totals of the above categories

utility_plant_in_service: has several sub-components and the total is the sum of those sub-components as well as the total of electric_plant_in_service

total_utility_plant: has one sub-component and the total is the sum of the sub-component and utility_plant_in_service
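Because of this nesting, summing a value column over every row would count the same dollars more than once. The sketch below shows one way to exclude total rows with the is_total flag before aggregating. The rows are invented, the utility_plant_item labels are hypothetical, and the check that the sub-components add up to the reported total is an assumption about what a "standard total" means.

```python
from collections import defaultdict

# Toy rows using this table's column names; values and item labels are invented.
rows = [
    {"utility_plant_group": "production_plant", "utility_plant_item": "item_a",
     "is_total": False, "ending_balance": 60.0},
    {"utility_plant_group": "production_plant", "utility_plant_item": "item_b",
     "is_total": False, "ending_balance": 40.0},
    {"utility_plant_group": "production_plant", "utility_plant_item": "total",
     "is_total": True, "ending_balance": 100.0},
]


def sum_components(rows, value_col="ending_balance"):
    """Sum a value column per utility_plant_group, skipping total rows."""
    sums = defaultdict(float)
    for row in rows:
        if not row["is_total"]:
            sums[row["utility_plant_group"]] += row[value_col]
    return dict(sums)


component_sums = sum_components(rows)
# Summing every row instead would include the total row a second time.
naive_sum = sum(row["ending_balance"] for row in rows)
```

For the higher-level groups (electric_plant_in_service, utility_plant_in_service and total_utility_plant), the reported totals also fold in totals from other groups, so a per-group sum of non-total rows is not expected to match them.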

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.
utility_plant_group	string	High-level category of utility plant asset type.
utility_plant_item	string	Sub-category of utility_plant_group describing utility plant asset item.

core_rus7__entity_borrowers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about active RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution
Primary key:: borrower_id_rus

Usage Warnings#

Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table sets the canonical values for borrowers. It contains values that are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value regardless of whether it meets this 70% threshold, so that all borrowers will have a name. We chose this because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.
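The selection rule described above can be sketched roughly as follows. This is not PUDL's implementation: the function name canonical_value and its parameters are hypothetical, and the sketch assumes that missing values are ignored when computing the share and that ties go to the value seen first.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7, require_threshold=True):
    """Pick the most frequently reported value for one borrower attribute.

    Returns None when nothing was reported, or when the most common value
    falls below the threshold share and require_threshold is True.
    """
    # Assumption: missing values do not count as reported instances.
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    if require_threshold and count / len(values) < threshold:
        return None
    return value


# An attribute reported consistently enough to pass the 70% threshold.
state = canonical_value(["TX"] * 8 + ["OK"] * 2)
# A 50/50 split fails the threshold, so no canonical value is chosen.
ambiguous = canonical_value(["TX", "OK"])
# Names skip the threshold so that every borrower ends up with a name.
name = canonical_value(["A coop", "A cooperative"], require_threshold=False)
```

Passing require_threshold=False mirrors the special case for borrower_name_rus, where a name is always chosen even if no single spelling dominates.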

This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report to RUS Form 7. If these Borrowers have reported to RUS, they will have records in the enclosed data tables. However, a small number of these Borrowers did not report for various reasons, and these Borrowers will not be represented in any of the other tables.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
borrower_name_rus	string	The name of the RUS (Rural Utilities Service) borrower.
state	string	Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

core_rus7__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
Primary key:: report_date, borrower_id_rus, asset_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name	Type	Description
asset_type	string	Type of asset being reported to the core_rus7__yearly_balance_sheet_assets table.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.

core_rus7__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
Primary key:: report_date, borrower_id_rus, liability_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
liability_type	string	Type of liability being reported to the core_rus7__yearly_balance_sheet_liabilities table.
report_date	date	Date reported.

core_rus7__yearly_customer_energy_efficiency_and_conservation_loans#

Noticed any irregularities with this table? File a report.

Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
Primary key:: report_date, borrower_id_rus, loan_status

Additional Details#

Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats and attic ventilation fans.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)

Columns#

Field Name	Type	Description
actual_pct	number	Observed percentage.
anticipated_pct	number	Expected percentage.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
loan_status	string	The repayment status of a loan.
report_date	date	Date reported.
ytd_dollars	number	Balance this current year, in U.S. dollars.

core_rus7__yearly_distribution_services#

Noticed any irregularities with this table? File a report.

Annual time series of distribution services.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
Primary key:: report_date, borrower_id_rus, service_status

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
service_status	string	Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals.
services	number	Number of services in system at end of year.

core_rus7__yearly_employee_statistics#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about employment and payroll.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
employee_hours_worked_over_time	integer	The number of overtime hours worked by employees.
employee_hours_worked_regular_time	integer	The number of regular (non-overtime) hours worked by employees.
employees_fte_num	integer	The number of full time employees.
payroll_capitalized	integer	The amount of payroll spent that was capitalized.
payroll_expensed	integer	The amount of payroll spent that was expensed.
payroll_other	integer	The amount of payroll spent that was funded by other means - not capitalized or expensed.
report_date	date	Date reported.

core_rus7__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of investments in and impacts of energy efficiency programs.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)
Primary key:: report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

The savings_mmbtu column likely contains values with incorrect units.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These ID’s are structured as: two character state acronyms followed by four digits.
customer_class	string	High level categorization of customer type.
customers_num	number	Number of customers.
invested	number	The amount of money invested.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.
savings_mmbtu	number	The estimated amount of energy savings from energy efficiency programs. Warning: We found values much larger than expected that we have not yet cleaned - this is likely a reporting unit error.

core_rus7__yearly_energy_purchased#

Noticed any irregularities with this table? File a report.

Annual time series of energy purchased by RUS borrowers.

Most-recent data:: 2020
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)
Primary key:: This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).

Additional Details#

A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an *, including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.
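A minimal sketch of separating these placeholder records from named suppliers is shown below. It assumes the asterisk-prefixed names appear in the utility_name_eia column, which the source does not state explicitly. The rows are invented and the helper name is_unspecified_supplier is hypothetical.

```python
def is_unspecified_supplier(name):
    """Return True for placeholder supplier names that start with '*'."""
    return name is not None and name.startswith("*")


# Toy rows; the column choice (utility_name_eia) is an assumption.
rows = [
    {"utility_name_eia": "*Miscellaneous", "purchased_mwh": 10.0},
    {"utility_name_eia": "Example Power Co", "purchased_mwh": 90.0},
    {"utility_name_eia": None, "purchased_mwh": 5.0},
]

unspecified = [r for r in rows if is_unspecified_supplier(r["utility_name_eia"])]
named_or_missing = [r for r in rows if not is_unspecified_supplier(r["utility_name_eia"])]
```

Rows with a missing name are kept apart from the asterisk-prefixed placeholders here, since a missing name is not the same as an explicitly unspecified supplier.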

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet"
)

Columns#

Field Name	Type	Description
average_energy_cost_dollars_per_mwh	number	The average cost of energy per MWh.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
comments	string	General comments field.
fuel_cost_adjustment	number	The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost.
fuel_type_code_rus	integer	Unique numeric identifier for RUS fuel types.
is_supplier_eia_respondent	boolean	Whether the utility supplying energy to a RUS borrower is an EIA respondent.
purchased_energy_cost_total	number	The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges.
purchased_mwh	number	The total electricity purchased.
report_date	date	Date reported.
supplier_code_rus	string	Unique numeric identifier for the utility supplying energy to a RUS borrower.
utility_name_eia	string	The name of the utility.
wheeling_and_other_charges	number	The cost of wheeling and other charges or credits related to fuel. Included in the total cost.
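
Since fuel_cost_adjustment and wheeling_and_other_charges are both documented as included in purchased_energy_cost_total, the remainder can be backed out per record. A minimal sketch; cost_excluding_adjustments is a hypothetical helper, and treating the remainder as the base power charge is an assumption, not something the source states:

```python
def cost_excluding_adjustments(
    purchased_energy_cost_total, fuel_cost_adjustment, wheeling_and_other_charges
):
    """Total cost minus the two components documented as included in it.

    Returns None if any input is missing, rather than guessing a value.
    """
    parts = (purchased_energy_cost_total, fuel_cost_adjustment, wheeling_and_other_charges)
    if any(part is None for part in parts):
        return None
    return purchased_energy_cost_total - fuel_cost_adjustment - wheeling_and_other_charges


# Made-up figures for illustration only.
print(cost_excluding_adjustments(1000.0, 50.0, 25.0))  # 925.0
print(cost_excluding_adjustments(1000.0, None, 25.0))  # None
```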

core_rus7__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
external_financial_risk_ratio	number	(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.
report_date	date	Date reported.
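
The ratio can be written out explicitly, reading the description as the sum of investments and loan guarantee balances divided by total utility plant assets, which matches the table summary. A minimal sketch; the function and parameter names are hypothetical, and this table stores only the resulting ratio:

```python
def external_financial_risk_ratio(
    total_investments, loan_guarantee_balances, total_utility_plant_assets
):
    """(investments + loan guarantee balances) / total utility plant assets."""
    if total_utility_plant_assets == 0:
        return None  # ratio is undefined without plant assets
    return (total_investments + loan_guarantee_balances) / total_utility_plant_assets


# Made-up figures for illustration only.
print(external_financial_risk_ratio(150.0, 50.0, 1000.0))  # 0.2
```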

core_rus7__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P - Section I)
Primary key:: This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
excluded_investments	number	Excluded investment.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
included_investments	number	Included investment.
income_or_loss	number	Income or loss from investment.
investment_description	string	Description of investment. This is a free-form text field and thus contains a wide variety of values.
investment_type_code	integer	Investment type code.
report_date	date	Date reported.

core_rus7__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)
Primary key:: This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason we’ve nulled the date.
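
A similar check can be applied downstream to catch other implausible dates. A minimal sketch, assuming a hypothetical cutoff year; the source does not state what rule, if any, PUDL applies beyond nulling this one value:

```python
from datetime import date

# Hypothetical cutoff for illustration; not a documented PUDL threshold.
MAX_PLAUSIBLE_YEAR = 2100


def clean_maturity_date(maturity):
    """Return None for missing or implausibly distant maturity dates."""
    if maturity is None or maturity.year > MAX_PLAUSIBLE_YEAR:
        return None
    return maturity


print(clean_maturity_date(date(2820, 2, 8)))  # None
print(clean_maturity_date(date(2030, 2, 8)))  # 2030-02-08
```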

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
for_rural_development	boolean	Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.
is_loan_guarantee	boolean	Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.
loan_balance	number	The amount of money still owed on a loan at the end of the reporting year.
loan_maturity_date	date	The date on which a loan is scheduled to be fully paid.
loan_original_amount	number	The original amount of a loan.
loan_recipient	string	The organization that received a loan.
report_date	date	Date reported.

core_rus7__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long term debt and debt service requirements.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)
Primary key:: This table has no primary key. It is a list of all debts in each year and borrowers can have multiple records with the same debt_description.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
debt_description	string	Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.
debt_ending_balance	number	The amount of principal still owed on the debt at the end of the report year.
debt_interest	number	The interest expense on the debt for the report year.
debt_principal	number	The principal paid on the debt during the report year.
debt_total	number	The total amount of debt.
report_date	date	Date reported.

core_rus7__yearly_long_term_leases#

Noticed any irregularities with this table? File a report.

Annual time series of long term leases by property type.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)
Primary key:: This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
lending_organization	string	The organization that provided a lease or loan.
property_type	string	The type of property leased.
rental_cost_ytd	number	Year-to-date rental cost for leased property.
report_date	date	Date reported.

core_rus7__yearly_materials_and_supplies#

Noticed any irregularities with this table? File a report.

Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)
Primary key:: report_date, borrower_id_rus, electric_or_other_materials

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
electric_or_other_materials	string	Whether the cost is for electric materials or other materials.
materials_adjustment	number	An adjustment value for the cost of materials and supplies.
materials_ending_balance	number	The balance at the end of the report year for materials and supplies.
materials_purchased	number	The cost of materials and supplies purchased.
materials_salvaged	number	The cost of materials and supplies salvaged.
materials_sold	number	The cost of materials and supplies sold.
materials_used	number	The cost of materials and supplies used.
report_date	date	Date reported.

core_rus7__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet"
)

Columns#

Field Name	Type	Description
board_members_num	integer	The total number of board members.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
does_manager_have_written_contract	boolean	Whether or not the RUS borrower’s manager has a written contract.
fees_and_expenses_for_board_members	integer	The total amount of fees and expenses for board members.
last_annual_meeting_date	datetime	The date of the last annual meeting.
members_num	integer	The total number of members.
members_present_at_meeting_num	integer	The number of members present at the last annual meeting.
members_voting_by_proxy_or_mail_num	integer	The number of members voting by mail or by proxy.
report_date	date	Date reported.
was_quorum_present	boolean	Whether or not quorum was met.

core_rus7__yearly_owed_by_customers#

Noticed any irregularities with this table? File a report.

Annual time series of debt owed by customers.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
Primary key:: report_date, borrower_id_rus

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet"
)

Columns#

Field Name	Type	Description
amount_due_over_60_days	number	Reported amount of dollars due over 60 days from consumers for electric service. Includes both connected and disconnected customers.
amount_written_off_ytd	number	Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
report_date	date	Date reported.

core_rus7__yearly_patronage_capital#

Noticed any irregularities with this table? File a report.

Annual time series of patronage capital distributed and received.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)
Primary key:: report_date, borrower_id_rus, patronage_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.
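
One way to avoid double-counting is to drop rows flagged by the is_total column before summing. A minimal sketch on plain Python rows, assuming is_total marks the subtotal records this warning refers to; sum_without_totals is a hypothetical helper and the patronage_type values are placeholders:

```python
def sum_without_totals(rows, value_column):
    """Sum value_column over rows not flagged as totals, skipping nulls."""
    return sum(
        row[value_column]
        for row in rows
        if not row["is_total"] and row[value_column] is not None
    )


# Made-up rows; "component_a" and "component_b" are placeholder types.
rows = [
    {"patronage_type": "component_a", "is_total": False, "patronage_report_year": 40.0},
    {"patronage_type": "component_b", "is_total": False, "patronage_report_year": 60.0},
    {"patronage_type": "total", "is_total": True, "patronage_report_year": 100.0},
]
print(sum_without_totals(rows, "patronage_report_year"))  # 100.0
```

Summing every row in this example would give 200.0, counting the total twice.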

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
is_total	boolean	Whether or not this record represents a total.
patronage_cumulative	number	Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null.
patronage_report_year	number	Amount of patronage distributed or received within report year.
patronage_type	string	Type of patronage capital distributed or received.
report_date	date	Date reported.

core_rus7__yearly_power_requirements#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and generation summary.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus

Additional Details#

This table includes totals of electric sales and revenue, which also appear in core_rus7__yearly_power_requirements_electric_sales with a customer_class of total. Because the other power requirements reported here are not broken out by customer class, the electric sales totals are included in this table alongside them.
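
When combining the two tables, the total rows in the electric sales table should be excluded from any per-class sum so the totals are not counted twice. A minimal sketch on plain Python rows; drop_total_class is a hypothetical helper, and the class names other than total are placeholders:

```python
def drop_total_class(sales_rows):
    """Keep only rows for individual customer classes."""
    return [row for row in sales_rows if row["customer_class"] != "total"]


# Made-up rows; "class_a" and "class_b" are placeholder customer classes.
sales_rows = [
    {"customer_class": "class_a", "sales_mwh": 30.0},
    {"customer_class": "class_b", "sales_mwh": 70.0},
    {"customer_class": "total", "sales_mwh": 100.0},
]
by_class = drop_total_class(sales_rows)
print(sum(row["sales_mwh"] for row in by_class))  # 100.0
```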

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
electric_sales_mwh	number	Total MWh sold. Total of lines 1b through 9b on the original form.
electric_sales_revenue	integer	Total revenue received from sales of electric energy. Total of lines 1c through 9c on the original form.
generated_mwh	number	The total electricity generated.
interchange_mwh	number	The net interchange of electricity. The net amount of electricity exchanged in purchases and sales.
is_peak_coincident	boolean	Whether the reported peak_mw is a coincident peak or a non-coincident peak.
other_electric_revenue	integer	Electric revenue other than electric_sales_revenue.
own_use_mwh	number	The electricity in MWh used for the borrower’s own internal use.
peak_mw	number	The peak system MW - the sum of all MW.
purchased_mwh	number	The total electricity purchased.
purchases_and_generation_cost	integer	The cost of purchases and generation of electricity.
report_date	date	Date reported.
transmission_revenue	integer	Transmission revenue.

core_rus7__yearly_power_requirements_electric_customers#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - number of customers served by customer type.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus, customer_class, observation_period

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
customer_class	string	High level categorization of customer type.
customers_num	number	Number of customers.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.

core_rus7__yearly_power_requirements_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and energy sold by customer type.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
Primary key:: report_date, borrower_id_rus, customer_class

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
customer_class	string	High level categorization of customer type.
report_date	date	Date reported.
revenue	number	Amount of revenue.
sales_mwh	number	Quantity of electricity sold in MWh.

core_rus7__yearly_service_interruptions#

Noticed any irregularities with this table? File a report.

Annual time series of service interruptions by cause.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)
Primary key:: report_date, borrower_id_rus, service_interruption_cause, observation_period

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
is_total	boolean	Whether or not this record represents a total.
observation_period	string	The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.
report_date	date	Date reported.
saidi_minutes	number	Cumulative duration (minutes) of interruption for the average customer during the report year.
service_interruption_cause	string	Source of service interruption.

core_rus7__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)
Primary key:: report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
is_total	boolean	Whether or not this record represents a total.
opex_group	string	High level section from the statement of operations table. Most of these types have subcomponents broken out in the `opex_type` column.
opex_report_month	number	Amount of operational expense, cost or income during the report month.
opex_type	string	Type of item from the statement of operations.
opex_ytd	number	The year-to-date amount of operational expense, cost or income.
opex_ytd_budget	number	The year-to-date budget for amount of operational expense, cost or income.
report_date	date	Date reported.

core_rus7__yearly_transmission_and_distribution_mileage#

Noticed any irregularities with this table? File a report.

Annual time series of miles of transmission and distribution infrastructure.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
Primary key:: report_date, borrower_id_rus, line_type

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet"
)

Columns#

Field Name	Type	Description
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
is_total	boolean	Whether or not this record represents a total.
line_type	string	The type of line mileage reported (e.g., transmission, overhead distribution).
miles	number	Line length at the end of the reported period, in miles.
report_date	date	Date reported.

core_rus7__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:: 2021
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)
Primary key:: report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name	Type	Description
additions	number	Cost of acquisition of items classified within the account.
adjustments_and_transfers	number	Amount of adjustments and transfers within a class of assets.
borrower_id_rus	string	Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.
ending_balance	number	Account balance at end of year.
is_total	boolean	Whether or not this record represents a total.
report_date	date	Date reported.
retirements	number	Cost of disposal of items classified within the account.
utility_plant_group	string	High-level category of utility plant asset type.
utility_plant_item	string	Sub-category of utility_plant_group describing utility plant asset item.

core_rus__codes_fuel_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for fuel types in RUS.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: RUS – Mix of multiple RUS forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.

core_rus__codes_investment_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for investment types.

Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: RUS – Mix of multiple RUS forms
Primary key:: code

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet"
)

Columns#

Field Name	Type	Description
code	string	Originally reported short code.
description	string	Long human-readable description of the meaning of a code/label.

core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between subsidiaries listed in Exhibit 21 and EIA utilities.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: subsidiary_company_id_sec10k

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

An Exhibit 21 subsidiary is considered matched to an EIA utility if their names are identical. Only subsidiaries that don’t file SEC 10-K themselves are included in this table. SEC 10-K filers have much more information available and are matched to EIA utilities using probabilistic record linkage in core_sec10k__assn_sec10k_filers_and_eia_utilities.
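
A minimal sketch of an identical-name match is shown here. It is not PUDL's implementation: whether and how names are normalized before comparison is not described in this entry, so the sketch compares the strings as given. All rows and the utility_name_eia field used for the lookup are hypothetical.

```python
# Hypothetical inputs; names and IDs are invented for illustration.
subsidiaries = [
    {"subsidiary_company_id_sec10k": "sub_a", "subsidiary_company_name": "example power co"},
    {"subsidiary_company_id_sec10k": "sub_b", "subsidiary_company_name": "unmatched holdings llc"},
]
utilities = [
    {"utility_id_eia": 101, "utility_name_eia": "example power co"},
    {"utility_id_eia": 102, "utility_name_eia": "other electric coop"},
]


def match_by_identical_name(subsidiaries, utilities):
    """Pair each subsidiary with the utility whose name is exactly the same."""
    utility_id_by_name = {u["utility_name_eia"]: u["utility_id_eia"] for u in utilities}
    return [
        {
            "subsidiary_company_id_sec10k": s["subsidiary_company_id_sec10k"],
            "utility_id_eia": utility_id_by_name[s["subsidiary_company_name"]],
        }
        for s in subsidiaries
        if s["subsidiary_company_name"] in utility_id_by_name
    ]


matches = match_by_identical_name(subsidiaries, utilities)
```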

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet"
)

Columns#

Field Name	Type	Description
subsidiary_company_id_sec10k	string	PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.
utility_id_eia	integer	The EIA Utility Identification number.

core_sec10k__assn_exhibit_21_subsidiaries_and_filers#

Noticed any irregularities with this table? File a report.

Association table providing connections between subsidiaries listed in Exhibit 21 and their SEC central index key, if the subsidiary also files Form 10-K.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: subsidiary_company_id_sec10k

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

Exhibit 21 subsidiaries and SEC 10-K filers are considered matched if they have identical names and the same location of incorporation.
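
The two-part matching rule can be sketched with a composite lookup key. This is an illustration under assumptions, not PUDL's code: it assumes name and location have already been brought into a comparable form on both sides, and the field names name and location, along with every value, are hypothetical.

```python
# Hypothetical inputs; field names and values are invented for illustration.
subsidiaries = [
    {"subsidiary_company_id_sec10k": "sub_a", "name": "example power co", "location": "delaware"},
    {"subsidiary_company_id_sec10k": "sub_b", "name": "example power co", "location": "texas"},
]
filers = [
    {"central_index_key": "0000000001", "name": "example power co", "location": "delaware"},
]


def match_by_name_and_location(subsidiaries, filers):
    """Match only when both the name and the location of incorporation agree."""
    cik_by_key = {(f["name"], f["location"]): f["central_index_key"] for f in filers}
    return [
        {
            "subsidiary_company_id_sec10k": s["subsidiary_company_id_sec10k"],
            "central_index_key": cik_by_key[(s["name"], s["location"])],
        }
        for s in subsidiaries
        if (s["name"], s["location"]) in cik_by_key
    ]


filer_matches = match_by_name_and_location(subsidiaries, filers)
```

The second subsidiary shares a name with the filer but not a location, so it stays unmatched.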

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
subsidiary_company_id_sec10k	string	PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.

core_sec10k__assn_sec10k_filers_and_eia_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between SEC 10-K filers and EIA utilities.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: central_index_key, utility_id_eia

Usage Warnings#

This data has been matched probabilistically using a machine learning model, so it is incomplete in places and contains some errors. It should not be treated as ground truth data.
PUDL does not currently update its copy of this data.

Additional Details#

SEC central index keys are matched to EIA utility IDs using probabilistic record linkage based on associated company information such as company name, business and mailing addresses, and state of incorporation. The match between central_index_key and utility_id_eia is one-to-one and is not allowed to change over time. Where there were multiple candidate matches, the match with the highest probability is selected.
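
One way to turn scored candidate pairs into a one-to-one mapping is a greedy pass over the candidates in descending order of probability. This is a sketch under that assumption; this entry only says that the highest-probability candidate is selected, not how conflicts between candidates are resolved. The candidate tuples below are invented.

```python
# Hypothetical candidates: (central_index_key, utility_id_eia, match_probability).
candidates = [
    ("0000000001", 101, 0.95),
    ("0000000001", 102, 0.60),
    ("0000000002", 101, 0.80),
    ("0000000002", 103, 0.70),
]


def select_one_to_one(candidates):
    """Greedily keep the best-scoring pairs so each ID appears at most once."""
    used_ciks, used_utilities = set(), set()
    selected = []
    for cik, utility_id, _prob in sorted(candidates, key=lambda c: c[2], reverse=True):
        if cik in used_ciks or utility_id in used_utilities:
            continue  # one side is already matched with a higher probability
        used_ciks.add(cik)
        used_utilities.add(utility_id)
        selected.append((cik, utility_id))
    return selected


selected = select_one_to_one(candidates)
```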

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
utility_id_eia	integer	The EIA Utility Identification number.

core_sec10k__changelog_company_name#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in the names each SEC 10-K filer has used.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: This table has no primary key.

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

This table is extracted from the same SEC 10-K filing header information as core_sec10k__quarterly_company_information. Each filing reports the full history of name changes associated with a company up to the date of that filing. Because individual companies may appear in multiple filings in the same year, and the same historical name changes will be reported in multiple years, the raw input data contains many duplicate entries, which are deduplicated to create this table. The original name change data only contains the former name and the date of the change.

Roughly 2% of all records describe multiple name changes happening on the same date (they are duplicates on the basis of central_index_key and name_change_date). This may be due to company name reporting inconsistencies or reporting errors in which the old and new company names have been swapped.
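
Because the table has no primary key, it can help to flag the records that share a central_index_key and name_change_date before relying on them. The sketch below does this with invented rows; the company names and dates are hypothetical.

```python
from collections import Counter

# Hypothetical rows shaped like this table; all values are invented.
records = [
    {"central_index_key": "0000000001", "name_change_date": "2001-05-01",
     "company_name_old": "alpha corp", "company_name": "alpha energy"},
    {"central_index_key": "0000000001", "name_change_date": "2001-05-01",
     "company_name_old": "alpha energy", "company_name": "alpha corp"},
    {"central_index_key": "0000000002", "name_change_date": "2010-01-15",
     "company_name_old": "beta inc", "company_name": "beta holdings"},
]


def change_key(record):
    """Company and date of a name change, ignoring the names themselves."""
    return (record["central_index_key"], record["name_change_date"])


def same_day_name_changes(records):
    """Return records whose company and change date occur more than once."""
    counts = Counter(change_key(r) for r in records)
    return [r for r in records if counts[change_key(r)] > 1]


flagged = same_day_name_changes(records)
```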

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
company_name	string	Name of the reporting company.
company_name_old	string	Name of company prior to name change.
name_change_date	date	Date of last name change of the company.

core_sec10k__quarterly_company_information#

Noticed any irregularities with this table? File a report.

Quarterly time series of company and filing information extracted from SEC 10-K filing headers.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k, central_index_key

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

While the SEC 10-K filings are submitted by a single company, they often contain references to many other related companies. Information about these companies and the filing itself are contained in text headers at the beginning of the filing. This table contains data extracted from those headers. Each record in this table represents a single observation of a company in a single filing. The filer_count indicates which referenced company within a filing header the record corresponds to.

Because the same company may be referenced in a number of different filings submitted in the same reporting period or even on the same day, this table contains apparently duplicative records about many companies, which may be distinguished only by the filename of the filing they appeared in and their filer count. Note that references to a particular company may not be perfectly consistent across all the filings in which it appears. The various company names, addresses, and other information associated with the company’s unique and permanent central_index_key are later used as inputs into the probabilistic record linkage process.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet"
)

Columns#

Field Name	Type	Description
business_city	string	City where the company’s place of business is located.
business_postal_code	string	Non-US postal code of the company’s place of business.
business_state	string	State where the company’s place of business is located.
business_street_address	string	Street address of the company’s place of business.
business_street_address_2	string	Second line of the street address of the company’s place of business.
business_zip_code	string	Zip code of the company’s place of business.
business_zip_code_4	string	Zip code suffix of the company’s place of business.
central_index_key	string	Identifier of the company in SEC database.
company_name	string	Name of the reporting company.
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
filer_count	integer	A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to.
filing_number_sec	string	Filing number used internally by the SEC to track the filing.
film_number	string	Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room.
fiscal_year_end	string	The end date of an SEC filing company’s fiscal year, in MMDD format.
incorporation_state	string	Two letter state code where company is incorporated.
industry_id_sic	string	Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.
industry_name_sic	string	Text description of the Standard Industrial Classification (SIC) code.
mail_city	string	City of the company’s mailing address.
mail_postal_code	string	Non-US postal code of the company’s mailing address.
mail_state	string	State of the company’s mailing address.
mail_street_address	string	Street portion of the company’s mailing address.
mail_street_address_2	string	Second line of the street portion of the company’s mailing address.
mail_zip_code	string	Zip code of the company’s mailing address.
mail_zip_code_4	string	Zip code suffix of the company’s mailing address.
phone_number	string	Phone number for utility contact 1.
sec10k_type	string	Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.
sec_act	string	SEC Act through which the form was enacted, e.g. 1934 act.
taxpayer_id_irs	string	Taxpayer ID of the company with the IRS.
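
Since fiscal_year_end is stored as a four-character MMDD string, it needs a year before it can be treated as a date. The helper below is a hypothetical sketch of that conversion; the function name and the choice of year are assumptions, and a value such as 0229 raises ValueError when paired with a non-leap year.

```python
import datetime


def fiscal_year_end_date(mmdd, year):
    """Combine an MMDD string with a year to get a calendar date."""
    if len(mmdd) != 4 or not mmdd.isdigit():
        raise ValueError(f"expected a four-digit MMDD string, got {mmdd!r}")
    month, day = int(mmdd[:2]), int(mmdd[2:])
    return datetime.date(year, month, day)


december_end = fiscal_year_end_date("1231", 2023)
june_end = fiscal_year_end_date("0630", 2023)
```

Keeping the column as a string preserves leading zeros such as the one in 0630.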

core_sec10k__quarterly_exhibit_21_company_ownership#

Noticed any irregularities with this table? File a report.

Quarterly time series of subsidiary company ownership data from the SEC 10-K Exhibit 21 attachments.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k, subsidiary_company_id_sec10k

Usage Warnings#

This data has been extracted probabilistically using a machine learning model, so it is incomplete in places and contains some errors. It should not be treated as ground truth data.
PUDL does not currently update its copy of this data.

Additional Details#

Exhibit 21 is an unstructured text or PDF attachment to the main SEC 10-K filing that is used to describe the subsidiaries owned by the filing company. It may or may not provide the percentage of the subsidiary that is owned by the filing company, or the location of the subsidiary.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet"
)

Columns#

Field Name	Type	Description
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
fraction_owned	number	Proportion of the subsidiary company owned by the filing company, where reported in Exhibit 21.
subsidiary_company_id_sec10k	string	PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.
subsidiary_company_location	string	Location of subsidiary company. This is the full US state name or country name and occasionally a two digit code that was not mapped to a full name during cleaning.
subsidiary_company_name	string	Name of subsidiary company.

core_sec10k__quarterly_filings#

Noticed any irregularities with this table? File a report.

Quarterly time series of metadata describing SEC 10-K filings.

Most-recent data:: 2023
Processing:: Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
Source:: U.S. Securities and Exchange Commission (SEC) Form 10-K
Primary key:: filename_sec10k

Usage Warnings#

PUDL does not currently update its copy of this data.

Additional Details#

Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename, but ~1% of all filings are one company submitting the same form multiple times on the same day, so the filename is the only available natural primary key.
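
A sketch of how the same-day repeat filings could be identified follows. It assumes that "the same form" means the same sec10k_type, and it groups on central_index_key, filing_date, and sec10k_type; the rows and filenames are invented placeholders.

```python
from collections import defaultdict

# Hypothetical rows shaped like this table; all values are invented.
filings = [
    {"filename_sec10k": "file_a", "central_index_key": "0000000001",
     "filing_date": "2020-03-01", "sec10k_type": "10-k"},
    {"filename_sec10k": "file_b", "central_index_key": "0000000001",
     "filing_date": "2020-03-01", "sec10k_type": "10-k"},
    {"filename_sec10k": "file_c", "central_index_key": "0000000002",
     "filing_date": "2020-03-01", "sec10k_type": "10-k"},
]


def same_day_repeat_filings(filings):
    """Group filenames by company, date, and form type; keep groups with repeats."""
    groups = defaultdict(list)
    for filing in filings:
        key = (filing["central_index_key"], filing["filing_date"], filing["sec10k_type"])
        groups[key].append(filing["filename_sec10k"])
    return {key: names for key, names in groups.items() if len(names) > 1}


repeats = same_day_repeat_filings(filings)
```

Within each repeated group, only filename_sec10k tells the filings apart, which is why it serves as the primary key.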

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet"
)

Columns#

Field Name	Type	Description
central_index_key	string	Identifier of the company in SEC database.
company_name	string	Name of the reporting company.
exhibit_21_version	string	Version of exhibit 21 submitted (if applicable).
filename_sec10k	string	Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.
filing_date	date	Date on which the filing was submitted.
report_date	date	Date reported.
sec10k_type	string	Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.
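
The URL reconstruction described for filename_sec10k can be written as a small helper. The function name is hypothetical, and the example filename is an invented placeholder rather than a real filing.

```python
EDGAR_BASE_URL = "https://www.sec.gov/Archives/edgar/data/"


def sec10k_source_url(filename_sec10k):
    """Prepend the EDGAR archive prefix and append the .txt extension."""
    return f"{EDGAR_BASE_URL}{filename_sec10k}.txt"


# Placeholder value; real filenames come from the filename_sec10k column.
example_url = sec10k_source_url("1234567/0001234567-20-000001")
```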

_core_eia860__cooling_equipment#

Noticed any irregularities with this table? File a report.

Information about cooling equipment at generation facilities.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 6D)
Primary key:: plant_id_eia, utility_id_eia, cooling_id_eia, report_date

Usage Warnings#

This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.

Additional Details#

A single plant can have several collections of cooling equipment, so each line in this table represents one cooling system at a given plant. You can determine how to link each set of cooling equipment to specific boilers in a plant via core_eia860__assn_boiler_cooling.
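
The link to boilers can be sketched as a lookup keyed on plant, cooling system, and report date. The column names assumed for core_eia860__assn_boiler_cooling (plant_id_eia, boiler_id, cooling_id_eia, report_date) are not documented in this entry and should be checked against that table; the rows are invented, and utility_id_eia is left out of the key for brevity.

```python
# Hypothetical rows; association-table columns are assumed, values are invented.
cooling_equipment = [
    {"plant_id_eia": 1, "cooling_id_eia": "C1", "report_date": "2023-01-01"},
    {"plant_id_eia": 1, "cooling_id_eia": "C2", "report_date": "2023-01-01"},
]
boiler_cooling_assn = [
    {"plant_id_eia": 1, "boiler_id": "B1", "cooling_id_eia": "C1", "report_date": "2023-01-01"},
    {"plant_id_eia": 1, "boiler_id": "B2", "cooling_id_eia": "C1", "report_date": "2023-01-01"},
    {"plant_id_eia": 1, "boiler_id": "B3", "cooling_id_eia": "C2", "report_date": "2023-01-01"},
]


def cooling_key(row):
    """Identify one cooling system at one plant in one report year."""
    return (row["plant_id_eia"], row["cooling_id_eia"], row["report_date"])


def boilers_by_cooling_system(cooling_equipment, boiler_cooling_assn):
    """List the boilers associated with each cooling system record."""
    boilers = {cooling_key(row): [] for row in cooling_equipment}
    for link in boiler_cooling_assn:
        if cooling_key(link) in boilers:
            boilers[cooling_key(link)].append(link["boiler_id"])
    return boilers


linked = boilers_by_cooling_system(cooling_equipment, boiler_cooling_assn)
```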

EIA-860 respondents are required to submit information about cooling equipment for plants over 100 MW, including those with steam, combined cycle, nuclear, or solar thermal generators.

This table has been cleaned, but not tidied/normalized with other EIA 923 or 860 data. The same variables present in this table may show up in other tables and/or in other years. Once this table has been harvested, it will be converted into a core table.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet"
)

Columns#

Field Name	Type	Description
chlorine_equipment_cost	number	Actual installed cost for the existing chlorine discharge control system or the anticipated cost to bring the chlorine discharge control system into commercial operation
chlorine_equipment_operating_date	date	Actual or projected in-service date for chlorine discharge control structures and equipment
cooling_equipment_total_cost	number	Actual installed cost for the existing system or the anticipated cost to bring the total system into commercial operation
cooling_id_eia	string	EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)
cooling_status_code	string	Operating status of cooling system
cooling_system_operating_date	date	The actual or projected in-service datetime of this cooling system
cooling_type_1	string	Type of cooling system
cooling_type_2	string	Type of cooling system
cooling_type_3	string	Type of cooling system
cooling_type_4	string	Type of cooling system
cooling_water_discharge	string	Name of river, lake, or water source that cooling water is discharged into
cooling_water_source	string	Name of river, lake, or water source that provides cooling water
county	string	County name.
intake_distance_shore_feet	number	Maximum distance from shore to intake
intake_distance_surface_feet	number	Average distance below water surface to intake
intake_rate_100pct_gallons_per_minute	number	Design cooling water flow rate at 100 percent load at intake
outlet_distance_shore_feet	number	Maximum distance from shore to outlet
outlet_distance_surface_feet	number	Average distance below water surface to outlet
percent_dry_cooling	number	Percent of cooling load served by dry cooling components
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_name_eia	string	Plant name.
plant_summer_capacity_mw	number	The plant summer capacity associated with the operating generators at the plant
pond_cost	number	Actual installed cost for the existing cooling ponds or the anticipated cost to bring the cooling ponds into commercial operation
pond_operating_date	date	Cooling ponds actual or projected in-service date
pond_surface_area_acres	number	Total surface area of cooling pond
pond_volume_acre_feet	number	Total volume of water in cooling pond
power_requirement_mw	number	Maximum power requirement for cooling towers at 100 percent load
report_date	date	Date reported.
state	string	Two letter US state abbreviation.
steam_plant_type_code	integer	Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.
tower_cost	number	Actual installed cost for the existing cooling towers or the anticipated cost to bring the cooling towers into commercial operation
tower_operating_date	date	Cooling towers actual or projected in-service date
tower_type_1	string	Types of cooling towers at this plant
tower_type_2	string	Types of cooling towers at this plant
tower_type_3	string	Types of cooling towers at this plant
tower_type_4	string	Types of cooling towers at this plant
tower_water_rate_100pct_gallons_per_minute	number	Maximum design rate of water flow at 100 percent load for the cooling towers
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
water_source	string	Name of water source associated with the plant.
water_source_code	string	Type of cooling water source
water_type_code	string	Type of cooling water

_core_eia860__fgd_equipment#

Noticed any irregularities with this table? File a report.

Information about flue gas desulfurization equipment at generation facilities.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 860 – Annual Electric Generator Report (Schedule 6E)
Primary key:: plant_id_eia, so2_control_id_eia, report_date

Usage Warnings#

This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet"
)

Columns#

Field Name	Type	Description
byproduct_recovery	boolean	Is saleable byproduct recovered by the unit?
county	string	County name.
county_id_fips	string	County ID from the Federal Information Processing Standard Publication 6-4.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fgd_manufacturer	string	Name of flue gas desulfurization equipment manufacturer.
fgd_manufacturer_code	string	Code corresponding to name of flue gas desulfurization equipment manufacturer.
fgd_operating_date	date	The actual or projected in-service datetime of this flue gas desulfurization system
fgd_operational_status_code	string	Operating status code for flue gas desulfurization equipment.
fgd_other_cost	number	Other actual installed costs for installation of a flue gas desulfurization unit or the anticipated other costs pertaining to the installation of a flue gas desulfurization unit.
fgd_structure_cost	number	Actual installed costs for the existing systems or the anticipated costs of structures and equipment to bring a planned flue gas desulfurization system into commercial operation.
fgd_trains_100pct	number	Total number of flue gas desulfurization unit scrubber trains operated at 100 percent load.
fgd_trains_total	number	Total number of flue gas desulfurization unit scrubber trains.
flue_gas_bypass_fgd	boolean	Indicates whether flue gas can bypass the flue gas desulfurization unit.
flue_gas_entering_fgd_pct_of_total	number	Ratio of all flue gas that is entering the flue gas desulfurization unit.
flue_gas_exit_rate_cubic_feet_per_minute	number	Actual flue gas exit rate, in cubic feet per minute.
flue_gas_exit_temperature_fahrenheit	number	Flue gas exit temperature, in degrees Fahrenheit.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
plant_summer_capacity_mw	number	The plant summer capacity associated with the operating generators at the plant
pond_landfill_requirements_acre_foot_per_year	number	Annual pond and land fill requirements for flue gas desulfurization equipment.
report_date	date	Date reported.
sludge_disposal_cost	number	Actual installed costs for the existing sludge transport and disposal systems or the anticipated costs of sludge transport and disposal systems to bring a planned system into commercial operation.
sludge_pond	boolean	Indicates if there is a sludge pond associated with this unit.
sludge_pond_lined	boolean	Indicates whether the sludge pond is lined.
so2_control_id_eia	string	Sulfur dioxide control identification number. This ID is not a unique identifier.
so2_emission_rate_lbs_per_hour	number	Sulfur dioxide emission rate when operating at 100 percent load (pounds per hour).
so2_equipment_type_1	string	Type of sulfur dioxide control equipment.
so2_equipment_type_2	string	Type of sulfur dioxide control equipment.
so2_equipment_type_3	string	Type of sulfur dioxide control equipment.
so2_equipment_type_4	string	Type of sulfur dioxide control equipment.
so2_removal_efficiency_design	number	Designed removal efficiency for sulfur dioxide when operating at 100 percent load. Reported at the nearest 0.1 percent by weight of gases removed from the flue gas.
sorbent_type_1	string	Type of sorbent used by this sulfur dioxide control equipment.
sorbent_type_2	string	Type of sorbent used by this sulfur dioxide control equipment.
sorbent_type_3	string	Type of sorbent used by this sulfur dioxide control equipment.
sorbent_type_4	string	Type of sorbent used by this sulfur dioxide control equipment.
specifications_of_coal_ash	number	Design fuel specifications for ash when burning coal or petroleum coke (nearest 0.1 percent by weight).
specifications_of_coal_sulfur	number	Design fuel specifications for sulfur when burning coal or petroleum coke (nearest 0.1 percent by weight).
state	string	Two letter US state abbreviation.
state_id_fips	string	Two digit state FIPS code.
steam_plant_type_code	integer	Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.
total_fgd_equipment_cost	number	Total actual installed costs for the existing flue gas desulfurization unit or the anticipated costs to bring a planned flue gas desulfurization unit into commercial operation.
utility_id_eia	integer	The EIA Utility Identification number.
utility_name_eia	string	The name of the utility.
water_source	string	Name of water source associated with the plant.

_core_eia923__monthly_cooling_system_information#

Noticed any irregularities with this table? File a report.

Monthly time series of cooling system information at generation facilities - mainly water volumes and temperatures.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 8D)
Primary key:: plant_id_eia, report_date, cooling_id_eia

Usage Warnings#

Some years use a slightly different data definition.

Additional Details#

In 2008 and 2009, EIA only reports annual averages, but in later years all data is monthly.

Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.
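Because the 2008 and 2009 records carry only annual averages while later records are monthly, it can help to split the two reporting regimes before analysis. The following is a minimal sketch, assuming the `annual_` and `monthly_` column prefixes listed in the Columns table below and a year cutoff taken from the note above. The helper name `split_annual_and_monthly` and the sample values are hypothetical.

```python
import pandas as pd

ID_COLS = ["plant_id_eia", "report_date", "cooling_id_eia"]


def split_annual_and_monthly(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split cooling records into annual-only (2009 and earlier) and monthly (2010+) frames."""
    annual_cols = [c for c in df.columns if c.startswith("annual_")]
    monthly_cols = [c for c in df.columns if c.startswith("monthly_")]
    years = pd.to_datetime(df["report_date"]).dt.year
    annual = df.loc[years <= 2009, ID_COLS + annual_cols]
    monthly = df.loc[years >= 2010, ID_COLS + monthly_cols]
    return annual, monthly


# Made-up rows standing in for the real table.
sample = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2008-01-01", "2010-01-01", "2010-02-01"]),
        "cooling_id_eia": ["A", "A", "A"],
        "annual_average_withdrawal_rate_gallons_per_minute": [100.0, None, None],
        "monthly_average_withdrawal_rate_gallons_per_minute": [None, 90.0, 95.0],
    }
)
annual, monthly = split_annual_and_monthly(sample)
```

Keeping the two frames separate avoids treating an annual average as if it were a single monthly observation.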

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet"
)

Columns#

Field Name	Type	Description
annual_average_consumption_rate_gallons_per_minute	number	Annual average consumption rate of cooling water
annual_average_discharge_rate_gallons_per_minute	number	Annual average discharge rate of cooling water
annual_average_withdrawal_rate_gallons_per_minute	number	Annual average withdrawal rate of cooling water
annual_maximum_intake_summer_temperature_fahrenheit	number	Maximum cooling water temperature at intake during the summer
annual_maximum_intake_winter_temperature_fahrenheit	number	Maximum cooling water temperature at intake in winter
annual_maximum_outlet_summer_temperature_fahrenheit	number	Maximum cooling water temperature at outlet in summer
annual_maximum_outlet_winter_temperature_fahrenheit	number	Maximum cooling water temperature at outlet in winter
annual_total_chlorine_lbs	number	Amount of elemental chlorine added to cooling water annually. May be just the amount of chlorine-containing compound if Schedule 9 is filled out.
cooling_id_eia	string	EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)
cooling_status_code	string	Operating status of cooling system
cooling_type	string	Type of cooling system
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
flow_rate_method	string	Calculation method for flow rates (actual or method of estimation)
monthly_average_consumption_rate_gallons_per_minute	number	Monthly average consumption rate of cooling water
monthly_average_discharge_rate_gallons_per_minute	number	Monthly average discharge rate of cooling water
monthly_average_discharge_temperature_fahrenheit	number	Average cooling water temperature at discharge point
monthly_average_diversion_rate_gallons_per_minute	number	Monthly average diversion rate of cooling water
monthly_average_intake_temperature_fahrenheit	number	Average cooling water temperature at intake point
monthly_average_withdrawal_rate_gallons_per_minute	number	Monthly average withdrawal rate of cooling water
monthly_maximum_discharge_temperature_fahrenheit	number	Maximum cooling water temperature at discharge
monthly_maximum_intake_temperature_fahrenheit	number	Maximum cooling water temperature at intake
monthly_total_chlorine_lbs	number	Amount of elemental chlorine added to cooling water monthly. May be just the amount of chlorine-containing compound if Schedule 9 is filled out.
monthly_total_consumption_volume_gallons	number	Monthly volume of water consumed at consumption point (accurate to 0.1 million gal)
monthly_total_cooling_hours_in_service	integer	Total hours the system operated during the month
monthly_total_discharge_volume_gallons	number	Monthly volume of water discharged at discharge point (accurate to 0.1 million gal)
monthly_total_diversion_volume_gallons	number	Monthly volume of water diverted at diversion point (accurate to 0.1 million gal)
monthly_total_withdrawal_volume_gallons	number	Monthly volume of water withdrawn at withdrawal point (accurate to 0.1 million gal)
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
temperature_method	string	Method for measurement of temperatures

_core_eia923__yearly_byproduct_disposition#

Noticed any irregularities with this table? File a report.

Annual time series of by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 8A)
Primary key:: plant_id_eia, report_date, byproduct_description

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.

Additional Details#

All by-products other than steam sales are reported in thousand tons and are rounded to whole numbers. Steam sales are reported in MMBTU.
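Since steam sales and the other by-products use different units, summing a quantity column across all rows mixes units. A minimal sketch of one way to avoid that is to group by the `byproduct_units` column before aggregating. The sample rows, including the category and unit labels, are made up for illustration and may not match the values in the published table.

```python
import pandas as pd

# Made-up rows; real labels in byproduct_description and byproduct_units may differ.
sample = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "byproduct_description": ["fly_ash", "steam_sales", "fly_ash"],
        "byproduct_units": ["tons", "mmbtu", "tons"],
        "sold_units": [10.0, 5000.0, 20.0],
    }
)

# Aggregate within each unit of measure so tons and MMBTU are never added together.
sold_by_unit = sample.groupby("byproduct_units")["sold_units"].sum()
```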

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet"
)

Columns#

Field Name	Type	Description
byproduct_description	string	Description of combustion by-product.
byproduct_units	string	Reported unit of measure for combustion byproduct. MMBTU for steam, tons for all other byproducts.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
disposal_landfill_units	number	Disposed by-products in landfill, to the nearest hundred tons or in MMBTU for steam sales.
disposal_offsite_units	number	Disposed by-products offsite, to the nearest hundred tons or in MMBTU for steam sales.
disposal_ponds_units	number	Disposed by-products in ponds, to the nearest hundred tons or in MMBTU for steam sales.
no_byproducts_to_report	string	Whether any combustion by-products were produced by a plant. ‘Y’ indicates no byproducts to report. The ‘Y’ and ‘N’ values do not align with expected values of reported byproducts. This column is messy and requires standardization.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
sold_units	number	Sold by-products, in tons (to the nearest 100 tons) or, for steam, MMBTU.
stored_offsite_units	number	Stored by-products offsite, to the nearest hundred tons or in MMBTU for steam sales.
stored_onsite_units	number	Stored by-products onsite, to the nearest hundred tons or in MMBTU for steam sales.
total_disposal_units	number	Total by-product disposal, to the nearest hundred tons or in MMBTU for steam sales.
used_offsite_units	number	Used offsite by-products, to the nearest hundred tons or in MMBTU for steam sales.
used_onsite_units	number	Used onsite by-products, to the nearest hundred tons or in MMBTU for steam sales.

_core_eia923__yearly_byproduct_expenses_and_revenues#

Noticed any irregularities with this table? File a report.

Annual time series of financial information related to combustion by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater and that produced combustion by-products during the reporting year.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 8B)
Primary key:: plant_id_eia, report_date

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.

Additional Details#

Cost data must be entered for all entries on Schedule 8A. Financial information includes operational and maintenance expenditures (opex), capital expenditures (capex), and by-products sales revenue.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet"
)

Columns#

Field Name	Type	Description
capex_air_abatement	number	Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate airborne pollutants.
capex_other_abatement	number	Other amortizable expenses and purchases of new structures and/or equipment when such purchases are not allocated to a particular unit or item.
capex_solid_waste	number	Cost of structures or equipment purchased to collect and dispose of objectionable solids or contained liquids.
capex_water_abatement	number	Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate waterborne pollutants, including chlorine, phosphates, acids, bases, hydrocarbons, sewage, and other pollutants.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
opex_bottom_ash_collection	number	Costs of materials and labor associated with the collection of bottom ash from all sources.
opex_bottom_ash_disposal	number	Costs of materials and labor associated with the disposal of bottom ash from all sources.
opex_bottom_ash_other	number	Other costs associated with the collection and disposal of bottom ash.
opex_fgd_byproduct_collection	number	Costs of materials and labor associated with the collection of sulfur by-product (flue gas desulfurization).
opex_fgd_byproduct_disposal	number	Costs of materials and labor associated with the disposal of sulfur by-product (flue gas desulfurization).
opex_fgd_byproduct_other	number	Other costs associated with the collection and disposal of sulfur by-product (flue gas desulfurization).
opex_fly_ash_collection	number	Costs of materials and labor associated with the collection of fly ash from all sources.
opex_fly_ash_disposal	number	Costs of materials and labor associated with the disposal of fly ash from all sources.
opex_fly_ash_other	number	Other costs associated with the collection and disposal of fly ash.
opex_other_abatement_collection	number	Abatement costs of by-product collection that are not allocated to a particular expenditure, e.g., costs of operating an environmental protection office.
opex_other_abatement_disposal	number	Abatement costs of by-product disposal that are not allocated to a particular expenditure.
opex_other_abatement_other	number	Other abatement costs that are not allocated to a particular expenditure.
opex_total_collection_abatement	number	Sum of abatement costs associated with by-product collection.
opex_total_disposal_abatement	number	Sum of abatement costs associated with by-product disposal.
opex_total_other_abatement	number	Sum of other abatement costs associated with the collection and disposal of byproducts.
opex_water_abatement_collection	number	Costs associated with the collection/abatement of water pollution, e.g., equipment operation and maintenance of pumps, pipes, and settling ponds.
opex_water_abatement_disposal	number	Costs associated with the disposal of water pollutants.
opex_water_abatement_other	number	Other abatement costs associated with water pollutants.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
revenues_bottom_ash	number	Revenue from the sale of bottom ash by-product.
revenues_fgd_byproducts	number	Revenue from the sale of flue gas desulfurization by-product.
revenues_fly_ash	number	Revenue from the sale of fly ash by-product.
revenues_fly_bottom_ash_intermingled	number	Revenue from the sale of intermingled fly and bottom ash by-product.
revenues_other_byproducts	number	Revenue from the sale of other by-products.
revenues_total_byproduct	number	Total revenue from the sale of by-products.

_core_eia923__yearly_emissions_control#

Noticed any irregularities with this table? File a report.

Annual time series of actual rate and removal efficiency for air emissions reported by thermoelectric or combined-cycle power plants with a total steam turbine capacity greater than or equal to 10 megawatts.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 8C)
Primary key:: This table has no primary key. This table is not yet normalized, and contains information about several different types of emissions control equipment, each of which has its own ID, and not all of which are present at every plant. As a result there is currently no unique, non-null primary key. Conceptually, the primary key should consist of plant_id_eia, report_date, mercury_control_id_eia, nox_control_id_eia, so2_control_id_eia, and particulate_control_id_eia. In practice many records have NULL values for all of the control equipment IDs.
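Before joining on the conceptual key described above, it may be worth checking how many records lack every control equipment ID and whether any key combinations repeat. This is a rough sketch on made-up rows, using only the column names listed above. It is not how PUDL itself validates keys.

```python
import pandas as pd

CONTROL_ID_COLS = [
    "mercury_control_id_eia",
    "nox_control_id_eia",
    "so2_control_id_eia",
    "particulate_control_id_eia",
]
CONCEPTUAL_KEY = ["plant_id_eia", "report_date"] + CONTROL_ID_COLS

# Made-up rows standing in for the real table.
sample = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "mercury_control_id_eia": [None, None, None],
        "nox_control_id_eia": ["N1", None, None],
        "so2_control_id_eia": [None, None, "S1"],
        "particulate_control_id_eia": [None, None, None],
    }
)

# Rows where none of the control equipment IDs is populated.
all_ids_null = sample[CONTROL_ID_COLS].isna().all(axis=1)

# Rows whose conceptual key appears more than once (nulls compare as equal here).
duplicated_key = sample.duplicated(subset=CONCEPTUAL_KEY, keep=False)
```

Note that `DataFrame.duplicated` treats missing values as equal to each other, so rows with all-null IDs at the same plant and date are flagged as duplicates.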

Usage Warnings#

This table is experimental and/or a work in progress and may change in the future.

Additional Details#

In the raw data, the so2_test_date and particulate_test_date columns contained a wide variety of non-standard date formats. They have been standardized to ISO-8601 (YYYY-MM-DD) where possible. Missing days and months have been filled in with 01. A handful of manual spot-fixes have been applied to particularly strange formats. Two-digit years have been assumed to fall between 1950 and the last year observed in the report_date column.
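The rules above can be illustrated with a small sketch. It is not the PUDL implementation: the helper `standardize_test_date` is hypothetical, it assumes month-first input separated by `/`, `-`, or `.`, and it does not cover the manual spot-fixes. It fills missing days and months with 01 and places two-digit years between 1950 and the last reported year.

```python
import datetime
import re
from typing import Optional


def standardize_test_date(raw: str, last_report_year: int) -> Optional[str]:
    """Convert a loosely formatted, month-first date string to YYYY-MM-DD.

    Hypothetical helper; returns None when the string cannot be interpreted.
    """
    parts = [p for p in re.split(r"[/\-.]", raw.strip()) if p]
    if not parts or not all(p.isdigit() for p in parts):
        return None
    if len(parts) == 1:  # year only: fill month and day with 01
        month, day, year = "1", "1", parts[0]
    elif len(parts) == 2:  # month and year: fill day with 01
        month, year = parts
        day = "1"
    elif len(parts) == 3:  # month, day, year
        month, day, year = parts
    else:
        return None
    year_num = int(year)
    if len(year) <= 2:
        # Choose the century that keeps the year within 1950..last_report_year.
        year_num += 2000 if 2000 + year_num <= last_report_year else 1900
        if year_num < 1950:
            return None
    elif len(year) != 4:
        return None
    try:
        return datetime.date(year_num, int(month), int(day)).isoformat()
    except ValueError:  # for example month 13 or day 32
        return None


examples = ["06/2015", "3/98", "12/15/07", "1985", "n/a"]
cleaned = [standardize_test_date(value, last_report_year=2024) for value in examples]
```

Returning `None` for anything that cannot be parsed keeps unparseable values visible as nulls instead of silently guessing a date.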

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet"
)

Columns#

Field Name	Type	Description
acid_gas_removal_efficiency	number	Removal efficiency for acid gas emissions. Ranges from 0 to 1.
annual_nox_emission_rate_lb_per_mmbtu	number	Actual controlled (or uncontrolled) nitrogen oxides emission rate. Based on data from CEMS where possible.
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
environmental_equipment_name	string	Type of equipment or strategy for the control of air emissions.
fgd_electricity_consumption_mwh	number	Electric power consumed by the flue gas desulfurization unit (in MWh).
fgd_sorbent_consumption_tons	number	Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons.
hours_in_service	integer	Total hours the emissions control was in service during the reporting year, rounded to the nearest hour.
mercury_control_id_eia	string	Mercury control identification number. This ID is not a unique identifier.
mercury_emission_rate_lb_per_trillion_btu	number	Actual controlled (or uncontrolled) mercury emission rate, based on data from CEMS, where possible.
mercury_removal_efficiency	number	Removal efficiency for mercury emissions. Ranges from 0 to 1.
nox_control_id_eia	string	Nitrogen oxide control identification number. This ID is not a unique identifier.
operational_status	string	The operating status of the asset. For generators, this is based on which tab of EIA 860 the generator was listed in.
ozone_season_nox_emission_rate_lb_per_mmbtu	number	Actual controlled (or uncontrolled) nitrogen oxides emission rate during the ozone season (May to September)
particulate_control_id_eia	string	Particulate matter control identification number. This ID is not a unique identifier.
particulate_emission_rate_lb_per_mmbtu	number	Average annual emission removal rate for particulate matter.
particulate_removal_efficiency_annual	number	Particulate removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.
particulate_removal_efficiency_tested	number	The tested efficiency for the removal of particulate matter at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.
particulate_test_date	date	Date of the latest efficiency test for the removal of particulate matter.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
so2_control_id_eia	string	Sulfur dioxide control identification number. This ID is not a unique identifier.
so2_removal_efficiency_annual	number	Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.
so2_removal_efficiency_tested	number	The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.
so2_test_date	date	Date of most recent test for sulfur dioxide removal efficiency.

_core_eia923__yearly_fgd_operation_maintenance#

Noticed any irregularities with this table? File a report.

Annual time series of flue gas desulfurization (FGD) operation and maintenance expenses at generation facilities.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: EIA Form 923 – Power Plant Operations Report (Schedule 8C)
Primary key:: plant_id_eia, report_date, so2_control_id_eia

Usage Warnings#

Some years use a slightly different data definition.

Additional Details#

From 2008 to 2011, this table also reported operational characteristics that are now reported in 923 Air Emissions Control Info.

Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet"
)

Columns#

Field Name	Type	Description
data_maturity	string	Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
fgd_control_flag	boolean	Indicates whether or not a plant has a flue gas desulfurization control unit.
fgd_electricity_consumption_mwh	number	Electric power consumed by the flue gas desulfurization unit (in MWh).
fgd_hours_in_service	integer	Number of hours the flue gas desulfurization equipment was in operation during the year.
fgd_operational_status_code	string	Operating status code for flue gas desulfurization equipment.
fgd_sorbent_consumption_tons	number	Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons.
opex_fgd_feed_materials_chemical	integer	Annual operation and maintenance expenditures for feed materials and chemicals for flue gas desulfurization equipment, excluding electricity.
opex_fgd_labor_supervision	integer	Annual operation and maintenance expenditures for labor and supervision of flue gas desulfurization equipment, excluding electricity.
opex_fgd_land_acquisition	integer	Annual operation and maintenance expenditures for land acquisition for flue gas desulfurization equipment, excluding electricity.
opex_fgd_maintenance_material_other	integer	Annual operation and maintenance expenditures for maintenance, materials and all other costs of flue gas desulfurization equipment, excluding electricity
opex_fgd_total_cost	integer	Annual total cost of operation and maintenance expenditures on flue gas desulfurization equipment, excluding electricity
opex_fgd_waste_disposal	integer	Annual operation and maintenance expenditures for waste disposal, excluding electricity.
plant_id_eia	integer	The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.
report_date	date	Date reported.
so2_control_id_eia	string	Sulfur dioxide control identification number. This ID is not a unique identifier.
so2_removal_efficiency_annual	number	Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.
so2_removal_efficiency_tested	number	The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.
so2_test_date	date	Date of most recent test for sulfur dioxide removal efficiency.

_core_phmsagas__yearly_distribution_by_install_decade#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by install decade.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 4)
Primary key:: report_id, report_date, operator_id_phmsa, operating_state, install_decade

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Records with an install_decade of total_decade are totals. Beware of including them when aggregating these values.
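A minimal sketch of excluding the total rows before summing, so that the totals are not counted a second time. The decade labels other than `total_decade` and the numeric values are made up for illustration.

```python
import pandas as pd

# Made-up rows; only the total_decade label comes from the note above.
sample = pd.DataFrame(
    {
        "report_id": [1, 1, 1],
        "install_decade": ["1990s", "2000s", "total_decade"],
        "mains_miles": [10.0, 15.0, 25.0],
    }
)

# Drop the pre-computed total rows, then aggregate the per-decade values.
by_decade = sample[sample["install_decade"] != "total_decade"]
miles_without_totals = by_decade["mains_miles"].sum()

# Summing every row would count the reported total on top of its components.
miles_with_totals = sample["mains_miles"].sum()
```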

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet"
)

Columns#

Field Name	Type	Description
commodity	string	The type of gas delivered by the distribution pipeline.
install_decade	string	The decade the distribution pipeline was installed.
mains_miles	number	The miles of mains distribution pipeline.
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.
services	number	Number of services in system at end of year.

_core_phmsagas__yearly_distribution_by_material#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by material for each gas distribution operator.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 1 - General)
Primary key:: This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state and material. However, there are nulls in the operating_state across several years of reporting.
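
One way to see how far the table is from the expected key is to count rows with a null in any key column and rows whose key values repeat. This is a sketch under assumptions: summarize_key_problems is a hypothetical helper, and the sample rows (including the material values) are invented.

```python
import pandas as pd

# Expected key columns, as listed in the "Primary key" note above.
EXPECTED_KEY = ["report_id", "operator_id_phmsa", "operating_state", "material"]


def summarize_key_problems(df: pd.DataFrame, key: list[str]) -> dict[str, int]:
    """Count rows with a null in any key column and rows whose key repeats."""
    null_rows = int(df[key].isna().any(axis=1).sum())
    complete = df.dropna(subset=key)
    duplicate_rows = int(complete.duplicated(subset=key, keep=False).sum())
    return {"rows_with_null_key": null_rows, "rows_with_duplicate_key": duplicate_rows}


# Invented rows: one null operating_state and one repeated key.
sample = pd.DataFrame(
    {
        "report_id": [1, 1, 2, 3, 3],
        "operator_id_phmsa": [10, 10, 20, 30, 30],
        "operating_state": ["TX", "TX", None, "OH", "OH"],
        "material": ["steel", "plastic", "steel", "steel", "steel"],
    }
)
problems = summarize_key_problems(sample, EXPECTED_KEY)
```

The same helper should work on the other tables in this group if the key list is swapped for the one named in each table's own "Primary key" note.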

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
The categories of material types have changed slightly over the years (e.g., cast and wrought iron were split into two categories before 1984).
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet"
)

Columns#

Field Name	Type	Description
commodity	string	The type of gas delivered by the distribution pipeline.
mains_miles	number	The miles of mains distribution pipeline.
material	string	The material of the gas distribution pipe. The categories of material types have changed slightly over the years (e.g., cast and wrought iron were split into two categories before 1984).
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.
services	number	Number of services in system at end of year.

_core_phmsagas__yearly_distribution_by_material_and_size#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by material and size of pipe.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 3)
Primary key:: This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state, main_size and material. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
The size ranges in main_size have changed slightly over the years (e.g., before 1984 operators reported 0.5_in_or_less, whereas afterwards they reported 1_in_or_less).
The categories of material types have changed slightly over the years (e.g., cast and wrought iron were split into two categories before 1984).
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
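
To see when the categories change, it can help to list the distinct values of main_size (or material) reported in each year. The sketch below assumes report_date can be parsed as a date; categories_by_year is a hypothetical helper and the sample dates are invented.

```python
import pandas as pd


def categories_by_year(df: pd.DataFrame, column: str) -> pd.Series:
    """Map each report year to the sorted list of distinct values in `column`."""
    years = pd.to_datetime(df["report_date"]).dt.year
    return (
        df.assign(report_year=years)
        .dropna(subset=[column])
        .groupby("report_year")[column]
        .agg(lambda values: sorted(values.unique()))
    )


# Invented rows using the two size labels mentioned in the warning above.
sample = pd.DataFrame(
    {
        "report_date": ["1980-01-01", "1980-01-01", "1990-01-01"],
        "main_size": ["0.5_in_or_less", "0.5_in_or_less", "1_in_or_less"],
    }
)
sizes = categories_by_year(sample, "main_size")
```

Comparing the lists for adjacent years shows where a category appears or disappears, which is a useful check before aggregating across a long time span.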

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet"
)

Columns#

Field Name	Type	Description
commodity	string	The type of gas delivered by the distribution pipeline.
main_other_material_detail	string	A free-form text field containing notes about the other material type. This column should only contain values in it for rows with other as the material type listed.
main_size	string	Size range of mains. The size ranges have changed slightly over the years (e.g., before 1984 operators reported 0.5_in_or_less, whereas afterwards they reported 1_in_or_less).
mains_miles	number	The miles of mains distribution pipeline.
material	string	The material of the gas distribution pipe. The categories of material types have changed slightly over the years (e.g., cast and wrought iron were split into two categories before 1984).
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.
services	number	Number of services in system at end of year.

_core_phmsagas__yearly_distribution_excavation_damages#

Noticed any irregularities with this table? File a report.

Annual time series of excavation damages from various sources.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part D - Excavation Damage)
Primary key:: report_id, damage_type, damage_sub_type

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet"
)

Columns#

Field Name	Type	Description
commodity	string	The type of gas delivered by the distribution pipeline.
damage_sub_type	string	A sub-category of damage_type of excavation damage causes.
damage_type	string	A high level category of excavation damage causes.
damages	number	Number of instances of excavation damage.
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.

_core_phmsagas__yearly_distribution_filings#

Noticed any irregularities with this table? File a report.

Annual time series of filings (aka submissions) from gas distribution system operators.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report
Primary key:: report_id, report_date, operator_id_phmsa

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

Additional Details#

This table contains information about the filer and the filing type: who filed the report, and whether it was an original filing or a correction.
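
A minimal sketch of separating original filings from corrections, assuming report_filing_type holds the values Initial and Supplemental described in the column list (the exact casing in the data is an assumption, so the comparison is case-insensitive). split_by_filing_type is a hypothetical helper and the sample rows are invented.

```python
import pandas as pd


def split_by_filing_type(filings: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split filings into (initial, supplemental) using report_filing_type."""
    kind = filings["report_filing_type"].str.strip().str.lower()
    # Rows with a missing filing type end up in neither group.
    initial = filings[(kind == "initial").fillna(False)]
    supplemental = filings[(kind == "supplemental").fillna(False)]
    return initial, supplemental


# Invented rows; casing is varied on purpose to exercise the normalization.
sample = pd.DataFrame(
    {
        "report_id": [101, 102, 103],
        "report_filing_type": ["Initial", "Supplemental", "initial"],
    }
)
initial, supplemental = split_by_filing_type(sample)
```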

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet"
)

Columns#

Field Name	Type	Description
data_date	date	When the data source was last updated.
filing_correction_date	date	Date when a correction filing was submitted.
filing_date	date	Date on which the filing was submitted.
form_revision_id	string	PHMSA form revision identifier.
initial_filing_date	date	Initial date when filing was originally submitted.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
preparer_email	string	Email address of representative who filed report.
preparer_fax	string	Fax number of representative who filed report.
preparer_name	string	Name of representative who filed report.
preparer_phone	string	Phone number of representative who filed report.
preparer_title	string	Title of representative who filed report.
report_date	date	Date reported.
report_filing_type	string	Type of report submitted, either Initial or Supplemental.
report_id	integer	Report number of the PHMSA Gas utility submission.

_core_phmsagas__yearly_distribution_leaks#

Noticed any irregularities with this table? File a report.

Annual time series of total and hazardous leaks eliminated or repaired during the report year.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part C)
Primary key:: This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state, leak_severity and leak_source. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
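
If an analysis needs exactly one state per record, one cautious option is to restrict it to the years in which companies file one report per state. The sketch below keeps report years from 2004 onward; single_state_era is a hypothetical helper, and the comma-separated "TX,OK" value is only a guess at what a multi-state entry might look like. It does not address the null operating_state values mentioned in the primary key note.

```python
import pandas as pd


def single_state_era(df: pd.DataFrame, first_year: int = 2004) -> pd.DataFrame:
    """Keep only rows reported in or after the first one-report-per-state year."""
    report_year = pd.to_datetime(df["report_date"]).dt.year
    return df[report_year >= first_year]


# Invented rows; the multi-state format in the first row is an assumption.
sample = pd.DataFrame(
    {
        "report_date": ["1999-01-01", "2004-01-01", "2010-01-01"],
        "operating_state": ["TX,OK", "TX", "OH"],
        "leak_severity": ["all_leaks", "all_leaks", "all_leaks"],  # placeholder label
    }
)
recent = single_state_era(sample)
```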

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet"
)

Columns#

Field Name	Type	Description
commodity	string	The type of gas delivered by the distribution pipeline.
leak_severity	string	Whether the leaks described in this record are all leaks or only hazardous leaks.
leak_source	string	The cause of the leaks.
mains	number	The number of mains distribution pipeline.
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.
services	number	Number of services in system at end of year.

_core_phmsagas__yearly_distribution_misc#

Noticed any irregularities with this table? File a report.

Annual time series of miscellaneous distribution information.

Most-recent data:: 2024
Processing:: Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
Source:: Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B & C)
Primary key:: This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, and operating_state. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

pandas

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
      dtype_backend="pyarrow",
)

polars

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (DuckDB)

SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet';

R

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet"
)

Columns#

Field Name	Type	Description
all_known_leaks_scheduled_for_repair	number	The number of known system leaks at the end of the report year scheduled for repair.
all_known_leaks_scheduled_for_repair_main	number	The number of known leaks on main at the end of the report year scheduled for repair.
average_service_length_feet	number	The average system service length in feet.
excavation_tickets	integer	Number of excavation tickets received by the operator during the year (i.e., receipts of information by the operator from the notification center).
federal_land_leaks_repaired_or_scheduled	integer	Total number of leaks repaired, eliminated, or scheduled for repair on federal land during the reporting year.
hazardous_leaks_mechanical_joint_failure	number	The total number of hazardous leaks caused by a mechanical joint failure.
operating_state	string	State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.
operator_id_phmsa	integer	PHMSA unique operator ID. A value of zero represents an unknown operator ID.
report_date	date	Date reported.
report_id	integer	Report number of the PHMSA Gas utility submission.
services_efv_in_system	integer	Estimated number of services with Excess Flow Valve in the system at end of reported year related to natural gas distribution.
services_efv_installed	integer	Total number of services with Excess Flow Valve installed during reported year related to natural gas distribution.
services_shutoff_valve_in_system	integer	Estimated number of services with manual service line shut-off valves installed in the system at end of report year related to natural gas distribution.
services_shutoff_valve_installed	integer	Total number of manual service line shut-off valves installed during reported year related to natural gas distribution.
unaccounted_for_gas_fraction	number	Unaccounted for gas as a fraction of total consumption for the 12 months ending June 30 of the reporting year. Calculated as follows: take (purchased gas + produced gas) minus (customer use + company use + appropriate adjustments), then divide by (customer use + company use + appropriate adjustments). Prior to 2017, this field was calculated with a different denominator (purchased gas + produced gas). Between 2010 and 2017, this different calculation method ensured that no records had a negative fraction. For all other reporting years, there are known and expected negative values in this column.
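
The calculation described for unaccounted_for_gas_fraction can be sketched as a small function. The function and parameter names are hypothetical, the input volumes are invented, and the legacy_denominator flag is only an illustration of the earlier denominator described above, not PHMSA's or PUDL's actual implementation.

```python
def unaccounted_for_gas_fraction(
    purchased: float,
    produced: float,
    customer_use: float,
    company_use: float,
    adjustments: float,
    legacy_denominator: bool = False,
) -> float:
    """Compute (supply - consumption) divided by the chosen denominator."""
    supply = purchased + produced
    consumption = customer_use + company_use + adjustments
    # The earlier method divided by supply; the current one by consumption.
    denominator = supply if legacy_denominator else consumption
    if denominator == 0:
        raise ValueError("denominator is zero; the fraction is undefined")
    return (supply - consumption) / denominator


# Supply of 100 against consumption of 98 leaves 2 units unaccounted for.
current = unaccounted_for_gas_fraction(90.0, 10.0, 95.0, 3.0, 0.0)  # 2 / 98
legacy = unaccounted_for_gas_fraction(90.0, 10.0, 95.0, 3.0, 0.0, legacy_denominator=True)  # 2 / 100

# Consumption above supply gives a negative fraction under the current method.
negative = unaccounted_for_gas_fraction(90.0, 10.0, 99.0, 3.0, 0.0)  # -2 / 102
```

The two methods give slightly different values for the same inputs (about 0.0204 versus 0.02 here), so values from before and after the change in denominator are not strictly comparable.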