PUDL Data Dictionary
The following data tables have been cleaned and transformed by our ETL process.
out_censusdp1tract__counties
County-level summarized US Census demographic information, along with a geometry describing county boundaries.
- Most-recent data: 2010
- Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source: Census DP1 – Profile of General Demographic Characteristics
- Primary key: county_id_fips
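The primary key county_id_fips is stored as a string, which preserves leading zeros (for example, Alabama's state code is "01"). The sketch below assumes the column holds the 5-digit form: a 2-digit state code followed by the 3-digit county code from FIPS 6-4. Under that assumption, it splits an ID into its two parts and checks that the key has no duplicates. The helper names are hypothetical. Rows are assumed to be plain dicts, such as the output of pandas `df.to_dict("records")`.

```python
import re

# Assumption: county_id_fips is exactly five ASCII digits (state + county).
_COUNTY_FIPS = re.compile(r"[0-9]{5}")


def split_county_fips(county_id_fips: str) -> tuple[str, str]:
    """Split a 5-digit county FIPS ID into (state_fips, county_code)."""
    if not _COUNTY_FIPS.fullmatch(county_id_fips):
        raise ValueError(f"not a 5-digit county FIPS ID: {county_id_fips!r}")
    return county_id_fips[:2], county_id_fips[2:]


def find_duplicate_keys(rows: list[dict], key: str = "county_id_fips") -> set[str]:
    """Return key values that occur more than once; empty for a valid primary key."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes
```

For example, `split_county_fips("06037")` returns `("06", "037")`. A non-empty result from `find_duplicate_keys` would indicate that the primary-key constraint does not hold in the data you loaded.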
Usage Warnings
The original data is no longer being collected or reported in this way.
Additional Details
This table is based on the 2010 Census. PUDL uses it as a convenient source of geospatial information, but it should not be treated as authoritative.
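A common derived quantity for this table is population density. The land_area and water_area columns are reported in square meters, so density can be computed by dividing dp0010001 (total population) by land_area. shape_area is in square degrees, which do not correspond to a fixed ground area, so it is a poor denominator. The sketch below is illustrative: the function name and the km²/mi² output options are assumptions, not part of PUDL.

```python
# Exact conversion factors: 1 km = 1,000 m and 1 international mile = 1,609.344 m.
SQ_METERS_PER_SQ_KM = 1_000_000
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2

_DIVISORS = {"km2": SQ_METERS_PER_SQ_KM, "mi2": SQ_METERS_PER_SQ_MILE}


def population_density(population: int, land_area_m2: float, per: str = "km2") -> float:
    """Residents per square kilometer ("km2") or per square mile ("mi2").

    population: e.g. the dp0010001 value for a row.
    land_area_m2: the land_area value for the same row (square meters).
    """
    if land_area_m2 <= 0:
        raise ValueError("land_area must be positive to compute a density")
    if per not in _DIVISORS:
        raise ValueError(f"unsupported unit {per!r}; use 'km2' or 'mi2'")
    return population / (land_area_m2 / _DIVISORS[per])
```

For a row loaded as a dict, `population_density(row["dp0010001"], row["land_area"], per="mi2")` gives residents per square mile.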
Access methods
pandas:

```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
    dtype_backend="pyarrow",
)
```

Polars:

```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```

DuckDB:

```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet';
```

R (arrow):

```r
library(arrow)

df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet"
)
```
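Both tables on this page live under the same nightly prefix, so the URL can be built from the table name. The sketch below infers that pattern from the two URLs on this page; it is not a documented guarantee for other tables. The helper names are hypothetical. The second helper builds a DuckDB query that selects only the columns you need. With roughly 200 columns per table, passing the same column list to the `columns=` argument of `pandas.read_parquet` or `polars.read_parquet` may also reduce how much data is read.

```python
# Assumption: tables are published as <base>/<table_name>.parquet,
# inferred from the URLs shown in the access examples.
NIGHTLY_BASE = "s3://pudl.catalyst.coop/nightly"


def nightly_parquet_url(table_name: str, base: str = NIGHTLY_BASE) -> str:
    """Return the nightly Parquet URL for a PUDL table name."""
    return f"{base}/{table_name}.parquet"


def duckdb_select(table_name: str, columns: list[str]) -> str:
    """Build a DuckDB query that reads only the named columns from the Parquet file."""
    if not columns:
        raise ValueError("select at least one column")
    return f"SELECT {', '.join(columns)} FROM '{nightly_parquet_url(table_name)}';"
```

For example, `duckdb_select("out_censusdp1tract__counties", ["county_id_fips", "county", "dp0010001"])` returns a query for county names and total population only.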
Columns
| Field Name | Type | Description |
|---|---|---|
| county | string | County name. |
| county_id_fips | string | County ID from the Federal Information Processing Standards Publication 6-4. |
| dp0010001 | integer | DPSF1. Sex and age - Total population |
| dp0010002 | integer | DPSF1. Sex and age - Total population under 5 years |
| dp0010003 | integer | DPSF1. Sex and age - Total population 5 to 9 years |
| dp0010004 | integer | DPSF1. Sex and age - Total population 10 to 14 years |
| dp0010005 | integer | DPSF1. Sex and age - Total population 15 to 19 years |
| dp0010006 | integer | DPSF1. Sex and age - Total population 20 to 24 years |
| dp0010007 | integer | DPSF1. Sex and age - Total population 25 to 29 years |
| dp0010008 | integer | DPSF1. Sex and age - Total population 30 to 34 years |
| dp0010009 | integer | DPSF1. Sex and age - Total population 35 to 39 years |
| dp0010010 | integer | DPSF1. Sex and age - Total population 40 to 44 years |
| dp0010011 | integer | DPSF1. Sex and age - Total population 45 to 49 years |
| dp0010012 | integer | DPSF1. Sex and age - Total population 50 to 54 years |
| dp0010013 | integer | DPSF1. Sex and age - Total population 55 to 59 years |
| dp0010014 | integer | DPSF1. Sex and age - Total population 60 to 64 years |
| dp0010015 | integer | DPSF1. Sex and age - Total population 65 to 69 years |
| dp0010016 | integer | DPSF1. Sex and age - Total population 70 to 74 years |
| dp0010017 | integer | DPSF1. Sex and age - Total population 75 to 79 years |
| dp0010018 | integer | DPSF1. Sex and age - Total population 80 to 84 years |
| dp0010019 | integer | DPSF1. Sex and age - Total population 85 years and over |
| dp0010020 | integer | DPSF1. Sex and age - Male population total |
| dp0010021 | integer | DPSF1. Sex and age - Male population under 5 years |
| dp0010022 | integer | DPSF1. Sex and age - Male population 5 to 9 years |
| dp0010023 | integer | DPSF1. Sex and age - Male population 10 to 14 years |
| dp0010024 | integer | DPSF1. Sex and age - Male population 15 to 19 years |
| dp0010025 | integer | DPSF1. Sex and age - Male population 20 to 24 years |
| dp0010026 | integer | DPSF1. Sex and age - Male population 25 to 29 years |
| dp0010027 | integer | DPSF1. Sex and age - Male population 30 to 34 years |
| dp0010028 | integer | DPSF1. Sex and age - Male population 35 to 39 years |
| dp0010029 | integer | DPSF1. Sex and age - Male population 40 to 44 years |
| dp0010030 | integer | DPSF1. Sex and age - Male population 45 to 49 years |
| dp0010031 | integer | DPSF1. Sex and age - Male population 50 to 54 years |
| dp0010032 | integer | DPSF1. Sex and age - Male population 55 to 59 years |
| dp0010033 | integer | DPSF1. Sex and age - Male population 60 to 64 years |
| dp0010034 | integer | DPSF1. Sex and age - Male population 65 to 69 years |
| dp0010035 | integer | DPSF1. Sex and age - Male population 70 to 74 years |
| dp0010036 | integer | DPSF1. Sex and age - Male population 75 to 79 years |
| dp0010037 | integer | DPSF1. Sex and age - Male population 80 to 84 years |
| dp0010038 | integer | DPSF1. Sex and age - Male population 85 years and over |
| dp0010039 | integer | DPSF1. Sex and age - Female population total |
| dp0010040 | integer | DPSF1. Sex and age - Female population under 5 years |
| dp0010041 | integer | DPSF1. Sex and age - Female population 5 to 9 years |
| dp0010042 | integer | DPSF1. Sex and age - Female population 10 to 14 years |
| dp0010043 | integer | DPSF1. Sex and age - Female population 15 to 19 years |
| dp0010044 | integer | DPSF1. Sex and age - Female population 20 to 24 years |
| dp0010045 | integer | DPSF1. Sex and age - Female population 25 to 29 years |
| dp0010046 | integer | DPSF1. Sex and age - Female population 30 to 34 years |
| dp0010047 | integer | DPSF1. Sex and age - Female population 35 to 39 years |
| dp0010048 | integer | DPSF1. Sex and age - Female population 40 to 44 years |
| dp0010049 | integer | DPSF1. Sex and age - Female population 45 to 49 years |
| dp0010050 | integer | DPSF1. Sex and age - Female population 50 to 54 years |
| dp0010051 | integer | DPSF1. Sex and age - Female population 55 to 59 years |
| dp0010052 | integer | DPSF1. Sex and age - Female population 60 to 64 years |
| dp0010053 | integer | DPSF1. Sex and age - Female population 65 to 69 years |
| dp0010054 | integer | DPSF1. Sex and age - Female population 70 to 74 years |
| dp0010055 | integer | DPSF1. Sex and age - Female population 75 to 79 years |
| dp0010056 | integer | DPSF1. Sex and age - Female population 80 to 84 years |
| dp0010057 | integer | DPSF1. Sex and age - Female population 85 years and over |
| dp0020001 | number | DPSF2. Median age by sex - Both sexes |
| dp0020002 | number | DPSF2. Median age by sex - Male |
| dp0020003 | number | DPSF2. Median age by sex - Female |
| dp0030001 | integer | DPSF3. Sex for population 16 years and over - Total |
| dp0030002 | integer | DPSF3. Sex for population 16 years and over - Male |
| dp0030003 | integer | DPSF3. Sex for population 16 years and over - Female |
| dp0040001 | integer | DPSF4. Sex for population 18 years and over - Total |
| dp0040002 | integer | DPSF4. Sex for population 18 years and over - Male |
| dp0040003 | integer | DPSF4. Sex for population 18 years and over - Female |
| dp0050001 | integer | DPSF5. Sex for population 21 years and over - Total |
| dp0050002 | integer | DPSF5. Sex for population 21 years and over - Male |
| dp0050003 | integer | DPSF5. Sex for population 21 years and over - Female |
| dp0060001 | integer | DPSF6. Sex for population 62 years and over - Total |
| dp0060002 | integer | DPSF6. Sex for population 62 years and over - Male |
| dp0060003 | integer | DPSF6. Sex for population 62 years and over - Female |
| dp0070001 | integer | DPSF7. Sex for population 65 years and over - Total |
| dp0070002 | integer | DPSF7. Sex for population 65 years and over - Male |
| dp0070003 | integer | DPSF7. Sex for population 65 years and over - Female |
| dp0080001 | integer | DPSF8. Race - Total population |
| dp0080002 | integer | DPSF8. Race - Population of one race |
| dp0080003 | integer | DPSF8. Race - White |
| dp0080004 | integer | DPSF8. Race - Black or African American |
| dp0080005 | integer | DPSF8. Race - American Indian and Alaska Native |
| dp0080006 | integer | DPSF8. Race - Asian total |
| dp0080007 | integer | DPSF8. Race - Asian Indian |
| dp0080008 | integer | DPSF8. Race - Chinese |
| dp0080009 | integer | DPSF8. Race - Filipino |
| dp0080010 | integer | DPSF8. Race - Japanese |
| dp0080011 | integer | DPSF8. Race - Korean |
| dp0080012 | integer | DPSF8. Race - Vietnamese |
| dp0080013 | integer | DPSF8. Race - Other Asian |
| dp0080014 | integer | DPSF8. Race - Native Hawaiian and Other Pacific Islander total |
| dp0080015 | integer | DPSF8. Race - Native Hawaiian |
| dp0080016 | integer | DPSF8. Race - Guamanian or Chamorro |
| dp0080017 | integer | DPSF8. Race - Samoan |
| dp0080018 | integer | DPSF8. Race - Other Pacific Islander |
| dp0080019 | integer | DPSF8. Race - Some Other Race |
| dp0080020 | integer | DPSF8. Race - Population of Two or More Races |
| dp0080021 | integer | DPSF8. Race - White; American Indian and Alaska Native |
| dp0080022 | integer | DPSF8. Race - White; Asian |
| dp0080023 | integer | DPSF8. Race - White; Black or African American |
| dp0080024 | integer | DPSF8. Race - White; Some Other Race |
| dp0090001 | integer | DPSF9. Race (total races tallied) - White alone or in combination with one or more other races |
| dp0090002 | integer | DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races |
| dp0090003 | integer | DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races |
| dp0090004 | integer | DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races |
| dp0090005 | integer | DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races |
| dp0090006 | integer | DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races |
| dp0100001 | integer | DPSF10. Hispanic or Latino by specific origin - Total population |
| dp0100002 | integer | DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race) |
| dp0100003 | integer | DPSF10. Hispanic or Latino by specific origin - Mexican |
| dp0100004 | integer | DPSF10. Hispanic or Latino by specific origin - Puerto Rican |
| dp0100005 | integer | DPSF10. Hispanic or Latino by specific origin - Cuban |
| dp0100006 | integer | DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino |
| dp0100007 | integer | DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino |
| dp0110001 | integer | DPSF11. Hispanic or Latino and race - Total population |
| dp0110002 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino total |
| dp0110003 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone |
| dp0110004 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone |
| dp0110005 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110006 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone |
| dp0110007 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110008 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone |
| dp0110009 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races |
| dp0110010 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total |
| dp0110011 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone |
| dp0110012 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone |
| dp0110013 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110014 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone |
| dp0110015 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110016 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone |
| dp0110017 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races |
| dp0120001 | integer | DPSF12. Relationship - Total population |
| dp0120002 | integer | DPSF12. Relationship - In households |
| dp0120003 | integer | DPSF12. Relationship - Householder |
| dp0120004 | integer | DPSF12. Relationship - Spouse |
| dp0120005 | integer | DPSF12. Relationship - Child |
| dp0120006 | integer | DPSF12. Relationship - Own child under 18 years |
| dp0120007 | integer | DPSF12. Relationship - Other relatives |
| dp0120008 | integer | DPSF12. Relationship - Other relatives under 18 years |
| dp0120009 | integer | DPSF12. Relationship - Other relatives 65 years and over |
| dp0120010 | integer | DPSF12. Relationship - Nonrelatives |
| dp0120011 | integer | DPSF12. Relationship - Nonrelatives under 18 years |
| dp0120012 | integer | DPSF12. Relationship - Nonrelatives 65 years and over |
| dp0120013 | integer | DPSF12. Relationship - Unmarried partner |
| dp0120014 | integer | DPSF12. Relationship - In group quarters |
| dp0120015 | integer | DPSF12. Relationship - Institutionalized population |
| dp0120016 | integer | DPSF12. Relationship - Institutionalized population: Male |
| dp0120017 | integer | DPSF12. Relationship - Institutionalized population: Female |
| dp0120018 | integer | DPSF12. Relationship - Noninstitutionalized population |
| dp0120019 | integer | DPSF12. Relationship - Noninstitutionalized population: Male |
| dp0120020 | integer | DPSF12. Relationship - Noninstitutionalized population: Female |
| dp0130001 | integer | DPSF13. Households by type - Total households |
| dp0130002 | integer | DPSF13. Households by type - Family households (families) |
| dp0130003 | integer | DPSF13. Households by type - Family households with own children under 18 years |
| dp0130004 | integer | DPSF13. Households by type - Husband-wife family |
| dp0130005 | integer | DPSF13. Households by type - Husband-wife family with own children under 18 years |
| dp0130006 | integer | DPSF13. Households by type - Male householder, no wife present |
| dp0130007 | integer | DPSF13. Households by type - Male householder, no wife present, with own children under 18 years |
| dp0130008 | integer | DPSF13. Households by type - Female householder, no husband present |
| dp0130009 | integer | DPSF13. Households by type - Female householder, no husband present, with own children under 18 years |
| dp0130010 | integer | DPSF13. Households by type - Nonfamily households |
| dp0130011 | integer | DPSF13. Households by type - Householder living alone |
| dp0130012 | integer | DPSF13. Households by type - Householder living alone: Male |
| dp0130013 | integer | DPSF13. Households by type - Householder living alone: Male 65 years and over |
| dp0130014 | integer | DPSF13. Households by type - Householder living alone: Female |
| dp0130015 | integer | DPSF13. Households by type - Householder living alone: Female 65 years and over |
| dp0140001 | integer | DPSF14. Households with individuals under 18 years - Total |
| dp0150001 | integer | DPSF15. Households with individuals 65 years and over - Total |
| dp0160001 | number | DPSF16. Average household size - Average household size |
| dp0170001 | number | DPSF17. Average family size - Average family size |
| dp0180001 | integer | DPSF18. Housing occupancy - Total housing units |
| dp0180002 | integer | DPSF18. Housing occupancy - Occupied housing units |
| dp0180003 | integer | DPSF18. Housing occupancy - Vacant housing units |
| dp0180004 | integer | DPSF18. Housing occupancy - Vacant housing units for rent |
| dp0180005 | integer | DPSF18. Housing occupancy - Vacant housing units rented, not occupied |
| dp0180006 | integer | DPSF18. Housing occupancy - Vacant housing units for sale only |
| dp0180007 | integer | DPSF18. Housing occupancy - Vacant housing units sold, not occupied |
| dp0180008 | integer | DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use |
| dp0180009 | integer | DPSF18. Housing occupancy - All other vacant housing units |
| dp0190001 | number | DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent) |
| dp0200001 | number | DPSF20. Rental vacancy rate - Rental vacancy rate (percent) |
| dp0210001 | integer | DPSF21. Housing tenure - Total occupied housing units |
| dp0210002 | integer | DPSF21. Housing tenure - Owner-occupied housing units |
| dp0210003 | integer | DPSF21. Housing tenure - Renter-occupied housing units |
| dp0220001 | integer | DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units |
| dp0220002 | integer | DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units |
| dp0230001 | number | DPSF23. Average household size by tenure - Owner occupied |
| dp0230002 | number | DPSF23. Average household size by tenure - Renter occupied |
| functional_status_code_census | string | The functional status (FUNCSTAT) code defines the current functional status of a geographic entity. These codes can be found in the TIGER/Line products, gazetteer files, and other products. |
| geometry | geometry | Geospatial representation of the feature. |
| internal_point_latitude | number | Internal point latitude in decimal degrees. |
| internal_point_longitude | number | Internal point longitude in decimal degrees. |
| land_area | number | Land area in square meters. |
| shape_area | number | Area of the feature in square degrees. |
| shape_length | number | Length of the feature’s perimeter in degrees. |
| water_area | number | Water area in square meters. |
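The DPSF1 columns follow a regular layout in the table above. dp0010001 is the total population, dp0010002 to dp0010019 are 18 age bins, dp0010020 and dp0010039 are the male and female totals, and each sex has its own 18 matching age bins. In the Census DP1 profile we expect these counts to add up. The sketch below flags rows where they do not. The function names are hypothetical, and the check assumes the values are non-null integers; nullable columns would need null handling first.

```python
def dp1_col(n: int) -> str:
    """Column name for DPSF1 item n, e.g. 1 -> 'dp0010001'."""
    return f"dp001{n:04d}"


TOTAL, MALE_TOTAL, FEMALE_TOTAL = dp1_col(1), dp1_col(20), dp1_col(39)
ALL_AGE_BINS = [dp1_col(n) for n in range(2, 20)]      # dp0010002 .. dp0010019
MALE_AGE_BINS = [dp1_col(n) for n in range(21, 39)]    # dp0010021 .. dp0010038
FEMALE_AGE_BINS = [dp1_col(n) for n in range(40, 58)]  # dp0010040 .. dp0010057


def dp1_inconsistencies(row: dict) -> list[str]:
    """Describe any DPSF1 totals in `row` that do not equal the sum of their parts."""
    problems = []
    # Each total should equal the sum of its component columns.
    checks = [
        (TOTAL, ALL_AGE_BINS),
        (MALE_TOTAL, MALE_AGE_BINS),
        (FEMALE_TOTAL, FEMALE_AGE_BINS),
        (TOTAL, [MALE_TOTAL, FEMALE_TOTAL]),
    ]
    for total_col, part_cols in checks:
        parts = sum(row[c] for c in part_cols)
        if parts != row[total_col]:
            problems.append(f"{total_col}={row[total_col]} but its parts sum to {parts}")
    # Each all-sexes age bin should equal male + female for the same bin.
    for all_col, m_col, f_col in zip(ALL_AGE_BINS, MALE_AGE_BINS, FEMALE_AGE_BINS):
        both = row[m_col] + row[f_col]
        if row[all_col] != both:
            problems.append(f"{all_col}={row[all_col]} but {m_col} + {f_col} = {both}")
    return problems
```

An empty list means the row passes every check. Applying the function to each record from the loaded table gives a quick sanity check on the DPSF1 columns.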
out_censusdp1tract__states
State-level summarized US Census demographic information, along with a geometry describing state boundaries.
- Most-recent data: 2010
- Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source: Census DP1 – Profile of General Demographic Characteristics
- Primary key: state_id_fips
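This table is keyed by state_id_fips, and the county table's county_id_fips is presumably prefixed by the same 2-digit state code (an assumption about its 5-digit form). That makes it possible to roll up county rows and compare them with the state rows. This only makes sense for count columns such as dp0010001. Medians (dp0020001), averages (dp0160001), and rates (dp0190001) cannot be summed. The function names are hypothetical, and the comparison assumes both tables cover the same geographies.

```python
from collections import defaultdict


def rollup_to_states(county_rows: list[dict], value_col: str = "dp0010001") -> dict[str, int]:
    """Sum a county-level count into totals keyed by 2-digit state FIPS code."""
    totals: dict[str, int] = defaultdict(int)
    for row in county_rows:
        # Assumption: the first two characters of county_id_fips are the state FIPS code.
        totals[row["county_id_fips"][:2]] += row[value_col]
    return dict(totals)


def compare_with_states(
    county_rows: list[dict], state_rows: list[dict], value_col: str = "dp0010001"
) -> dict[str, tuple[int, int]]:
    """Return {state_id_fips: (rolled_up, reported)} for states where the two disagree."""
    rolled = rollup_to_states(county_rows, value_col)
    mismatches = {}
    for row in state_rows:
        fips = row["state_id_fips"]
        rolled_value = rolled.get(fips, 0)
        if rolled_value != row[value_col]:
            mismatches[fips] = (rolled_value, row[value_col])
    return mismatches
```

An empty result suggests the two tables agree on that count column.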
Usage Warnings
The original data is no longer being collected or reported in this way.
Additional Details
This table is based on the 2010 Census. PUDL uses it as a convenient source of geospatial information, but it should not be treated as authoritative.
Access methods
pandas:

```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
    dtype_backend="pyarrow",
)
```

Polars:

```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```

DuckDB:

```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet';
```

R (arrow):

```r
library(arrow)

df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| dp0010001 | integer | DPSF1. Sex and age - Total population |
| dp0010002 | integer | DPSF1. Sex and age - Total population under 5 years |
| dp0010003 | integer | DPSF1. Sex and age - Total population 5 to 9 years |
| dp0010004 | integer | DPSF1. Sex and age - Total population 10 to 14 years |
| dp0010005 | integer | DPSF1. Sex and age - Total population 15 to 19 years |
| dp0010006 | integer | DPSF1. Sex and age - Total population 20 to 24 years |
| dp0010007 | integer | DPSF1. Sex and age - Total population 25 to 29 years |
| dp0010008 | integer | DPSF1. Sex and age - Total population 30 to 34 years |
| dp0010009 | integer | DPSF1. Sex and age - Total population 35 to 39 years |
| dp0010010 | integer | DPSF1. Sex and age - Total population 40 to 44 years |
| dp0010011 | integer | DPSF1. Sex and age - Total population 45 to 49 years |
| dp0010012 | integer | DPSF1. Sex and age - Total population 50 to 54 years |
| dp0010013 | integer | DPSF1. Sex and age - Total population 55 to 59 years |
| dp0010014 | integer | DPSF1. Sex and age - Total population 60 to 64 years |
| dp0010015 | integer | DPSF1. Sex and age - Total population 65 to 69 years |
| dp0010016 | integer | DPSF1. Sex and age - Total population 70 to 74 years |
| dp0010017 | integer | DPSF1. Sex and age - Total population 75 to 79 years |
| dp0010018 | integer | DPSF1. Sex and age - Total population 80 to 84 years |
| dp0010019 | integer | DPSF1. Sex and age - Total population 85 years and over |
| dp0010020 | integer | DPSF1. Sex and age - Male population total |
| dp0010021 | integer | DPSF1. Sex and age - Male population under 5 years |
| dp0010022 | integer | DPSF1. Sex and age - Male population 5 to 9 years |
| dp0010023 | integer | DPSF1. Sex and age - Male population 10 to 14 years |
| dp0010024 | integer | DPSF1. Sex and age - Male population 15 to 19 years |
| dp0010025 | integer | DPSF1. Sex and age - Male population 20 to 24 years |
| dp0010026 | integer | DPSF1. Sex and age - Male population 25 to 29 years |
| dp0010027 | integer | DPSF1. Sex and age - Male population 30 to 34 years |
| dp0010028 | integer | DPSF1. Sex and age - Male population 35 to 39 years |
| dp0010029 | integer | DPSF1. Sex and age - Male population 40 to 44 years |
| dp0010030 | integer | DPSF1. Sex and age - Male population 45 to 49 years |
| dp0010031 | integer | DPSF1. Sex and age - Male population 50 to 54 years |
| dp0010032 | integer | DPSF1. Sex and age - Male population 55 to 59 years |
| dp0010033 | integer | DPSF1. Sex and age - Male population 60 to 64 years |
| dp0010034 | integer | DPSF1. Sex and age - Male population 65 to 69 years |
| dp0010035 | integer | DPSF1. Sex and age - Male population 70 to 74 years |
| dp0010036 | integer | DPSF1. Sex and age - Male population 75 to 79 years |
| dp0010037 | integer | DPSF1. Sex and age - Male population 80 to 84 years |
| dp0010038 | integer | DPSF1. Sex and age - Male population 85 years and over |
| dp0010039 | integer | DPSF1. Sex and age - Female population total |
| dp0010040 | integer | DPSF1. Sex and age - Female population under 5 years |
| dp0010041 | integer | DPSF1. Sex and age - Female population 5 to 9 years |
| dp0010042 | integer | DPSF1. Sex and age - Female population 10 to 14 years |
| dp0010043 | integer | DPSF1. Sex and age - Female population 15 to 19 years |
| dp0010044 | integer | DPSF1. Sex and age - Female population 20 to 24 years |
| dp0010045 | integer | DPSF1. Sex and age - Female population 25 to 29 years |
| dp0010046 | integer | DPSF1. Sex and age - Female population 30 to 34 years |
| dp0010047 | integer | DPSF1. Sex and age - Female population 35 to 39 years |
| dp0010048 | integer | DPSF1. Sex and age - Female population 40 to 44 years |
| dp0010049 | integer | DPSF1. Sex and age - Female population 45 to 49 years |
| dp0010050 | integer | DPSF1. Sex and age - Female population 50 to 54 years |
| dp0010051 | integer | DPSF1. Sex and age - Female population 55 to 59 years |
| dp0010052 | integer | DPSF1. Sex and age - Female population 60 to 64 years |
| dp0010053 | integer | DPSF1. Sex and age - Female population 65 to 69 years |
| dp0010054 | integer | DPSF1. Sex and age - Female population 70 to 74 years |
| dp0010055 | integer | DPSF1. Sex and age - Female population 75 to 79 years |
| dp0010056 | integer | DPSF1. Sex and age - Female population 80 to 84 years |
| dp0010057 | integer | DPSF1. Sex and age - Female population 85 years and over |
| dp0020001 | number | DPSF2. Median age by sex - Both sexes |
| dp0020002 | number | DPSF2. Median age by sex - Male |
| dp0020003 | number | DPSF2. Median age by sex - Female |
| dp0030001 | integer | DPSF3. Sex for population 16 years and over - Total |
| dp0030002 | integer | DPSF3. Sex for population 16 years and over - Male |
| dp0030003 | integer | DPSF3. Sex for population 16 years and over - Female |
| dp0040001 | integer | DPSF4. Sex for population 18 years and over - Total |
| dp0040002 | integer | DPSF4. Sex for population 18 years and over - Male |
| dp0040003 | integer | DPSF4. Sex for population 18 years and over - Female |
| dp0050001 | integer | DPSF5. Sex for population 21 years and over - Total |
| dp0050002 | integer | DPSF5. Sex for population 21 years and over - Male |
| dp0050003 | integer | DPSF5. Sex for population 21 years and over - Female |
| dp0060001 | integer | DPSF6. Sex for population 62 years and over - Total |
| dp0060002 | integer | DPSF6. Sex for population 62 years and over - Male |
| dp0060003 | integer | DPSF6. Sex for population 62 years and over - Female |
| dp0070001 | integer | DPSF7. Sex for population 65 years and over - Total |
| dp0070002 | integer | DPSF7. Sex for population 65 years and over - Male |
| dp0070003 | integer | DPSF7. Sex for population 65 years and over - Female |
| dp0080001 | integer | DPSF8. Race - Total population |
| dp0080002 | integer | DPSF8. Race - Population of one race |
| dp0080003 | integer | DPSF8. Race - White |
| dp0080004 | integer | DPSF8. Race - Black or African American |
| dp0080005 | integer | DPSF8. Race - American Indian and Alaska Native |
| dp0080006 | integer | DPSF8. Race - Asian total |
| dp0080007 | integer | DPSF8. Race - Asian Indian |
| dp0080008 | integer | DPSF8. Race - Chinese |
| dp0080009 | integer | DPSF8. Race - Filipino |
| dp0080010 | integer | DPSF8. Race - Japanese |
| dp0080011 | integer | DPSF8. Race - Korean |
| dp0080012 | integer | DPSF8. Race - Vietnamese |
| dp0080013 | integer | DPSF8. Race - Other Asian |
| dp0080014 | integer | DPSF8. Race - Native Hawaiian and Other Pacific Islander total |
| dp0080015 | integer | DPSF8. Race - Native Hawaiian |
| dp0080016 | integer | DPSF8. Race - Guamanian or Chamorro |
| dp0080017 | integer | DPSF8. Race - Samoan |
| dp0080018 | integer | DPSF8. Race - Other Pacific Islander |
| dp0080019 | integer | DPSF8. Race - Some Other Race |
| dp0080020 | integer | DPSF8. Race - Population of Two or More Races |
| dp0080021 | integer | DPSF8. Race - White; American Indian and Alaska Native |
| dp0080022 | integer | DPSF8. Race - White; Asian |
| dp0080023 | integer | DPSF8. Race - White; Black or African American |
| dp0080024 | integer | DPSF8. Race - White; Some Other Race |
| dp0090001 | integer | DPSF9. Race (total races tallied) - White alone or in combination with one or more other races |
| dp0090002 | integer | DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races |
| dp0090003 | integer | DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races |
| dp0090004 | integer | DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races |
| dp0090005 | integer | DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races |
| dp0090006 | integer | DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races |
| dp0100001 | integer | DPSF10. Hispanic or Latino by specific origin - Total population |
| dp0100002 | integer | DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race) |
| dp0100003 | integer | DPSF10. Hispanic or Latino by specific origin - Mexican |
| dp0100004 | integer | DPSF10. Hispanic or Latino by specific origin - Puerto Rican |
| dp0100005 | integer | DPSF10. Hispanic or Latino by specific origin - Cuban |
| dp0100006 | integer | DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino |
| dp0100007 | integer | DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino |
| dp0110001 | integer | DPSF11. Hispanic or Latino and race - Total population |
| dp0110002 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino total |
| dp0110003 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone |
| dp0110004 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone |
| dp0110005 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110006 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone |
| dp0110007 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110008 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone |
| dp0110009 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races |
| dp0110010 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total |
| dp0110011 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone |
| dp0110012 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone |
| dp0110013 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110014 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone |
| dp0110015 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110016 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone |
| dp0110017 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races |
| dp0120001 | integer | DPSF12. Relationship - Total population |
| dp0120002 | integer | DPSF12. Relationship - In households |
| dp0120003 | integer | DPSF12. Relationship - Householder |
| dp0120004 | integer | DPSF12. Relationship - Spouse |
| dp0120005 | integer | DPSF12. Relationship - Child |
| dp0120006 | integer | DPSF12. Relationship - Own child under 18 years |
| dp0120007 | integer | DPSF12. Relationship - Other relatives |
| dp0120008 | integer | DPSF12. Relationship - Other relatives under 18 years |
| dp0120009 | integer | DPSF12. Relationship - Other relatives 65 years and over |
| dp0120010 | integer | DPSF12. Relationship - Nonrelatives |
| dp0120011 | integer | DPSF12. Relationship - Nonrelatives under 18 years |
| dp0120012 | integer | DPSF12. Relationship - Nonrelatives 65 years and over |
| dp0120013 | integer | DPSF12. Relationship - Unmarried partner |
| dp0120014 | integer | DPSF12. Relationship - In group quarters |
| dp0120015 | integer | DPSF12. Relationship - Institutionalized population |
| dp0120016 | integer | DPSF12. Relationship - Institutionalized population: Male |
| dp0120017 | integer | DPSF12. Relationship - Institutionalized population: Female |
| dp0120018 | integer | DPSF12. Relationship - Noninstitutionalized population |
| dp0120019 | integer | DPSF12. Relationship - Noninstitutionalized population: Male |
| dp0120020 | integer | DPSF12. Relationship - Noninstitutionalized population: Female |
| dp0130001 | integer | DPSF13. Households by type - Total households |
| dp0130002 | integer | DPSF13. Households by type - Family households (families) |
| dp0130003 | integer | DPSF13. Households by type - Family households with own children under 18 years |
| dp0130004 | integer | DPSF13. Households by type - Husband-wife family |
| dp0130005 | integer | DPSF13. Households by type - Husband-wife family with own children under 18 years |
| dp0130006 | integer | DPSF13. Households by type - Male householder, no wife present |
| dp0130007 | integer | DPSF13. Households by type - Male householder, no wife present, with own children under 18 years |
| dp0130008 | integer | DPSF13. Households by type - Female householder, no husband present |
| dp0130009 | integer | DPSF13. Households by type - Female householder, no husband present, with own children under 18 years |
| dp0130010 | integer | DPSF13. Households by type - Nonfamily households |
| dp0130011 | integer | DPSF13. Households by type - Householder living alone |
| dp0130012 | integer | DPSF13. Households by type - Householder living alone: Male |
| dp0130013 | integer | DPSF13. Households by type - Householder living alone: Male 65 years and over |
| dp0130014 | integer | DPSF13. Households by type - Householder living alone: Female |
| dp0130015 | integer | DPSF13. Households by type - Householder living alone: Female 65 years and over |
| dp0140001 | integer | DPSF14. Households with individuals under 18 years - Total |
| dp0150001 | integer | DPSF15. Households with individuals 65 years and over - Total |
| dp0160001 | number | DPSF16. Average household size - Average household size |
| dp0170001 | number | DPSF17. Average family size - Average family size |
| dp0180001 | integer | DPSF18. Housing occupancy - Total housing units |
| dp0180002 | integer | DPSF18. Housing occupancy - Occupied housing units |
| dp0180003 | integer | DPSF18. Housing occupancy - Vacant housing units |
| dp0180004 | integer | DPSF18. Housing occupancy - Vacant housing units for rent |
| dp0180005 | integer | DPSF18. Housing occupancy - Vacant housing units rented, not occupied |
| dp0180006 | integer | DPSF18. Housing occupancy - Vacant housing units for sale only |
| dp0180007 | integer | DPSF18. Housing occupancy - Vacant housing units sold, not occupied |
| dp0180008 | integer | DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use |
| dp0180009 | integer | DPSF18. Housing occupancy - All other vacant housing units |
| dp0190001 | number | DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent) |
| dp0200001 | number | DPSF20. Rental vacancy rate - Rental vacancy rate (percent) |
| dp0210001 | integer | DPSF21. Housing tenure - Total occupied housing units |
| dp0210002 | integer | DPSF21. Housing tenure - Owner-occupied housing units |
| dp0210003 | integer | DPSF21. Housing tenure - Renter-occupied housing units |
| dp0220001 | integer | DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units |
| dp0220002 | integer | DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units |
| dp0230001 | number | DPSF23. Average household size by tenure - Owner occupied |
| dp0230002 | number | DPSF23. Average household size by tenure - Renter occupied |
| geometry | geometry | Geospatial representation of the feature. |
| internal_point_latitude | number | Internal point latitude in decimal degrees. |
| internal_point_longitude | number | Internal point longitude in decimal degrees. |
| land_area | number | Land area in square meters. |
| shape_area | number | Area of the feature in square degrees. |
| shape_length | number | Length of the feature’s perimeter in degrees. |
| state | string | Two-letter US state abbreviation. |
| state_id_fips | string | Two-digit state FIPS code. |
state_name |
string |
Full name of the state. |
water_area |
number |
Water area in square meters. |
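The area columns use different units. land_area and water_area are in square meters. shape_area is in square degrees, which is not a true area because the ground distance spanned by a degree of longitude shrinks toward the poles. Below is a minimal sketch of computing population density from dp0010001 (total population) and land_area. It assumes scalar inputs, and the function and constant names are hypothetical.

```python
from typing import Optional

SQ_METERS_PER_SQ_KM = 1_000_000  # 1 km² = 1,000 m × 1,000 m


def population_density_per_km2(
    total_population: int, land_area_m2: Optional[float]
) -> Optional[float]:
    """Return people per km² of land, or None if land area is missing or zero.

    total_population: the dp0010001 value (DPSF1 total population).
    land_area_m2: the land_area value, reported in square meters.
    """
    if land_area_m2 is None or land_area_m2 <= 0:
        return None
    return total_population / (land_area_m2 / SQ_METERS_PER_SQ_KM)
```

For example, 50,000 people on 500,000,000 m² (500 km²) of land gives 100.0 people per km². Using land_area instead of land plus water keeps lakes and coastal water from diluting the figure.

With a DataFrame loaded as shown under Access methods, the same arithmetic can be applied column-wise as `df["dp0010001"] / (df["land_area"] / 1_000_000)`. Rows with zero land area would then need to be masked separately.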
out_censusdp1tract__tracts#
Noticed any irregularities with this table? File a report.
Census tract-level summarized US Census demographic information, along with a geometry describing tract boundaries.
- Most-recent data:
2010
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
Census DP1 – Profile of General Demographic Characteristics
- Primary key:
tract_id_fips
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
Based on the 2010 Census. Used as a convenient source of geospatial information within PUDL, but should not be treated as authoritative.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dp0010001 |
integer |
DPSF1. Sex and age - Total population |
dp0010002 |
integer |
DPSF1. Sex and age - Total population under 5 years |
dp0010003 |
integer |
DPSF1. Sex and age - Total population 5 to 9 years |
dp0010004 |
integer |
DPSF1. Sex and age - Total population 10 to 14 years |
dp0010005 |
integer |
DPSF1. Sex and age - Total population 15 to 19 years |
dp0010006 |
integer |
DPSF1. Sex and age - Total population 20 to 24 years |
dp0010007 |
integer |
DPSF1. Sex and age - Total population 25 to 29 years |
dp0010008 |
integer |
DPSF1. Sex and age - Total population 30 to 34 years |
dp0010009 |
integer |
DPSF1. Sex and age - Total population 35 to 39 years |
dp0010010 |
integer |
DPSF1. Sex and age - Total population 40 to 44 years |
dp0010011 |
integer |
DPSF1. Sex and age - Total population 45 to 49 years |
dp0010012 |
integer |
DPSF1. Sex and age - Total population 50 to 54 years |
dp0010013 |
integer |
DPSF1. Sex and age - Total population 55 to 59 years |
dp0010014 |
integer |
DPSF1. Sex and age - Total population 60 to 64 years |
dp0010015 |
integer |
DPSF1. Sex and age - Total population 65 to 69 years |
dp0010016 |
integer |
DPSF1. Sex and age - Total population 70 to 74 years |
dp0010017 |
integer |
DPSF1. Sex and age - Total population 75 to 79 years |
dp0010018 |
integer |
DPSF1. Sex and age - Total population 80 to 84 years |
dp0010019 |
integer |
DPSF1. Sex and age - Total population 85 years and over |
dp0010020 |
integer |
DPSF1. Sex and age - Male population total |
dp0010021 |
integer |
DPSF1. Sex and age - Male population under 5 years |
dp0010022 |
integer |
DPSF1. Sex and age - Male population 5 to 9 years |
dp0010023 |
integer |
DPSF1. Sex and age - Male population 10 to 14 years |
dp0010024 |
integer |
DPSF1. Sex and age - Male population 15 to 19 years |
dp0010025 |
integer |
DPSF1. Sex and age - Male population 20 to 24 years |
dp0010026 |
integer |
DPSF1. Sex and age - Male population 25 to 29 years |
dp0010027 |
integer |
DPSF1. Sex and age - Male population 30 to 34 years |
dp0010028 |
integer |
DPSF1. Sex and age - Male population 35 to 39 years |
dp0010029 |
integer |
DPSF1. Sex and age - Male population 40 to 44 years |
dp0010030 |
integer |
DPSF1. Sex and age - Male population 45 to 49 years |
dp0010031 |
integer |
DPSF1. Sex and age - Male population 50 to 54 years |
dp0010032 |
integer |
DPSF1. Sex and age - Male population 55 to 59 years |
dp0010033 |
integer |
DPSF1. Sex and age - Male population 60 to 64 years |
dp0010034 |
integer |
DPSF1. Sex and age - Male population 65 to 69 years |
dp0010035 |
integer |
DPSF1. Sex and age - Male population 70 to 74 years |
dp0010036 |
integer |
DPSF1. Sex and age - Male population 75 to 79 years |
dp0010037 |
integer |
DPSF1. Sex and age - Male population 80 to 84 years |
dp0010038 |
integer |
DPSF1. Sex and age - Male population 85 years and over |
dp0010039 |
integer |
DPSF1. Sex and age - Female population total |
dp0010040 |
integer |
DPSF1. Sex and age - Female population under 5 years |
dp0010041 |
integer |
DPSF1. Sex and age - Female population 5 to 9 years |
dp0010042 |
integer |
DPSF1. Sex and age - Female population 10 to 14 years |
dp0010043 |
integer |
DPSF1. Sex and age - Female population 15 to 19 years |
dp0010044 |
integer |
DPSF1. Sex and age - Female population 20 to 24 years |
dp0010045 |
integer |
DPSF1. Sex and age - Female population 25 to 29 years |
dp0010046 |
integer |
DPSF1. Sex and age - Female population 30 to 34 years |
dp0010047 |
integer |
DPSF1. Sex and age - Female population 35 to 39 years |
dp0010048 |
integer |
DPSF1. Sex and age - Female population 40 to 44 years |
dp0010049 |
integer |
DPSF1. Sex and age - Female population 45 to 49 years |
dp0010050 |
integer |
DPSF1. Sex and age - Female population 50 to 54 years |
dp0010051 |
integer |
DPSF1. Sex and age - Female population 55 to 59 years |
dp0010052 |
integer |
DPSF1. Sex and age - Female population 60 to 64 years |
dp0010053 |
integer |
DPSF1. Sex and age - Female population 65 to 69 years |
dp0010054 |
integer |
DPSF1. Sex and age - Female population 70 to 74 years |
dp0010055 |
integer |
DPSF1. Sex and age - Female population 75 to 79 years |
dp0010056 |
integer |
DPSF1. Sex and age - Female population 80 to 84 years |
dp0010057 |
integer |
DPSF1. Sex and age - Female population 85 years and over |
dp0020001 |
number |
DPSF2. Median age by sex - Both sexes |
dp0020002 |
number |
DPSF2. Median age by sex - Male |
dp0020003 |
number |
DPSF2. Median age by sex - Female |
dp0030001 |
integer |
DPSF3. Sex for population 16 years and over - Total |
dp0030002 |
integer |
DPSF3. Sex for population 16 years and over - Male |
dp0030003 |
integer |
DPSF3. Sex for population 16 years and over - Female |
dp0040001 |
integer |
DPSF4. Sex for population 18 years and over - Total |
dp0040002 |
integer |
DPSF4. Sex for population 18 years and over - Male |
dp0040003 |
integer |
DPSF4. Sex for population 18 years and over - Female |
dp0050001 |
integer |
DPSF5. Sex for population 21 years and over - Total |
dp0050002 |
integer |
DPSF5. Sex for population 21 years and over - Male |
dp0050003 |
integer |
DPSF5. Sex for population 21 years and over - Female |
dp0060001 |
integer |
DPSF6. Sex for population 62 years and over - Total |
dp0060002 |
integer |
DPSF6. Sex for population 62 years and over - Male |
dp0060003 |
integer |
DPSF6. Sex for population 62 years and over - Female |
dp0070001 |
integer |
DPSF7. Sex for population 65 years and over - Total |
dp0070002 |
integer |
DPSF7. Sex for population 65 years and over - Male |
dp0070003 |
integer |
DPSF7. Sex for population 65 years and over - Female |
dp0080001 |
integer |
DPSF8. Race - Total population |
dp0080002 |
integer |
DPSF8. Race - Population of one race |
dp0080003 |
integer |
DPSF8. Race - White |
dp0080004 |
integer |
DPSF8. Race - Black or African American |
dp0080005 |
integer |
DPSF8. Race - American Indian and Alaska Native |
dp0080006 |
integer |
DPSF8. Race - Asian total |
dp0080007 |
integer |
DPSF8. Race - Asian Indian |
dp0080008 |
integer |
DPSF8. Race - Chinese |
dp0080009 |
integer |
DPSF8. Race - Filipino |
dp0080010 |
integer |
DPSF8. Race - Japanese |
dp0080011 |
integer |
DPSF8. Race - Korean |
dp0080012 |
integer |
DPSF8. Race - Vietnamese |
dp0080013 |
integer |
DPSF8. Race - Other Asian |
dp0080014 |
integer |
DPSF8. Race - Native Hawaiian and Other Pacific Islander total |
dp0080015 |
integer |
DPSF8. Race - Native Hawaiian |
dp0080016 |
integer |
DPSF8. Race - Guamanian or Chamorro |
dp0080017 |
integer |
DPSF8. Race - Samoan |
dp0080018 |
integer |
DPSF8. Race - Other Pacific Islander |
dp0080019 |
integer |
DPSF8. Race - Some Other Race |
dp0080020 |
integer |
DPSF8. Race - Population of Two or More Races |
dp0080021 |
integer |
DPSF8. Race - White; American Indian and Alaska Native |
dp0080022 |
integer |
DPSF8. Race - White; Asian |
dp0080023 |
integer |
DPSF8. Race - White; Black or African American |
dp0080024 |
integer |
DPSF8. Race - White; Some Other Race |
dp0090001 |
integer |
DPSF9. Race (total races tallied) - White alone or in combination with one or more other races |
dp0090002 |
integer |
DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races |
dp0090003 |
integer |
DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races |
dp0090004 |
integer |
DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races |
dp0090005 |
integer |
DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races |
dp0090006 |
integer |
DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races |
dp0100001 |
integer |
DPSF10. Hispanic or Latino by specific origin - Total population |
dp0100002 |
integer |
DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race) |
dp0100003 |
integer |
DPSF10. Hispanic or Latino by specific origin - Mexican |
dp0100004 |
integer |
DPSF10. Hispanic or Latino by specific origin - Puerto Rican |
dp0100005 |
integer |
DPSF10. Hispanic or Latino by specific origin - Cuban |
dp0100006 |
integer |
DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino |
dp0100007 |
integer |
DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino |
dp0110001 |
integer |
DPSF11. Hispanic or Latino and race - Total population |
dp0110002 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino total |
dp0110003 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone |
dp0110004 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone |
dp0110005 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone |
dp0110006 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone |
dp0110007 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
dp0110008 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone |
dp0110009 |
integer |
DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races |
dp0110010 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total |
dp0110011 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone |
dp0110012 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone |
dp0110013 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone |
dp0110014 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone |
dp0110015 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
dp0110016 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone |
dp0110017 |
integer |
DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races |
dp0120001 |
integer |
DPSF12. Relationship - Total population |
dp0120002 |
integer |
DPSF12. Relationship - In households |
dp0120003 |
integer |
DPSF12. Relationship - Householder |
dp0120004 |
integer |
DPSF12. Relationship - Spouse |
dp0120005 |
integer |
DPSF12. Relationship - Child |
dp0120006 |
integer |
DPSF12. Relationship - Own child under 18 years |
dp0120007 |
integer |
DPSF12. Relationship - Other relatives |
dp0120008 |
integer |
DPSF12. Relationship - Other relatives under 18 years |
dp0120009 |
integer |
DPSF12. Relationship - Other relatives 65 years and over |
dp0120010 |
integer |
DPSF12. Relationship - Nonrelatives |
dp0120011 |
integer |
DPSF12. Relationship - Nonrelatives under 18 years |
dp0120012 |
integer |
DPSF12. Relationship - Nonrelatives 65 years and over |
dp0120013 |
integer |
DPSF12. Relationship - Unmarried partner |
dp0120014 |
integer |
DPSF12. Relationship - In group quarters |
dp0120015 |
integer |
DPSF12. Relationship - Institutionalized population |
dp0120016 |
integer |
DPSF12. Relationship - Institutionalized population: Male |
dp0120017 |
integer |
DPSF12. Relationship - Institutionalized population: Female |
dp0120018 |
integer |
DPSF12. Relationship - Noninstitutionalized population |
dp0120019 |
integer |
DPSF12. Relationship - Noninstitutionalized population: Male |
dp0120020 |
integer |
DPSF12. Relationship - Noninstitutionalized population: Female |
dp0130001 |
integer |
DPSF13. Households by type - Total households |
dp0130002 |
integer |
DPSF13. Households by type - Family households (families) |
dp0130003 |
integer |
DPSF13. Households by type - Family households with own children under 18 years |
dp0130004 |
integer |
DPSF13. Households by type - Husband-wife family |
dp0130005 |
integer |
DPSF13. Households by type - Husband-wife family with own children under 18 years |
dp0130006 |
integer |
DPSF13. Households by type - Male householder, no wife present |
dp0130007 |
integer |
DPSF13. Households by type - Male householder, no wife present, with own children under 18 years |
dp0130008 |
integer |
DPSF13. Households by type - Female householder, no husband present |
dp0130009 |
integer |
DPSF13. Households by type - Female householder, no husband present, with own children under 18 years |
dp0130010 |
integer |
DPSF13. Households by type - Nonfamily households |
dp0130011 |
integer |
DPSF13. Households by type - Householder living alone |
dp0130012 |
integer |
DPSF13. Households by type - Householder living alone: Male |
dp0130013 |
integer |
DPSF13. Households by type - Householder living alone: Male 65 years and over |
dp0130014 |
integer |
DPSF13. Households by type - Householder living alone: Female |
dp0130015 |
integer |
DPSF13. Households by type - Householder living alone: Female 65 years and over |
dp0140001 |
integer |
DPSF14. Households with individuals under 18 years - Total |
dp0150001 |
integer |
DPSF15. Households with individuals 65 years and over - Total |
dp0160001 |
number |
DPSF16. Average household size - Average household size |
dp0170001 |
number |
DPSF17. Average family size - Average family size |
dp0180001 |
integer |
DPSF18. Housing occupancy - Total housing units |
dp0180002 |
integer |
DPSF18. Housing occupancy - Occupied housing units |
dp0180003 |
integer |
DPSF18. Housing occupancy - Vacant housing units |
dp0180004 |
integer |
DPSF18. Housing occupancy - Vacant housing units for rent |
dp0180005 |
integer |
DPSF18. Housing occupancy - Vacant housing units rented, not occupied |
dp0180006 |
integer |
DPSF18. Housing occupancy - Vacant housing units for sale only |
dp0180007 |
integer |
DPSF18. Housing occupancy - Vacant housing units sold, not occupied |
dp0180008 |
integer |
DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use |
dp0180009 |
integer |
DPSF18. Housing occupancy - All other vacant housing units |
dp0190001 |
number |
DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent) |
dp0200001 |
number |
DPSF20. Rental vacancy rate - Rental vacancy rate (percent) |
dp0210001 |
integer |
DPSF21. Housing tenure - Total occupied housing units |
dp0210002 |
integer |
DPSF21. Housing tenure - Owner-occupied housing units |
dp0210003 |
integer |
DPSF21. Housing tenure - Renter-occupied housing units |
dp0220001 |
integer |
DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units |
dp0220002 |
integer |
DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units |
dp0230001 |
number |
DPSF23. Average household size by tenure - Owner occupied |
dp0230002 |
number |
DPSF23. Average household size by tenure - Renter occupied |
geometry |
geometry |
Geospatial representation of the feature. |
internal_point_latitude |
number |
Internal point latitude in decimal degrees. |
internal_point_longitude |
number |
Internal point longitude in decimal degrees. |
land_area |
number |
Land area in square meters. |
shape_area |
number |
Area of the feature in square degrees. |
shape_length |
number |
Length of the feature’s perimeter in degrees. |
tract_id_fips |
string |
Census tract 11-digit FIPS code. |
tract_name |
string |
Census tract legal/statistical area description. |
water_area |
number |
Water area in square meters. |
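The tracts table has no county or state column. Joining it to out_censusdp1tract__counties, which is keyed on county_id_fips, therefore means deriving the county code from the tract ID. Below is a minimal sketch. It assumes tract_id_fips follows the standard Census tract GEOID layout: a 2-digit state code, a 3-digit county code and a 6-digit tract code. The helper name and the example ID are hypothetical.

```python
from typing import NamedTuple


class TractFips(NamedTuple):
    state_id_fips: str   # 2-digit state code
    county_id_fips: str  # 5-digit state + county code, like the counties table key
    tract_code: str      # 6-digit tract code within the county


def split_tract_fips(tract_id_fips: str) -> TractFips:
    """Split a tract GEOID into its state, county, and tract parts.

    Assumes the standard 11-character layout SSCCCTTTTTT. Keep the value a
    string: converting it to an integer would drop leading zeros.
    """
    if len(tract_id_fips) != 11 or not tract_id_fips.isdigit():
        raise ValueError(f"unexpected tract FIPS code: {tract_id_fips!r}")
    return TractFips(
        state_id_fips=tract_id_fips[:2],
        county_id_fips=tract_id_fips[:5],
        tract_code=tract_id_fips[5:],
    )
```

Take an illustrative ID such as "06001400100". It yields state "06", county "06001" and tract "400100". The county part can then be matched against county_id_fips in the counties table. The length check raises a ValueError rather than guessing when a value does not fit the assumed layout.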
out_eia860__yearly_emissions_control_equipment#
Noticed any irregularities with this table? File a report.
Annual time series of the cost, type, operating status, retirement date, and install year of emissions control equipment reported to EIA.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
report_date, plant_id_eia, emission_control_id_pudl
Additional Details#
Includes control IDs for sulfur dioxide (SO2), particulate matter, mercury, and nitrogen oxides (NOx) equipment, along with a flag indicating whether the equipment controls acid (HCl) gas.
This denormalized version adds plant name, EIA and PUDL utility IDs, PUDL plant ID, and utility name columns.
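Each record has one control ID column per pollutant, plus a boolean for acid gas. Below is a minimal sketch that lists which pollutants a single record covers. It assumes a record is a plain dict keyed by the column names of this table, with missing IDs as None, and the function name is hypothetical. These IDs are not unique identifiers, so treat the result as a description of one record, not as a way to link equipment across plants or years.

```python
from typing import Any, List, Mapping

# Pollutant-specific control ID columns, mapped to readable labels.
CONTROL_ID_COLUMNS = {
    "so2_control_id_eia": "sulfur dioxide (SO2)",
    "particulate_control_id_eia": "particulate matter",
    "mercury_control_id_eia": "mercury",
    "nox_control_id_eia": "nitrogen oxides (NOx)",
}


def pollutants_controlled(record: Mapping[str, Any]) -> List[str]:
    """List the pollutants one equipment record reports controlling.

    Assumes missing IDs are None or empty strings (plain Python values,
    not pandas NA scalars).
    """
    pollutants = [
        label
        for column, label in CONTROL_ID_COLUMNS.items()
        if record.get(column) not in (None, "")
    ]
    # Acid gas has no ID column here; it is recorded as a boolean flag instead.
    if record.get("acid_gas_control"):
        pollutants.append("acid gas (HCl)")
    return pollutants
```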
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
acid_gas_control |
boolean |
Indicates whether the emissions control equipment controls acid (HCl) gas. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
emission_control_equipment_cost |
number |
The total cost to install a piece of emission control equipment. |
emission_control_equipment_type_code |
string |
Short code indicating the type of emission control equipment installed. |
emission_control_id_pudl |
number |
A PUDL-generated ID used to distinguish emission control units in the same report year and plant ID. This ID should not be used to track units over time or between plants. |
emission_control_operating_date |
date |
The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data. |
emission_control_retirement_date |
date |
The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data. |
mercury_control_id_eia |
string |
Mercury control identification number. This ID is not a unique identifier. |
nox_control_id_eia |
string |
Nitrogen oxide control identification number. This ID is not a unique identifier. |
operational_status |
string |
The operating status of the asset. For generators, this is based on the EIA-860 tab in which the generator was listed. |
operational_status_code |
string |
The operating status of the asset. |
particulate_control_id_eia |
string |
Particulate matter control identification number. This ID is not a unique identifier. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
report_date |
date |
Date reported. |
so2_control_id_eia |
string |
Sulfur dioxide control identification number. This ID is not a unique identifier. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
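The operating and retirement dates let you ask whether a piece of equipment was in service on a given date. Below is a minimal sketch. It assumes dates arrive as datetime.date values, or None when missing. It also treats a retirement date as the first day the equipment is no longer operating. That boundary choice and the function name are assumptions, not PUDL conventions. Note also that emission_control_retirement_date may hold an expected rather than an actual date.

```python
from datetime import date
from typing import Optional


def in_service_on(
    as_of: date,
    operating_date: Optional[date],
    retirement_date: Optional[date],
) -> Optional[bool]:
    """Return whether equipment was operating on ``as_of``.

    Returns None when the operating date is unknown, since the answer
    cannot be determined. A missing retirement date is read as "not retired".
    """
    if operating_date is None:
        return None
    if operating_date > as_of:
        return False  # not yet operating
    if retirement_date is not None and retirement_date <= as_of:
        return False  # already retired (assumed: retired on that day)
    return True
```

Returning None for an unknown operating date keeps "unknown" distinct from "not in service".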
out_eia860__yearly_ownership#
Noticed any irregularities with this table? File a report.
Annual time series of generator ownership. Includes only jointly or third-party owned generators.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 4)
- Primary key:
report_date, plant_id_eia, generator_id, owner_utility_id_eia
Additional Details#
Denormalized to include plant and utility names and other associated IDs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
fraction_owned |
number |
Proportion of generator ownership attributable to this utility. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
operator_utility_id_eia |
integer |
The EIA utility identification number for the operator utility. |
owner_city |
string |
City of owner. |
owner_country |
string |
Three letter ISO-3166 country code. |
owner_state |
string |
Two letter ISO-3166 political subdivision code. |
owner_street_address |
string |
Street address of owner. |
owner_utility_id_eia |
integer |
The EIA utility identification number for the owner company. This may differ from the operator utility, which is responsible for the day-to-day operations of the generator. |
owner_utility_name_eia |
string |
The name of the EIA owner utility. |
owner_zip_code |
string |
Zip code of owner. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
report_date |
date |
Date reported. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID for the owner company, which may differ from the operator utility responsible for the day-to-day operations of the generator. May not be stable over time. |
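The primary key includes owner_utility_id_eia, so this table has one row per owner of each generator in each year. The fraction_owned values for a generator-year would normally be expected to add up to 1. Below is a minimal sketch that flags generators where they do not. It assumes rows are plain dicts with the columns above, and the tolerance and function name are hypothetical choices. generator_id is used as-is in the grouping key, since the column description says to treat it as a string.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, Iterable, Mapping, Tuple

GeneratorKey = Tuple[date, int, str]  # (report_date, plant_id_eia, generator_id)


def ownership_mismatches(
    rows: Iterable[Mapping[str, Any]], tolerance: float = 0.01
) -> Dict[GeneratorKey, float]:
    """Return generator-years whose fraction_owned values do not sum to ~1.

    Rows with a missing fraction_owned are skipped (an assumption), so a
    generator whose fractions are all missing is not reported.
    """
    totals: Dict[GeneratorKey, float] = defaultdict(float)
    for row in rows:
        if row.get("fraction_owned") is None:
            continue
        # Keep generator_id as a string; IDs like "01" lose meaning as integers.
        key = (row["report_date"], row["plant_id_eia"], row["generator_id"])
        totals[key] += row["fraction_owned"]
    return {key: total for key, total in totals.items() if abs(total - 1.0) > tolerance}
```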
out_eia861__yearly_utility_service_territory#
Noticed any irregularities with this table? File a report.
Annual time series of counties in utility service territories.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
utility_id_eia, report_date, county_id_fips
Additional Details#
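Each service territory county is supplemented with additional county information: its name as given in the Census DP1 data, its Census DP1 population, and its area in km².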
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
area_km2 |
number |
County area in square kilometers (km²). |
county |
string |
County name. |
county_id_fips |
string |
County ID from the Federal Information Processing Standard Publication 6-4. |
county_name_census |
string |
County name as specified in Census DP1 Data. |
population |
number |
County population, sourced from Census DP1 data. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
state_id_fips |
string |
Two digit state FIPS code. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
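population is a county-level value from the Census DP1 data. Summing it over the counties in a utility's service territory therefore gives only a rough figure that tends to overstate the people the utility serves. A county split between several utilities contributes its full population to each of them. Below is a minimal sketch, assuming rows are plain dicts with the columns above. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, Iterable, Mapping, Tuple

UtilityYear = Tuple[int, date]  # (utility_id_eia, report_date)


def county_population_by_utility(
    rows: Iterable[Mapping[str, Any]],
) -> Dict[UtilityYear, float]:
    """Sum county population over each utility's service territory per year.

    Every county is counted in full, even when the utility serves only part
    of it, so totals tend to overstate the population actually served.
    """
    totals: Dict[UtilityYear, float] = defaultdict(float)
    for row in rows:
        if row.get("population") is None:
            continue  # skip counties with no Census DP1 population
        # The primary key (utility_id_eia, report_date, county_id_fips) means
        # each county appears at most once per utility-year.
        totals[(row["utility_id_eia"], row["report_date"])] += row["population"]
    return dict(totals)
```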
out_eia923__boiler_fuel#
Noticed any irregularities with this table? File a report.
Monthly time series of boiler fuel consumption and emissions.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
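A few respondents report a whole year's fuel consumption in December, so monthly profiles for those boilers show a single spike. Below is a minimal heuristic sketch for flagging them. It assumes rows are plain dicts, with report_date as a datetime.date and fuel_consumed_units as a number. A boiler-year is flagged when all of its nonzero consumption falls in December. This is a hypothetical heuristic, not a PUDL flag. A boiler that genuinely ran only in December would also be flagged.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set, Tuple

BoilerYear = Tuple[int, str, int]  # (plant_id_eia, boiler_id, year)


def december_only_boiler_years(rows: Iterable[Mapping[str, Any]]) -> Set[BoilerYear]:
    """Flag boiler-years whose nonzero fuel consumption is all reported in December."""
    months_with_fuel: Dict[BoilerYear, Set[int]] = defaultdict(set)
    for row in rows:
        units = row.get("fuel_consumed_units")
        if not units:  # skips None and zero consumption
            continue
        report_date = row["report_date"]  # first of the month, per the warning above
        key = (row["plant_id_eia"], row["boiler_id"], report_date.year)
        months_with_fuel[key].add(report_date.month)
    return {key for key, months in months_with_fuel.items() if months == {12}}
```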
Additional Details#
Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
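The fuel columns are related by simple arithmetic. Total heat input in MMBtu is the physical quantity burned times its heat content per unit. A heat rate divides that heat input by the electricity generated. Net generation is not a column in this table, so in this minimal sketch net_generation_mwh is a hypothetical input that would have to come from elsewhere. Given the boiler-generator association issues described below, treat any result as approximate. The function names are hypothetical.

```python
BTU_PER_MMBTU = 1_000_000
KWH_PER_MWH = 1_000


def fuel_heat_input_mmbtu(fuel_consumed_units: float, fuel_mmbtu_per_unit: float) -> float:
    """Heat input in MMBtu: physical fuel quantity times heat content per unit."""
    return fuel_consumed_units * fuel_mmbtu_per_unit


def heat_rate_btu_per_kwh(heat_input_mmbtu: float, net_generation_mwh: float) -> float:
    """Heat rate in Btu/kWh; lower values mean a more efficient unit.

    net_generation_mwh is a hypothetical input: it is not in this table.
    For combined heat and power plants the fuel figure also covers process
    steam, so this overstates the heat rate of electricity generation alone.
    """
    if net_generation_mwh <= 0:
        raise ValueError("heat rate is undefined for zero or negative net generation")
    return heat_input_mmbtu * BTU_PER_MMBTU / (net_generation_mwh * KWH_PER_MWH)
```

For example, 500 tons of fuel at an illustrative 20 MMBtu per ton gives 10,000 MMBtu. Paired with 1,000 MWh of generation, that works out to 10,000 Btu/kWh.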
This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
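The canonicalization rule described above can be sketched as a small function. Take every reported instance of an attribute and find the most common value. Keep it only if it accounts for at least 70% of the instances; otherwise return null. This is a simplified sketch based only on the description here, not PUDL's actual harvesting code. It ignores how PUDL groups instances by entity and year. It also assumes that missing values are dropped before counting.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, TypeVar

T = TypeVar("T", bound=Hashable)


def canonical_value(instances: Iterable[Optional[T]], threshold: float = 0.7) -> Optional[T]:
    """Pick the most consistently reported value, or None if it is not consistent enough.

    Missing (None) instances are dropped before counting (an assumption).
    """
    reported = [value for value in instances if value is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None
```

If 7 of 10 reported instances agree, canonical_value returns that value. With only 6 of 10, it returns None, which corresponds to the null described above.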
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ash_content_pct |
number |
Ash content percentage by weight to the nearest 0.1 percent. |
boiler_id |
string |
Alphanumeric boiler ID. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_consumed_mmbtu |
number |
Total fuel consumption in millions of British thermal units (MMBtu), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
sulfur_content_pct |
number |
Sulfur content percentage by weight to the nearest 0.01 percent. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
out_eia923__fuel_receipts_costs
Individual fuel deliveries to power plants, organized by fuel type and supplier.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
- Primary key:
This table has no primary key. Each record describes an individual fuel delivery, and there can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month.
Usage Warnings
Date column arbitrarily uses the first of the month.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains estimated values.
Additional Details
There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.
Under some circumstances, utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It is still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices that EIA publishes. From 2009 to 2021, about one third of all prices were redacted. The missing data is not randomly distributed: deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.
Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet"
)
Columns
| Field Name | Type | Description |
|---|---|---|
| ash_content_pct | number | Ash content percentage by weight to the nearest 0.1 percent. |
| bulk_agg_fuel_cost_per_mmbtu | number | Fuel cost per MMBTU reported in the EIA bulk electricity data. This is an aggregate average fuel price for a whole state, region, month, sector, etc. Used to fill in missing fuel prices. |
| chlorine_content_ppm | number | For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier. |
| coalmine_county_id_fips | string | County ID from the Federal Information Processing Standard Publication 6-4. This is the county where the coal mine is located. |
| contract_expiration_date | date | Date the contract expires (reported by EIA in MMYY format). |
| contract_type_code | string | Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_mmbtu | number | Total consumption of fuel in MMBtu (heat content), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_cost_per_mmbtu | number | Average fuel cost per MMBTU of heat content in nominal USD. |
| fuel_cost_per_mmbtu_source | string | Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or combines several of these sources because the records were aggregated together (mixed). |
| fuel_group_code | string | Fuel groups used in the Electric Power Monthly. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_received_units | number | Quantity of fuel received in tons, barrels, or Mcf. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| mercury_content_ppm | number | Mercury content in parts per million (ppm) to the nearest 0.001 ppm. |
| mine_id_msha | integer | MSHA-issued mine identifier. |
| mine_name | string | Coal mine name. |
| mine_state | string | State where the coal mine is located. Two letter abbreviation. |
| mine_type_code | string | Type of coal mine. |
| moisture_content_pct | number | For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent. |
| natural_gas_delivery_contract_type_code | string | Contract type for natural gas delivery service: |
| natural_gas_transport_code | string | Contract type for natural gas transportation service. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| primary_transportation_mode_code | string | Transportation mode for the longest distance transported. |
| report_date | date | Date reported. |
| secondary_transportation_mode_code | string | Transportation mode for the second longest distance transported. |
| state | string | Two letter US state abbreviation. |
| sulfur_content_pct | number | Sulfur content percentage by weight to the nearest 0.01 percent. |
| supplier_name | string | Company that sold the fuel to the plant or, in the case of natural gas, the pipeline owner. |
| total_fuel_cost | number | Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
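Because redacted prices are filled in rather than left blank, the fuel_cost_per_mmbtu_source column offers one way to gauge how much of this table relies on estimated prices. A minimal pandas sketch, assuming `frc` is a DataFrame already loaded from this table; the function name `cost_source_shares` is hypothetical:

```python
import pandas as pd


def cost_source_shares(frc: pd.DataFrame) -> pd.DataFrame:
    """Share of delivery records by fuel cost source, within each fuel type.

    Rows are fuel_type_code_pudl values, columns are fuel_cost_per_mmbtu_source
    values (e.g. original, eiaapi, rolling_avg), and each row sums to 1.
    Records with a null fuel type or null source are dropped by pd.crosstab.
    """
    return pd.crosstab(
        frc["fuel_type_code_pudl"],
        frc["fuel_cost_per_mmbtu_source"],
        normalize="index",
    )
```

Note that this counts delivery records, not fuel volumes or heat content, so a fuel with many small filled-in deliveries will look more heavily estimated than a volume-weighted measure would suggest.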
out_eia923__generation
Monthly time series of net electricity generated by power plant generators.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, generator_id, report_date
Usage Warnings
Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
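Respondents that report only annual totals can be spotted by looking for plant-years whose non-zero values all fall in December. A rough pandas sketch; the function name `december_only_reporters` is hypothetical, and a plant that genuinely operated only in December would also be flagged:

```python
import pandas as pd


def december_only_reporters(
    df: pd.DataFrame, value_col: str = "net_generation_mwh"
) -> pd.DataFrame:
    """Return (plant_id_eia, year) pairs whose non-zero values are all in December."""
    d = df.assign(
        year=df["report_date"].dt.year,
        month=df["report_date"].dt.month,
    )
    # Keep only months with a non-null, non-zero reported value.
    nonzero = d[d[value_col].fillna(0) != 0]
    only_dec = nonzero.groupby(["plant_id_eia", "year"])["month"].agg(
        lambda months: bool((months == 12).all())
    )
    return only_dec[only_dec].index.to_frame(index=False)
```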
Additional Details
This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.
Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.
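One way to see the coverage gap is to compare, for each plant and month, the generator-level net generation in this table with the plant-level totals. A minimal sketch, assuming `gen` is loaded from this table and `gf` from out_eia923__generation_fuel_combined; the function name `generator_coverage` is hypothetical, and ratios can be misleading where net generation is zero or negative:

```python
import pandas as pd

PLANT_MONTH_KEYS = ["plant_id_eia", "report_date"]


def generator_coverage(gen: pd.DataFrame, gf: pd.DataFrame) -> pd.DataFrame:
    """Fraction of plant-level net generation reported at the generator level."""
    gen_mwh = (
        gen.groupby(PLANT_MONTH_KEYS)["net_generation_mwh"].sum().rename("generator_mwh")
    )
    plant_mwh = (
        gf.groupby(PLANT_MONTH_KEYS)["net_generation_mwh"].sum().rename("plant_mwh")
    )
    # Outer join on plant and month; plant-months with no generator records get 0 MWh.
    out = pd.concat([gen_mwh, plant_mwh], axis=1).fillna({"generator_mwh": 0.0})
    out["coverage"] = out["generator_mwh"] / out["plant_mwh"]
    return out.reset_index()
```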
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet"
)
Columns
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit ID. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__generation_fuel_combined
Monthly time series of generation and fuel consumption of all generation units.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, prime_mover_code, energy_source_code
Usage Warnings
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details
Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level so as to be compatible with fossil fuel generation data.
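Aggregating nuclear records up to the plant prime mover level amounts to a group-by sum before stacking the two sources. A minimal sketch, not PUDL's actual transform: it assumes both inputs carry the columns listed below, that the nuclear input has one row per generation unit (a hypothetical `nuclear_unit_id` column, which the sum drops), and the function name is hypothetical.

```python
import pandas as pd

PM_KEYS = ["plant_id_eia", "report_date", "prime_mover_code", "energy_source_code"]
SUM_COLS = [
    "net_generation_mwh",
    "fuel_consumed_mmbtu",
    "fuel_consumed_for_electricity_mmbtu",
]


def combine_generation_fuel(fossil: pd.DataFrame, nuclear: pd.DataFrame) -> pd.DataFrame:
    """Aggregate unit-level nuclear rows to the prime mover level, then stack."""
    nuclear_pm = (
        nuclear.groupby(PM_KEYS, as_index=False)[SUM_COLS]
        # min_count=1 keeps an all-null group null instead of turning it into 0.
        .sum(min_count=1)
    )
    return pd.concat([fossil[PM_KEYS + SUM_COLS], nuclear_pm], ignore_index=True)
```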
This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator, along with their capacities, to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, in order to calculate capacity factors, heat rates, and the marginal cost of electricity.
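Attributing plant-level values to generators by capacity can be sketched as a proportional split. This is a simplified illustration, not PUDL's allocation code: it ignores the question of which values are already reported directly, and it assumes a hypothetical `gens` table with one row per generator and columns `plant_id_eia`, `generator_id`, `prime_mover_code`, `energy_source_code` (the generator's primary fuel) and `capacity_mw`, none of which come from this table.

```python
import pandas as pd

MATCH_KEYS = ["plant_id_eia", "prime_mover_code", "energy_source_code"]


def allocate_by_capacity(
    plant_level: pd.DataFrame, gens: pd.DataFrame, value_col: str
) -> pd.DataFrame:
    """Split a plant/prime mover/fuel value across generators by capacity share."""
    g = gens.copy()
    # Each generator's share of capacity within its plant/prime mover/fuel group.
    group_capacity = g.groupby(MATCH_KEYS)["capacity_mw"].transform("sum")
    g["capacity_share"] = g["capacity_mw"] / group_capacity
    merged = g.merge(
        plant_level[MATCH_KEYS + ["report_date", value_col]],
        on=MATCH_KEYS,
        how="inner",
    )
    merged[value_col] = merged[value_col] * merged["capacity_share"]
    return merged[["plant_id_eia", "generator_id", "report_date", value_col]]
```

For example, 400 MWh reported for a steam/bituminous prime mover group with a 100 MW and a 300 MW generator would be split into 100 MWh and 300 MWh.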
The table distinguishes between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units, which also provide high-temperature process heat at the expense of net electricity generation.
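The two fuel columns make it possible to separate the electricity share of fuel use and to compute a heat rate from electricity-only fuel. A minimal sketch with a hypothetical function name; it assumes the heat rate is taken as fuel_consumed_for_electricity_mmbtu divided by net_generation_mwh, and leaves it null where net generation is not positive:

```python
import pandas as pd


def add_electricity_metrics(gf: pd.DataFrame) -> pd.DataFrame:
    """Add the electricity share of fuel use and a heat rate in MMBtu/MWh."""
    out = gf.copy()
    # Below 1.0 when some fuel went to process heat (e.g. at CHP units).
    out["electricity_fuel_fraction"] = (
        out["fuel_consumed_for_electricity_mmbtu"] / out["fuel_consumed_mmbtu"]
    )
    # Net generation can be zero or negative; mask those rows to avoid bogus ratios.
    positive_mwh = out["net_generation_mwh"].where(out["net_generation_mwh"] > 0)
    out["heat_rate_mmbtu_per_mwh"] = (
        out["fuel_consumed_for_electricity_mmbtu"] / positive_mwh
    )
    return out
```

For example, a record with 1,000 MMBtu of fuel consumed for electricity and 100 MWh of net generation gets a heat rate of 10 MMBtu/MWh.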
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet"
)
Columns
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu (heat content), year to date. |
| fuel_consumed_for_electricity_units | number | Consumption for electric generation of the fuel type in physical units. |
| fuel_consumed_mmbtu | number | Total consumption of fuel in MMBtu (heat content), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_consumed_units | number | Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_type_code_agg | string | A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__monthly_boiler_fuel
Monthly time series of boiler fuel consumption and emissions.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date
Usage Warnings
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details
Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
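Since heat content is reported per physical unit, one would expect fuel_consumed_mmbtu to be close to fuel_consumed_units multiplied by fuel_mmbtu_per_unit. A hedged sanity check along those lines, with a hypothetical function name and an arbitrary 1% tolerance:

```python
import pandas as pd


def flag_heat_content_mismatches(df: pd.DataFrame, rtol: float = 0.01) -> pd.Series:
    """True where fuel_consumed_mmbtu differs from units * MMBtu/unit by more than rtol."""
    implied_mmbtu = df["fuel_consumed_units"] * df["fuel_mmbtu_per_unit"]
    rel_diff = (df["fuel_consumed_mmbtu"] - implied_mmbtu).abs() / implied_mmbtu.abs()
    # 0/0 yields NaN, and comparisons with NaN or missing inputs are not flagged.
    return (rel_diff > rtol).fillna(False)
```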
This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet"
)
Columns
| Field Name | Type | Description |
|---|---|---|
| ash_content_pct | number | Ash content percentage by weight to the nearest 0.1 percent. |
| boiler_id | string | Alphanumeric boiler ID. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_mmbtu | number | Total consumption of fuel in MMBtu (heat content), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_consumed_units | number | Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| sulfur_content_pct | number | Sulfur content percentage by weight to the nearest 0.01 percent. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit ID. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__monthly_fuel_receipts_costs
Monthly time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
- Primary key:
plant_id_eia, fuel_type_code_pudl, report_date
Usage Warnings
Date column arbitrarily uses the first of the month.
Contains estimated values.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details
This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy time series of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns: primarily numerical values and a handful of categorical variables, plus additional attributes that are constant within each plant-fuel-time period grouping and associated with the plant or utility. When aggregating numerical values, any sum that contains an NA value is treated as NA.
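The aggregation described above can be sketched in pandas, including the rule that any NA in a group makes that group's sum NA. This is a simplified illustration rather than PUDL's actual code: it assumes `frc` is loaded from out_eia923__fuel_receipts_costs, covers only a few numeric columns, and the intermediate `fuel_received_mmbtu` column and the function names are hypothetical.

```python
import pandas as pd

# Grouping keys matching this table's primary key.
GROUP_KEYS = ["plant_id_eia", "fuel_type_code_pudl", "report_date"]
DELIVERY_SUM_COLS = ["fuel_received_units", "fuel_received_mmbtu", "total_fuel_cost"]


def _sum_propagating_na(values: pd.Series) -> float:
    # skipna=False: a single missing value makes the whole sum missing.
    return values.sum(skipna=False)


def aggregate_deliveries(frc: pd.DataFrame) -> pd.DataFrame:
    """Roll individual deliveries up to plant, fuel type, and month."""
    df = frc[GROUP_KEYS + ["fuel_received_units"]].copy()
    df["fuel_received_mmbtu"] = frc["fuel_received_units"] * frc["fuel_mmbtu_per_unit"]
    df["total_fuel_cost"] = df["fuel_received_mmbtu"] * frc["fuel_cost_per_mmbtu"]
    out = (
        df.groupby(GROUP_KEYS)
        .agg(**{col: (col, _sum_propagating_na) for col in DELIVERY_SUM_COLS})
        .reset_index()
    )
    # Recover heat-weighted averages from the summed totals.
    out["fuel_mmbtu_per_unit"] = out["fuel_received_mmbtu"] / out["fuel_received_units"]
    out["fuel_cost_per_mmbtu"] = out["total_fuel_cost"] / out["fuel_received_mmbtu"]
    return out
```

Computing the average price as total cost divided by total heat content weights each delivery by its MMBtu, rather than taking a simple mean of per-delivery prices.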
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet"
)
Columns
| Field Name | Type | Description |
|---|---|---|
| ash_content_pct | number | Ash content percentage by weight to the nearest 0.1 percent. |
| chlorine_content_ppm | number | For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| fuel_consumed_mmbtu | number | Total consumption of fuel in MMBtu (heat content), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_cost_per_mmbtu | number | Average fuel cost per MMBTU of heat content in nominal USD. |
| fuel_cost_per_mmbtu_source | string | Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or combines several of these sources because the records were aggregated together (mixed). |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_received_units | number | Quantity of fuel received in tons, barrels, or Mcf. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| mercury_content_ppm | number | Mercury content in parts per million (ppm) to the nearest 0.001 ppm. |
| moisture_content_pct | number | For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| state | string | Two letter US state abbreviation. |
| sulfur_content_pct | number | Sulfur content percentage by weight to the nearest 0.01 percent. |
| total_fuel_cost | number | Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__monthly_generation
Monthly time series of net electricity generated by power plant generators.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, generator_id, report_date
Usage Warnings
Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details
This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.
Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
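The 70% rule described above can be sketched as a small function. This is a simplified illustration, not PUDL's implementation. It assumes that null reports are excluded from the denominator; the dictionary does not say how nulls are counted. `canonical_value` is a hypothetical name:

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def canonical_value(reports: Sequence[Optional[Hashable]], threshold: float = 0.7) -> Optional[Hashable]:
    """Pick the most frequently reported value if its share of non-null reports
    is at least `threshold`; otherwise return None (a null in the output)."""
    observed = [r for r in reports if r is not None]  # assumption: ignore null reports
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= threshold else None


# Hypothetical reports of one plant's state across several EIA tables.
print(canonical_value(["TX", "TX", "TX", "OK"]))  # TX (3 of 4 = 75%)
print(canonical_value(["TX", "TX", "OK"]))        # None (2 of 3 is below 70%)
```

Ties never matter here: a value that ties with another can hold at most half of the reports, which is always below the threshold.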
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__monthly_generation_fuel_by_generator#
Noticed any irregularities with this table? File a report.
Monthly time series of estimated net generation and fuel consumption by generator.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
report_date, plant_id_eia, generator_id
Usage Warnings#
Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.
The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:
- The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
- The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id, and report_date level. This table includes only ~40% of the total MWh reported in the core_eia923__monthly_generation_fuel table.
- The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBtu reported in the core_eia923__monthly_generation_fuel table.
- The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.
In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__monthly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes. As a result, if a plant has two generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated with both generators in the same way. Allocated net generation will still be proportional to each generator’s net generation or capacity.
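The roll-up from the generator/energy source/prime mover level to the generator level amounts to a grouped sum. The pandas sketch below assumes the input has the columns of out_eia923__monthly_generation_fuel_by_generator_energy_source. `to_generator_level` is hypothetical, and PUDL's own code may treat nulls and other columns differently:

```python
import pandas as pd

GENERATOR_KEY = ["report_date", "plant_id_eia", "generator_id"]
VALUE_COLUMNS = [
    "net_generation_mwh",
    "fuel_consumed_mmbtu",
    "fuel_consumed_for_electricity_mmbtu",
]


def to_generator_level(by_energy_source: pd.DataFrame) -> pd.DataFrame:
    """Sum allocated values across energy_source_code and prime_mover_code for each generator."""
    # pandas skips NA values in these sums by default.
    return by_energy_source.groupby(GENERATOR_KEY, as_index=False)[VALUE_COLUMNS].sum()


# Made-up records: one dual-fuel generator reported under two energy source codes.
jan = pd.Timestamp("2024-01-01")
by_energy_source = pd.DataFrame(
    {
        "report_date": [jan, jan],
        "plant_id_eia": [3, 3],
        "generator_id": ["1", "1"],
        "prime_mover_code": ["ST", "ST"],
        "energy_source_code": ["NG", "DFO"],
        "net_generation_mwh": [80.0, 20.0],
        "fuel_consumed_mmbtu": [800.0, 200.0],
        "fuel_consumed_for_electricity_mmbtu": [700.0, 200.0],
    }
)
by_generator = to_generator_level(by_energy_source)
print(by_generator)  # one row for generator "1" with 100 MWh and 1,000 MMBtu
```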
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, for the reporting period. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, for the reporting period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__monthly_generation_fuel_by_generator_energy_source#
Noticed any irregularities with this table? File a report.
Monthly time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code
Usage Warnings#
Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Contains information from multiple raw inputs.
Additional Details#
The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:
- The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
- The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id, and report_date level. This table includes only ~40% of the total MWh reported in the core_eia923__monthly_generation_fuel table.
- The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBtu reported in the core_eia923__monthly_generation_fuel table.
- The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.
In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.
The allocation process generates a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes. As a result, if a plant has two generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated with both generators in the same way. Allocated net generation will still be proportional to each generator’s net generation or capacity.
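The following simplified sketch makes the idea of an allocation fraction concrete. It is not PUDL's actual logic. It assumes a single plant-level total for one plant/prime mover/energy source/month, and a hypothetical `capacity_mw` input standing in for the capacity from core_eia860__scd_generators. The fallback rule is also an assumption for illustration: weights come from net generation only when every generator reports it.

```python
from typing import Optional


def allocation_fractions(
    net_generation_mwh: dict[str, Optional[float]],
    capacity_mw: dict[str, float],
) -> dict[str, float]:
    """Share of a plant-level total assigned to each generator (simplified).

    Weights by reported generator net generation when every generator has a
    value and the values do not sum to zero; otherwise weights by capacity.
    """
    reported = {g: v for g, v in net_generation_mwh.items() if v is not None}
    if len(reported) == len(net_generation_mwh) and sum(reported.values()) != 0:
        weights = reported
    else:
        weights = {g: capacity_mw[g] for g in net_generation_mwh}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def allocate(plant_total: float, fractions: dict[str, float]) -> dict[str, float]:
    """Split a plant-level value (e.g. MWh or MMBtu) according to the fractions."""
    return {g: plant_total * f for g, f in fractions.items()}


# Hypothetical plant with two generators sharing one prime mover and energy source.
fractions = allocation_fractions({"1": 300.0, "2": 100.0}, {"1": 50.0, "2": 50.0})
print(allocate(1000.0, fractions))  # {'1': 750.0, '2': 250.0}
```

In this sketch the fractions sum to one, so the allocated values add back up to the plant-level total.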
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| energy_source_code_num | string | Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, for the reporting period. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, for the reporting period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
out_eia923__monthly_generation_fuel_combined#
Noticed any irregularities with this table? File a report.
Monthly time series of generation and fuel consumption of all generation units.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, prime_mover_code, energy_source_code
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables. Nuclear generation is aggregated from the nuclear generation unit level up to the plant prime mover level, so that it is compatible with fossil fuel generation data.
This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. To calculate capacity factors, heat rates, and the marginal cost of electricity, we attribute fuel consumption and generation to generators using the primary fuels, prime movers, and capacities reported for each generator. We do this whenever that information isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables.
The table distinguishes between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units. These units also provide high-temperature process heat at the expense of net electricity generation.
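One practical consequence is that heat rates for CHP units should be computed from fuel_consumed_for_electricity_mmbtu rather than fuel_consumed_mmbtu; otherwise the fuel used for process heat inflates them. The arithmetic sketch below uses made-up numbers, and the helper names are hypothetical:

```python
def heat_rate_mmbtu_per_mwh(fuel_for_electricity_mmbtu: float, net_generation_mwh: float) -> float:
    """Fuel burned for electricity per unit of net generation (MMBtu/MWh)."""
    if net_generation_mwh <= 0:
        raise ValueError("heat rate is undefined for zero or negative net generation")
    return fuel_for_electricity_mmbtu / net_generation_mwh


def process_heat_share(fuel_consumed_mmbtu: float, fuel_for_electricity_mmbtu: float) -> float:
    """Fraction of total fuel attributed to non-electric uses such as CHP process heat."""
    if fuel_consumed_mmbtu == 0:
        return 0.0
    return (fuel_consumed_mmbtu - fuel_for_electricity_mmbtu) / fuel_consumed_mmbtu


# Made-up CHP record: 12,000 MMBtu burned in total, 9,000 MMBtu of it for electricity.
print(heat_rate_mmbtu_per_mwh(9_000.0, 1_000.0))   # 9.0
print(heat_rate_mmbtu_per_mwh(12_000.0, 1_000.0))  # 12.0, overstated by the process heat
print(process_heat_share(12_000.0, 9_000.0))       # 0.25
```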
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, for the reporting period. |
| fuel_consumed_for_electricity_units | number | Consumption of the fuel type for electric generation, in physical units. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, for the reporting period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_consumed_units | number | Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__yearly_boiler_fuel#
Noticed any irregularities with this table? File a report.
Annual time series of boiler fuel consumption and emissions.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
Reports the quantity of each type of fuel consumed by each boiler, aggregated in this table to annual values, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, which makes this table useful for calculating the thermal efficiency (heat rate) of various generation units.
This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
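The heat content column links physical fuel quantities to energy terms. The sketch below shows the arithmetic. It assumes that fuel_consumed_mmbtu should approximately equal fuel_consumed_units multiplied by fuel_mmbtu_per_unit. The 0.1% tolerance and the helper names are arbitrary choices for illustration:

```python
import math


def heat_content_mmbtu(fuel_consumed_units: float, fuel_mmbtu_per_unit: float) -> float:
    """Physical quantity (tons, barrels, or Mcf) times heat content per unit gives MMBtu."""
    return fuel_consumed_units * fuel_mmbtu_per_unit


def mmbtu_is_consistent(record: dict[str, float], rel_tol: float = 1e-3) -> bool:
    """Check a record's fuel_consumed_mmbtu against units times heat content."""
    expected = heat_content_mmbtu(record["fuel_consumed_units"], record["fuel_mmbtu_per_unit"])
    return math.isclose(record["fuel_consumed_mmbtu"], expected, rel_tol=rel_tol)


# Made-up coal record: 1,000 tons at 19.5 MMBtu per ton.
record = {"fuel_consumed_units": 1_000.0, "fuel_mmbtu_per_unit": 19.5, "fuel_consumed_mmbtu": 19_500.0}
print(heat_content_mmbtu(1_000.0, 19.5))  # 19500.0
print(mmbtu_is_consistent(record))        # True
```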
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| ash_content_pct | number | Ash content percentage by weight to the nearest 0.1 percent. |
| boiler_id | string | Alphanumeric boiler ID. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, for the reporting period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_consumed_units | number | Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| sulfur_content_pct | number | Sulfur content percentage by weight to the nearest 0.01 percent. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__yearly_fuel_receipts_costs#
Noticed any irregularities with this table? File a report.
Annual time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
- Primary key:
plant_id_eia, fuel_type_code_pudl, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
Contains estimated values.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy time series of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns – primarily numerical values and a handful of categorical variables, plus additional attributes that are constant within each plant-fuel-time period grouping and associated with the plant or utility. When aggregating numerical values, any sum that contains an NA value is treated as NA.
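This NA handling differs from pandas' default, where grouped sums silently skip missing values. The sketch below shows the stricter behavior described above, using made-up data. `sum_propagating_na` is a hypothetical helper, not PUDL's code:

```python
import numpy as np
import pandas as pd


def sum_propagating_na(df: pd.DataFrame, keys: list[str], value_cols: list[str]) -> pd.DataFrame:
    """Grouped sum in which any NA inside a group makes that group's sum NA."""
    # The default groupby sum skips NA; Series.sum(skipna=False) does not.
    return df.groupby(keys)[value_cols].agg(lambda s: s.sum(skipna=False)).reset_index()


# Made-up deliveries for one plant within a single aggregation period.
deliveries = pd.DataFrame(
    {
        "plant_id_eia": [3, 3, 3, 3],
        "fuel_type_code_pudl": ["gas", "gas", "coal", "coal"],
        "fuel_received_units": [10.0, 20.0, 5.0, np.nan],
    }
)
keys = ["plant_id_eia", "fuel_type_code_pudl"]
strict = sum_propagating_na(deliveries, keys, ["fuel_received_units"])
lenient = deliveries.groupby(keys)["fuel_received_units"].sum()
print(strict)   # coal -> NaN, gas -> 30.0
print(lenient)  # coal -> 5.0, gas -> 30.0
```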
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| ash_content_pct | number | Ash content percentage by weight to the nearest 0.1 percent. |
| chlorine_content_ppm | number | For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, for the reporting period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_cost_per_mmbtu | number | Average fuel cost per MMBTU of heat content in nominal USD. |
| fuel_cost_per_mmbtu_source | string | Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or comes from a mix of these sources because records with different sources were aggregated together (mixed). |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_received_units | number | Quantity of fuel received in tons, barrels, or Mcf. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| mercury_content_ppm | number | Mercury content in parts per million (ppm) to the nearest 0.001 ppm. |
| moisture_content_pct | number | For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| state | string | Two letter US state abbreviation. |
| sulfur_content_pct | number | Sulfur content percentage by weight to the nearest 0.01 percent. |
| total_fuel_cost | number | Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
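The `mixed` label in fuel_cost_per_mmbtu_source follows from aggregation: once deliveries with different cost sources are combined into one plant, fuel type, and period record, no single source applies. The sketch below illustrates that rule. It assumes null labels are ignored, which is an assumption, since the dictionary does not say how nulls are treated. `combine_cost_sources` is a hypothetical name:

```python
from collections.abc import Iterable
from typing import Optional


def combine_cost_sources(sources: Iterable[Optional[str]]) -> Optional[str]:
    """Collapse per-delivery fuel_cost_per_mmbtu_source labels for one aggregated record."""
    distinct = {s for s in sources if s is not None}  # assumption: nulls are ignored
    if not distinct:
        return None
    if len(distinct) == 1:
        return distinct.pop()
    return "mixed"


print(combine_cost_sources(["original", "original"]))     # original
print(combine_cost_sources(["original", "rolling_avg"]))  # mixed
```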
out_eia923__yearly_generation#
Noticed any irregularities with this table? File a report.
Annual time series of net electricity generated by power plant generators.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, generator_id, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, because different types and sizes of generators are subject to different reporting requirements.
Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__yearly_generation_fuel_by_generator#
Noticed any irregularities with this table? File a report.
Annual time series of estimated net generation and fuel consumption by generator.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
report_date, plant_id_eia, generator_id
Usage Warnings#
Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.
The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:
The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id, and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.
In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__yearly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly with each of those generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
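Conceptually, the aggregation step described above amounts to summing the allocated values over energy source and prime mover for each generator and date. A minimal pandas sketch, assuming `by_esc` is a DataFrame loaded from out_eia923__yearly_generation_fuel_by_generator_energy_source (column names come from that table's schema; the function name is hypothetical and this is not PUDL's own transform):

```python
import pandas as pd

# Primary key of out_eia923__yearly_generation_fuel_by_generator, per this data dictionary.
GENERATOR_KEY = ["report_date", "plant_id_eia", "generator_id"]
VALUE_COLUMNS = ["net_generation_mwh", "fuel_consumed_mmbtu", "fuel_consumed_for_electricity_mmbtu"]


def aggregate_to_generator(by_esc: pd.DataFrame) -> pd.DataFrame:
    """Sum allocated generation and fuel over energy_source_code and prime_mover_code."""
    return (
        by_esc.groupby(GENERATOR_KEY, as_index=False)[VALUE_COLUMNS]
        # min_count=1 keeps a group null if all of its inputs are null, rather than 0.
        .sum(min_count=1)
    )
```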
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, year to date. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| report_date | date | Date reported. |
| unit_id_pudl | integer | Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia923__yearly_generation_fuel_by_generator_energy_source#
Noticed any irregularities with this table? File a report.
Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code
Usage Warnings#
Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Contains information from multiple raw inputs.
Additional Details#
The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:
The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id, and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.
In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.
The allocation process entails generating a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly with each of those generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
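As a rough illustration of proportional allocation, the sketch below splits one plant-level total (for a single plant, prime mover, energy source, and date) across generators. It is a deliberate simplification, not PUDL's allocation code: it assumes the weights are each generator's reported net generation when every generator in the group reports a value, and falls back to nameplate capacity otherwise. The function and argument names are hypothetical.

```python
from typing import Dict, Optional


def allocate_total(
    total: float,
    net_gen_mwh: Dict[str, Optional[float]],
    capacity_mw: Dict[str, float],
) -> Dict[str, float]:
    """Split a plant-level total across generators in proportion to a weight.

    Simplified assumption: use generator-level net generation as the weight if every
    generator has a value (and they sum to a positive number), otherwise use capacity.
    """
    if all(value is not None for value in net_gen_mwh.values()) and sum(net_gen_mwh.values()) > 0:
        weights = net_gen_mwh
    else:
        weights = capacity_mw
    weight_sum = sum(weights.values())
    # Each generator receives its fraction of the total; the fractions sum to 1.
    return {gen_id: total * weight / weight_sum for gen_id, weight in weights.items()}
```

For example, `allocate_total(300.0, {"1": None, "2": 200.0}, {"1": 50.0, "2": 100.0})` falls back to capacity because generator "1" has no reported net generation, giving 100 MWh to "1" and 200 MWh to "2".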
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| energy_source_code_num | string | Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, year to date. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
out_eia923__yearly_generation_fuel_by_generator_energy_source_owner#
Noticed any irregularities with this table? File a report.
Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, prime mover, and owner.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code, utility_id_eia, ownership_record_type
Usage Warnings#
Contains estimated values.
The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).
Date column arbitrarily uses the first of the month.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:
The core_eia923__monthly_generation_fuel table is the authoritative source of information about how much generation and fuel consumption is attributable to an entire plant. This table has the most complete data coverage, but it is not the most granular data reported.
The core_eia923__monthly_generation table contains the most granular net generation data. It is reported at the plant_id_eia, generator_id, and report_date level. This table includes only ~40% of the total MWhs reported in the core_eia923__monthly_generation_fuel table.
The core_eia923__monthly_boiler_fuel table contains the most granular fuel consumption data. It is reported at the boiler/prime mover/energy source level. This table includes only ~40% of the total MMBTUs reported in the core_eia923__monthly_generation_fuel table.
The core_eia860__scd_generators table provides an exhaustive list of all generators whose generation is being reported in the core_eia923__monthly_generation_fuel table.
In this table, PUDL has scaled the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel by ownership from out_eia860__yearly_ownership. Note that the utility_id_eia in this table refers to the OWNER of the generator, not the operator.
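Scaling by ownership essentially means multiplying each generator's values by each owner's fraction_owned. A minimal pandas sketch, assuming `gen` has one row per generator and date and `owners` has one row per generator, date, and owning utility with a fraction_owned column (as in out_eia860__yearly_ownership). The join keys and function name are assumptions, and this is not PUDL's implementation. Note how the operator's utility_id_eia is dropped so that the owner's ID takes its place.

```python
import pandas as pd

JOIN_KEY = ["report_date", "plant_id_eia", "generator_id"]
SCALED_COLUMNS = ["net_generation_mwh", "fuel_consumed_mmbtu"]


def scale_by_ownership(gen: pd.DataFrame, owners: pd.DataFrame) -> pd.DataFrame:
    """Return one row per generator and owner, with values scaled by fraction_owned."""
    # Drop any operator utility ID: in the output, utility_id_eia identifies the OWNER.
    gen = gen.drop(columns=["utility_id_eia"], errors="ignore")
    # Each generator row fans out into one row per owner.
    out = gen.merge(
        owners[JOIN_KEY + ["utility_id_eia", "fraction_owned"]],
        on=JOIN_KEY,
        how="inner",
        validate="one_to_many",
    )
    out[SCALED_COLUMNS] = out[SCALED_COLUMNS].mul(out["fraction_owned"], axis=0)
    return out
```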
This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly with each of those generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| capacity_mw | number | Total installed (nameplate) capacity, in megawatts. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| energy_source_code_num | string | Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record. |
| fraction_owned | number | Proportion of generator ownership attributable to this utility. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, year to date. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| ownership_record_type | string | Whether each generator record is for one owner or represents a total of all ownerships. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| utility_id_eia | integer | The EIA Utility Identification number. |
out_eia923__yearly_generation_fuel_combined#
Noticed any irregularities with this table? File a report.
Annual time series of generation and fuel consumption of all generation units.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, prime_mover_code, energy_source_code
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
Denormalized, combined data from the core_eia923__monthly_generation_fuel and
core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear
generation unit level up to the plant prime mover level so as to be compatible with
fossil fuel generation data.
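A minimal pandas sketch of that combination step, assuming both inputs are already loaded as DataFrames. The nuclear records are assumed to carry a per-unit identifier (for example a nuclear_unit_id column), which is summed away so the nuclear rows match the plant / prime mover / energy source grain of the non-nuclear data. This illustrates the idea only; it is not PUDL's transform, and the function name is hypothetical.

```python
import pandas as pd

# Grain of out_eia923__yearly_generation_fuel_combined, per its primary key.
PLANT_PM_KEY = ["plant_id_eia", "report_date", "prime_mover_code", "energy_source_code"]
VALUES = ["net_generation_mwh", "fuel_consumed_mmbtu", "fuel_consumed_for_electricity_mmbtu"]


def combine_generation_fuel(gen_fuel: pd.DataFrame, nuclear: pd.DataFrame) -> pd.DataFrame:
    """Aggregate nuclear units up to the plant prime mover level and append them."""
    # Grouping without the unit identifier sums all nuclear units at each plant/prime mover.
    nuclear_by_pm = nuclear.groupby(PLANT_PM_KEY, as_index=False)[VALUES].sum(min_count=1)
    return pd.concat([gen_fuel[PLANT_PM_KEY + VALUES], nuclear_by_pm], ignore_index=True)
```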
This data can’t be easily linked to individual boilers, generators, and
generation units, but it provides the most complete coverage
of fuel consumption and electricity generation for the entire generation fleet. We use
the primary fuels and prime movers reported for each generator along with their
capacities to attribute fuel consumption and generation when it isn’t directly reported
in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables in order to calculate
capacity factors, heat rates, and the marginal cost of electricity.
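For reference, a capacity factor is conventionally net generation divided by the maximum possible generation at nameplate capacity over the same period. A minimal standard-library sketch for annual data; the function name is hypothetical, and PUDL's own calculation may treat partial years, retirements, or outliers differently.

```python
import calendar


def annual_capacity_factor(net_generation_mwh: float, capacity_mw: float, year: int) -> float:
    """Net generation divided by nameplate capacity times the hours in the year."""
    hours = 8784 if calendar.isleap(year) else 8760
    return net_generation_mwh / (capacity_mw * hours)
```

For example, a 100 MW generator that produced 438,000 MWh in 2023 has a capacity factor of 438,000 / (100 × 8,760) = 0.5.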
The table distinguishes between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units, which also provide high-temperature process heat at the expense of net electricity generation.
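This distinction matters for efficiency metrics. Using the standard definition of heat rate (fuel input per unit of net generation, here in MMBtu/MWh), the numerator should be fuel_consumed_for_electricity_mmbtu rather than fuel_consumed_mmbtu; at a CHP unit, using total fuel would attribute process-steam fuel to electricity and overstate the heat rate. A minimal sketch with a hypothetical function name:

```python
from typing import Optional


def heat_rate_mmbtu_per_mwh(fuel_for_electricity_mmbtu: float, net_generation_mwh: float) -> Optional[float]:
    """Fuel consumed for electricity per unit of net generation.

    Returns None when net generation is zero or negative, where a heat rate is not meaningful.
    """
    if net_generation_mwh <= 0:
        return None
    return fuel_for_electricity_mmbtu / net_generation_mwh
```

For example, a CHP unit that burned 12,000 MMBtu in total, 9,000 MMBtu of it for electricity, while producing 1,000 MWh has a heat rate of 9.0 MMBtu/MWh; dividing total fuel instead would give 12.0.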
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code | string | A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
| fuel_consumed_for_electricity_mmbtu | number | Total consumption of fuel to produce electricity, in MMBtu, year to date. |
| fuel_consumed_for_electricity_units | number | Consumption for electric generation of the fuel type in physical units. |
| fuel_consumed_mmbtu | number | Total consumption of fuel, in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_consumed_units | number | Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
| fuel_mmbtu_per_unit | number | Heat content of the fuel in millions of Btus per physical unit. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| net_generation_mwh | number | Net electricity generation for the specified period in megawatt-hours (MWh). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_id_pudl | integer | A manually assigned PUDL plant ID. May not be constant over time. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_id_pudl | integer | A manually assigned PUDL utility ID. May not be stable over time. |
| utility_name_eia | string | The name of the utility. |
out_eia930__hourly_aggregated_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of aggregated balancing authority demand by EIA region, interconnect, and continental US.
- Most-recent data:
2026
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, aggregation_level, aggregation_group
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
This table is experimental and/or a work in progress and may change in the future.
Additional Details#
The spatial granularity of each record is indicated by aggregation_level.
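Because this table stacks several spatial aggregations (EIA region, interconnect, and continental US) in one place, the same underlying demand appears at more than one aggregation_level, so only sum or compare values within a single level. A minimal pandas sketch that selects one level and pivots it to one column per aggregation_group; the function name is hypothetical, and you should inspect the table for the actual aggregation_level values rather than assume them.

```python
import pandas as pd


def demand_by_group(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Wide table of demand (datetime_utc x aggregation_group) for one aggregation_level."""
    subset = df[df["aggregation_level"] == level]
    # The primary key makes (datetime_utc, aggregation_group) unique within a level, so pivot is safe.
    return subset.pivot(index="datetime_utc", columns="aggregation_group", values="demand_imputed_pudl_mwh")
```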
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| aggregation_group | string | Label identifying a group of balancing authorities to be used in aggregating demand, e.g. a region of the US or a whole interconnect. |
| aggregation_level | string | Indicates the spatial granularity of the aggregated value. |
| datetime_utc | datetime | Date and time converted to Coordinated Universal Time (UTC). |
| demand_imputed_pudl_mwh | number | Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL. |
out_eia930__hourly_operations#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority net generation, interchange, and demand with imputed demand.
- Most-recent data:
2026
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia
Usage Warnings#
Contains rows where missing values were imputed.
This table is experimental and/or a work in progress and may change in the future.
Additional Details#
This table is based on core_eia930__hourly_operations,
but adds imputed demand where the original data was missing or anomalous. Codes
explaining why values have been imputed can be found in the
core_pudl__codes_imputation_reasons table.
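A quick way to see how much of the demand series was imputed is to count imputation codes per balancing authority. A minimal pandas sketch, assuming that rows which were not imputed have a null demand_imputed_pudl_mwh_imputation_code (an assumption worth verifying); look up the meaning of each code in core_pudl__codes_imputation_reasons. The function name is hypothetical.

```python
import pandas as pd


def imputation_summary(ops: pd.DataFrame) -> pd.DataFrame:
    """Count imputed hours per balancing authority and imputation code."""
    code_col = "demand_imputed_pudl_mwh_imputation_code"
    # Keep only rows flagged for imputation (non-null code).
    imputed = ops[ops[code_col].notna()]
    return (
        imputed.groupby(["balancing_authority_code_eia", code_col])
        .size()
        .rename("n_hours")
        .reset_index()
    )
```

The same pattern applies to out_eia930__hourly_subregion_demand by adding balancing_authority_subregion_code_eia to the grouping columns.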
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| datetime_utc | datetime | Date and time converted to Coordinated Universal Time (UTC). |
| demand_adjusted_mwh | number | Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
| demand_forecast_mwh | number | Day-ahead demand forecast. |
| demand_imputed_eia_mwh | number | Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA. |
| demand_imputed_pudl_mwh | number | Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL. |
| demand_imputed_pudl_mwh_imputation_code | string | Code describing why a demand value was flagged for imputation. |
| demand_reported_mwh | number | Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs. |
| interchange_adjusted_mwh | number | Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
| interchange_imputed_eia_mwh | number | Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA. |
| interchange_reported_mwh | number | Originally reported energy interchange between adjacent balancing authorities. |
| net_generation_adjusted_mwh | number | Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
| net_generation_imputed_eia_mwh | number | Reported net generation with outlying values removed and missing values imputed by EIA. |
| net_generation_reported_mwh | number | Unaltered originally reported net generation for the specified period. |
out_eia930__hourly_subregion_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority subregion demand with imputed demand.
- Most-recent data:
2026
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia
Usage Warnings#
Contains rows where missing values were imputed.
This table is experimental and/or a work in progress and may change in the future.
Additional Details#
This table is based on core_eia930__hourly_subregion_demand, but adds imputed demand where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| balancing_authority_subregion_code_eia | string | Code identifying subregions of larger balancing authorities. |
| datetime_utc | datetime | Date and time converted to Coordinated Universal Time (UTC). |
| demand_imputed_pudl_mwh | number | Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL. |
| demand_imputed_pudl_mwh_imputation_code | string | Code describing why a demand value was flagged for imputation. |
| demand_reported_mwh | number | Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs. |
out_eia__monthly_generators#
Noticed any irregularities with this table? File a report.
Monthly time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
report_date, plant_id_eia, generator_id
Usage Warnings#
Contains estimated values.
Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.
Not all columns are originally reported in or calculable from the input tables. Expect nulls.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This table includes all monthly attributes for all generators reported to EIA-860 and EIA-923.
To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed and calculated columns. We suggest using this table if you want to explore monthly attributes about generators and would rather use a more complete and denormalized table. If you are more interested in the originally reported values, we recommend searching for core_eia with the column you are most interested in.
The calculations of capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh are based on net generation that is reported by plant, prime mover and energy source and then allocated to individual generators. That allocation comes from the out_eia923__monthly_generation_fuel_by_generator table.
Monthly generator-level heat rates are estimated from the allocated per-generator net generation and fuel consumption, together with the core_eia923__monthly_boiler_fuel table, which reports the fuel consumed by each boiler. Heat rates are needed to estimate how much fuel a generation unit consumes, and therefore its fuel cost per MWh generated.
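To illustrate the general idea of splitting a group-level value across generators, the sketch below divides the net generation reported for one plant / prime mover / energy source group among its generators. The weighting rule (shares proportional to `capacity_mw`) is an assumption chosen for simplicity. It is not necessarily how PUDL performs the allocation, and the generator IDs and values are hypothetical.

```python
def allocate_by_capacity(group_net_generation_mwh, capacity_mw_by_generator):
    """Split a group-level net generation total across its generators.

    Simplifying assumption: each generator's share is proportional to its
    nameplate capacity_mw. Returns {generator_id: allocated_mwh}.
    """
    total_capacity = sum(capacity_mw_by_generator.values())
    if total_capacity == 0:
        raise ValueError("cannot allocate: total capacity is zero")
    return {
        generator_id: group_net_generation_mwh * capacity / total_capacity
        for generator_id, capacity in capacity_mw_by_generator.items()
    }


# Hypothetical group: 300 MWh reported for generators of 100 MW and 200 MW.
allocated = allocate_by_capacity(300.0, {"1": 100.0, "2": 200.0})
print(allocated)  # {'1': 100.0, '2': 200.0}
```

Generator IDs are kept as strings, matching the advice for the generator_id column.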
Plant-specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which has only ~70% coverage, so some generators with heat rate estimates still lack fuel cost estimates.
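The relationships between the calculated columns can be sketched with the standard definitions: heat rate = fuel heat content / net generation, fuel cost per MWh = heat rate × fuel cost per MMBTU, and capacity factor = net generation / (nameplate capacity × hours in the month). This is a simplified, hypothetical helper, not PUDL's MCOE code, which may handle edge cases (such as negative net generation) differently. It does reproduce the coverage gap above: without a fuel price, fuel cost per MWh stays null even when a heat rate exists.

```python
import calendar
from datetime import date


def hours_in_month(report_date):
    """Number of hours in the calendar month containing report_date."""
    days = calendar.monthrange(report_date.year, report_date.month)[1]
    return days * 24


def generator_month_metrics(report_date, net_generation_mwh, total_mmbtu,
                            capacity_mw, fuel_cost_per_mmbtu=None):
    """Simplified monthly heat rate, fuel cost per MWh and capacity factor."""
    # Heat rate is left undefined when there is no positive net generation.
    heat_rate = total_mmbtu / net_generation_mwh if net_generation_mwh > 0 else None
    # No fuel price available -> no fuel cost per MWh, even with a heat rate.
    if heat_rate is None or fuel_cost_per_mmbtu is None:
        fuel_cost_per_mwh = None
    else:
        fuel_cost_per_mwh = heat_rate * fuel_cost_per_mmbtu
    # Capacity factor: energy generated / (nameplate capacity x hours in period).
    capacity_factor = (
        net_generation_mwh / (capacity_mw * hours_in_month(report_date))
        if capacity_mw > 0
        else None
    )
    return {
        "unit_heat_rate_mmbtu_per_mwh": heat_rate,
        "fuel_cost_per_mwh": fuel_cost_per_mwh,
        "capacity_factor": capacity_factor,
    }


# Hypothetical 100 MW generator in January (744 hours), fuel at $3.00/MMBTU.
print(generator_month_metrics(date(2023, 1, 1), 37_200.0, 372_000.0, 100.0, 3.0))
```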
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
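The 70% consistency rule can be sketched as follows. This is an illustration of the rule as described, not PUDL's harvesting code. Ignoring null instances when computing the share is an assumption of this sketch, and the state codes are example inputs.

```python
from collections import Counter


def choose_canonical(values, threshold=0.7):
    """Return the most consistently reported value, or None.

    The most common value wins only if it makes up at least `threshold`
    of the reported instances; otherwise the attribute is left null.
    Nulls are ignored here, which is an assumption of this sketch.
    """
    reported = [value for value in values if value is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


print(choose_canonical(["NV", "NV", "NV", "NV", "CA"]))  # NV (4 of 5 = 80%)
print(choose_canonical(["NV", "NV", "CA"]))  # None (2 of 3 is below 70%)
```

Because a tie between two values can never reach 70%, the result does not depend on how `Counter.most_common` orders tied values.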
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
associated_combined_heat_power |
boolean |
Indicates whether the generator is associated with a combined heat and power system |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
bga_source |
string |
The source from which the unit_id_pudl is compiled: directly from EIA-860; from string association, which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators; or from a unit connection, where the unit_id_eia is used to find additional boiler-generator connections. |
bypass_heat_recovery |
boolean |
Can this generator operate while bypassing the heat recovery steam generator? |
can_burn_multiple_fuels |
boolean |
Whether the generator can burn multiple fuels. |
can_cofire_fuels |
boolean |
Whether the generator can co-fire fuels. |
can_switch_oil_gas |
boolean |
Whether the generator can switch between oil and natural gas. |
can_switch_when_operating |
boolean |
Indicates whether a fuel switching generator can switch fuels while operating. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part, calculated as energy generated over a time period / (nameplate capacity × length of that time period), with consistent units (e.g. MWh / (MW × hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
carbon_capture |
boolean |
Indicates whether the generator uses carbon capture technology. |
city |
string |
Name of the city. |
county |
string |
County name. |
current_planned_generator_operating_date |
date |
The most recently updated effective date on which the generator is scheduled to start operation |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
deliver_power_transgrid |
boolean |
Indicates whether the generator can deliver power to the transmission grid. |
distributed_generation |
boolean |
Whether the generator is considered distributed generation |
duct_burners |
boolean |
Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas |
energy_source_1_transport_1 |
string |
Primary mode of transport for energy source 1. |
energy_source_1_transport_2 |
string |
Secondary mode of transport for energy source 1. |
energy_source_1_transport_3 |
string |
Tertiary mode of transport for energy source 1. |
energy_source_2_transport_1 |
string |
Primary mode of transport for energy source 2. |
energy_source_2_transport_2 |
string |
Secondary mode of transport for energy source 2. |
energy_source_2_transport_3 |
string |
Tertiary mode of transport for energy source 2. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
energy_source_code_2 |
string |
The code representing the second most predominant type of energy that fuels the generator |
energy_source_code_3 |
string |
The code representing the third most predominant type of energy that fuels the generator |
energy_source_code_4 |
string |
The code representing the fourth most predominant type of energy that fuels the generator |
energy_source_code_5 |
string |
The code representing the fifth most predominant type of energy that fuels the generator |
energy_source_code_6 |
string |
The code representing the sixth most predominant type of energy that fuels the generator |
energy_storage_capacity_mwh |
number |
Energy storage capacity in MWh (e.g. for batteries). |
ferc_qualifying_facility |
boolean |
Indicates whether or not a generator is a qualifying FERC cogeneration facility. |
fluidized_bed_tech |
boolean |
Indicates whether the generator uses fluidized bed technology |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mmbtu_source |
string |
Indicates the source of the values in the fuel_cost_per_mmbtu column: taken directly from the EIA forms (original), filled in from state-level averages obtained through EIA’s API (eiaapi), filled in using a rolling average (rolling_avg), or a combination of sources when records have been aggregated together (mixed). |
fuel_cost_per_mwh |
number |
Derived from the MCOE calculation, a unit-level value. Average fuel cost per MWh of net electricity generated, in nominal USD. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
fuel_type_count |
integer |
A count of how many different simple energy sources there are associated with a generator. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_date |
date |
Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date. |
generator_retirement_date |
date |
Date of the scheduled or effected retirement of the generator. |
latitude |
number |
Latitude of the plant’s location, in degrees. |
longitude |
number |
Longitude of the plant’s location, in degrees. |
minimum_load_mw |
number |
The minimum load at which the generator can operate continuously. |
nameplate_power_factor |
number |
The nameplate power factor of the generator. |
net_capacity_mwdc |
number |
Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
operational_status |
string |
The operating status of the asset. For generators, this is based on which tab of EIA-860 the generator was listed in. |
operational_status_code |
string |
The operating status of the asset. |
original_planned_generator_operating_date |
date |
The date the generator was originally scheduled to be operational |
other_combustion_tech |
boolean |
Indicates whether the generator uses other combustion technologies |
other_modifications_date |
date |
Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete. |
other_planned_modifications |
boolean |
Indicates whether there are other modifications planned for the generator. |
owned_by_non_utility |
boolean |
Whether any part of the generator is owned by a nonutility. |
ownership_code |
string |
Identifies the ownership for each generator. |
planned_derate_date |
date |
Planned effective month that the generator is scheduled to enter operation after the derate modification. |
planned_energy_source_code_1 |
string |
New energy source code for the planned repowered generator. |
planned_generator_retirement_date |
date |
Planned effective date of the scheduled retirement of the generator. |
planned_modifications |
boolean |
Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years. |
planned_net_summer_capacity_derate_mw |
number |
Decrease in summer capacity expected to be realized from the derate modification to the equipment. |
planned_net_summer_capacity_uprate_mw |
number |
Increase in summer capacity expected to be realized from the uprate modification to the equipment. |
planned_net_winter_capacity_derate_mw |
number |
Decrease in winter capacity expected to be realized from the derate modification to the equipment. |
planned_net_winter_capacity_uprate_mw |
number |
Increase in winter capacity expected to be realized from the uprate modification to the equipment. |
planned_new_capacity_mw |
number |
The expected new nameplate capacity for the generator. |
planned_new_prime_mover_code |
string |
New prime mover for the planned repowered generator. |
planned_repower_date |
date |
Planned effective date that the generator is scheduled to enter operation after the repowering is complete. |
planned_uprate_date |
date |
Planned effective date that the generator is scheduled to enter operation after the uprate modification. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
previously_canceled |
boolean |
Indicates whether the generator was previously reported as indefinitely postponed or canceled |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
pulverized_coal_tech |
boolean |
Indicates whether the generator uses pulverized coal technology |
reactive_power_output_mvar |
number |
Reactive Power Output (MVAr) |
report_date |
date |
Date reported. |
rto_iso_lmp_node_id |
string |
The designation used to identify the price node in RTO/ISO Locational Marginal Price reports |
rto_iso_location_wholesale_reporting_id |
string |
The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report |
solid_fuel_gasification |
boolean |
Indicates whether the generator is part of a solid fuel gasification system |
startup_source_code_1 |
string |
The code representing the first start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_2 |
string |
The code representing the second start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_3 |
string |
The code representing the third start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_4 |
string |
The code representing the fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
state |
string |
Two letter US state abbreviation. |
stoker_tech |
boolean |
Indicates whether the generator uses stoker technology |
street_address |
string |
Physical street address. |
subcritical_tech |
boolean |
Indicates whether the generator uses subcritical technology |
summer_capacity_estimate |
boolean |
Whether the summer capacity value was an estimate |
summer_capacity_mw |
number |
The net summer capacity, in megawatts. |
summer_estimated_capability_mw |
number |
EIA estimated summer capacity, in megawatts (MW). |
supercritical_tech |
boolean |
Indicates whether the generator uses supercritical technology |
synchronized_transmission_grid |
boolean |
Indicates whether standby generators (SB status) can be synchronized to the grid. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
time_cold_shutdown_full_load_code |
string |
The minimum amount of time required to bring the unit to full load from shutdown. |
timezone |
string |
IANA timezone name |
topping_bottoming_code |
string |
If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle |
total_fuel_cost |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_mmbtu |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
turbines_inverters_hydrokinetics |
integer |
Number of wind turbines, inverters, or hydrokinetic buoys. |
turbines_num |
integer |
Number of wind turbines, or hydrokinetic buoys. |
ultrasupercritical_tech |
boolean |
Indicates whether the generator uses ultra-supercritical technology |
unit_heat_rate_mmbtu_per_mwh |
number |
Heat content of the fuel consumed per unit of electricity generated (MMBtu per MWh). Derived from the MCOE calculation. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
uprate_derate_completed_date |
date |
The date when the uprate or derate was completed. |
uprate_derate_during_year |
boolean |
Was an uprate or derate completed on this generator during the reporting year? |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
winter_capacity_estimate |
boolean |
Whether the winter capacity value was an estimate |
winter_capacity_mw |
number |
The net winter capacity, in megawatts. |
winter_estimated_capability_mw |
number |
EIA estimated winter capacity, in megawatts (MW). |
zip_code |
string |
Five digit US Zip Code. |
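Because this wide table spreads ranked attributes across numbered columns (for example energy_source_code_1 through energy_source_code_6), it is often convenient to collect them into a single ordered list per record. The helper below is a hypothetical convenience function, not part of PUDL. It keeps the rank order and drops null slots. NG and DFO are used only as example EIA energy source codes.

```python
def energy_source_codes(row, max_codes=6):
    """Return the non-null energy_source_code_N values in rank order."""
    codes = []
    for n in range(1, max_codes + 1):
        code = row.get(f"energy_source_code_{n}")
        if code is not None:
            codes.append(code)
    return codes


# Hypothetical generator-month record with two reported energy sources.
row = {
    "energy_source_code_1": "NG",
    "energy_source_code_2": "DFO",
    "energy_source_code_3": None,
}
print(energy_source_codes(row))  # ['NG', 'DFO']
```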
out_eia__yearly_assn_plant_parts_plant_gen#
Noticed any irregularities with this table? File a report.
Annual association table providing connections between plant parts (identified as record_id_eia) and ‘plant_gen’ records (identified as record_id_eia_plant_gen) from the out_eia__yearly_plant_parts table.
- Most-recent data:
2026
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
record_id_eia, record_id_eia_plant_gen
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains information from multiple raw inputs.
Additional Details#
The purpose of this table is to help determine which generator records are associated
with each plant part record. Plant part records have a one-to-many relationship
to their associated ‘plant_gen’ records. The ‘plant_gen’ records are also included in
the plant part records, so for these records the relationship is one-to-one and
the record_id_eia and the record_id_eia_plant_gen are the same.
All of the columns in this table that have a suffix of ‘_plant_gen’ are attributes of
the record_id_eia_plant_gen record.
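The one-to-many and self-linking relationships described above can be sketched in plain Python: group the association rows by record_id_eia, and flag the records that link to themselves, which are the ‘plant_gen’ records. The ID strings below are invented placeholders and do not follow PUDL's actual record_id_eia format.

```python
from collections import defaultdict


def group_plant_gens(assn_rows):
    """Return (links, self_linked).

    links maps each record_id_eia to its list of record_id_eia_plant_gen
    values; self_linked holds the record_id_eia values that link to
    themselves (the plant_gen records in the one-to-one case).
    """
    links = defaultdict(list)
    self_linked = set()
    for row in assn_rows:
        part_id = row["record_id_eia"]
        plant_gen_id = row["record_id_eia_plant_gen"]
        links[part_id].append(plant_gen_id)
        if part_id == plant_gen_id:
            self_linked.add(part_id)
    return dict(links), self_linked


# Hypothetical IDs: one plant-level part with two generators, plus the two
# plant_gen records that each map to themselves.
assn_rows = [
    {"record_id_eia": "plant_X_total", "record_id_eia_plant_gen": "gen_X1"},
    {"record_id_eia": "plant_X_total", "record_id_eia_plant_gen": "gen_X2"},
    {"record_id_eia": "gen_X1", "record_id_eia_plant_gen": "gen_X1"},
    {"record_id_eia": "gen_X2", "record_id_eia_plant_gen": "gen_X2"},
]
links, self_linked = group_plant_gens(assn_rows)
print(links["plant_X_total"])  # ['gen_X1', 'gen_X2']
```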
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
energy_source_code_1_plant_gen |
string |
Code representing the most predominant type of energy that fuels the record_id_eia_plant_gen’s generator. |
ferc1_generator_agg_id_plant_gen |
integer |
ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. This ID is associated with the record_id_eia_plant_gen record. It depends on ferc1_generator_agg_id, which has not been updated since 2020, but only affects a couple hundred records total across all years. |
ferc_acct_name_plant_gen |
string |
Name of FERC account, derived from technology description and prime mover code. This name is associated with the record_id_eia_plant_gen record. |
generator_id_plant_gen |
string |
Generator ID of the record_id_eia_plant_gen record. This is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_year_plant_gen |
integer |
The year an associated plant_gen’s generator went into service. |
generators_number |
integer |
The number of generators associated with each |
operational_status_pudl_plant_gen |
string |
The operating status of the record_id_eia_plant_gen record, using PUDL categories. |
ownership_record_type |
string |
Whether each generator record is for one owner or represents a total of all ownerships. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code_plant_gen |
string |
Code for the type of prime mover (e.g. CT, CG) associated with the record_id_eia_plant_gen. |
record_id_eia |
string |
Identifier for EIA plant parts analysis records. |
record_id_eia_plant_gen |
string |
Identifier for EIA plant parts analysis records which is at the plant_part level of plant_gen - meaning each record pertains to one generator. |
report_date |
date |
Date reported. |
technology_description_plant_gen |
string |
High level description of the technology used by the record_id_eia_plant_gen’s generator to produce electricity. |
unit_id_pudl_plant_gen |
integer |
Dynamically assigned PUDL unit id of the record_id_eia_plant_gen. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
out_eia__yearly_boilers#
Noticed any irregularities with this table? File a report.
Annual time series of all EIA boiler attributes.
- Most-recent data:
2025
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
plant_id_eia, boiler_id, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This table inherits canonicalized values for boilers, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
air_flow_100pct_load_cubic_feet_per_minute |
number |
Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e. 68 F and one atmosphere pressure). |
boiler_fuel_code_1 |
string |
The code representing the most predominant type of energy that fuels the boiler. |
boiler_fuel_code_2 |
string |
The code representing the second most predominant type of energy that fuels the boiler. |
boiler_fuel_code_3 |
string |
The code representing the third most predominant type of energy that fuels the boiler. |
boiler_fuel_code_4 |
string |
The code representing the fourth most predominant type of energy that fuels the boiler. |
boiler_id |
string |
Alphanumeric boiler ID. |
boiler_manufacturer |
string |
Name of boiler manufacturer. |
boiler_manufacturer_code |
string |
EIA short code for boiler manufacturer. |
boiler_operating_date |
date |
Date the boiler began or is planned to begin commercial operation. |
boiler_retirement_date |
date |
Date of the scheduled or effected retirement of the boiler. |
boiler_status |
string |
EIA short code identifying boiler operational status. |
boiler_type |
string |
EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR. |
city |
string |
Name of the city. |
compliance_year_mercury |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions. |
compliance_year_nox |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions. |
compliance_year_particulate |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions. |
compliance_year_so2 |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions. |
county |
string |
County name. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
efficiency_100pct_load |
number |
Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent. |
efficiency_50pct_load |
number |
Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent. |
firing_rate_using_coal_tons_per_hour |
number |
Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour. |
firing_rate_using_gas_mcf_per_hour |
number |
Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet (Mcf) per hour. |
firing_rate_using_oil_bbls_per_hour |
number |
Design firing rate at maximum continuous steam flow for petroleum (oil) to the nearest 0.1 barrels per hour. |
firing_rate_using_other_fuels |
number |
Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas. |
firing_type_1 |
string |
EIA short code indicating the type of firing used by this boiler. |
firing_type_2 |
string |
EIA short code indicating the type of firing used by this boiler. |
firing_type_3 |
string |
EIA short code indicating the type of firing used by this boiler. |
fly_ash_reinjection |
boolean |
Indicates whether the boiler is capable of re-injecting fly ash. |
hrsg |
boolean |
Indicates whether the boiler is a heat recovery steam generator (HRSG). |
latitude |
number |
Latitude of the plant’s location, in degrees. |
longitude |
number |
Longitude of the plant’s location, in degrees. |
max_steam_flow_lbs_per_hour |
number |
Maximum continuous steam flow at 100 percent load, in pounds per hour. |
mercury_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_4 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_5 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_6 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
new_source_review |
boolean |
Indicates whether the boiler is subject to New Source Review requirements. |
new_source_review_date |
date |
Month of issued New Source Review permit. |
new_source_review_permit |
string |
New Source Review permit number. |
nox_control_existing_caaa_compliance_strategy_1 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_caaa_compliance_strategy_2 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_caaa_compliance_strategy_3 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_manufacturer |
string |
Name of nitrogen oxide control manufacturer. |
nox_control_manufacturer_code |
string |
Code indicating the nitrogen oxide control burner manufacturer. |
nox_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_planned_caaa_compliance_strategy_1 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_planned_caaa_compliance_strategy_2 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_planned_caaa_compliance_strategy_3 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_status_code |
string |
Nitrogen oxide control status code. |
particulate_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
particulate_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
particulate_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
regulation_mercury |
string |
The most stringent type of statute or regulation under which the boiler is operating for mercury control standards. |
regulation_nox |
string |
EIA short code for the most stringent type of statute or regulation under which the boiler is operating for nitrogen oxide control standards. |
regulation_particulate |
string |
EIA short code for the most stringent type of statute or regulation under which the boiler is operating for particulate matter control standards. |
regulation_so2 |
string |
EIA short code for the most stringent type of statute or regulation under which the boiler is operating for sulfur dioxide control standards. |
report_date |
date |
Date reported. |
so2_control_existing_caaa_compliance_strategy_1 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_caaa_compliance_strategy_2 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_caaa_compliance_strategy_3 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_planned_caaa_compliance_strategy_1 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_planned_caaa_compliance_strategy_2 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_planned_caaa_compliance_strategy_3 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
standard_nox_rate |
number |
Numeric value of the nitrogen oxide emission standard, expressed in the unit of measurement given in unit_nox. |
standard_particulate_rate |
number |
Numeric value of the particulate matter emission standard, expressed in the unit of measurement given in unit_particulate. |
standard_so2_percent_scrubbed |
number |
The percent of sulfur dioxide to be scrubbed, as specified by the most stringent sulfur dioxide regulation. |
standard_so2_rate |
number |
Numeric value of the sulfur dioxide emission standard, expressed in the unit of measurement given in unit_so2. |
state |
string |
Two letter US state abbreviation. |
street_address |
string |
Physical street address. |
timezone |
string |
IANA timezone name |
turndown_ratio |
number |
The turndown ratio for the boiler. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
unit_nox |
string |
Unit of measurement for the nitrogen oxide emission standard reported in standard_nox_rate. |
unit_particulate |
string |
Unit of measurement for the particulate matter emission standard reported in standard_particulate_rate. |
unit_so2 |
string |
Unit of measurement for the sulfur dioxide emission standard reported in standard_so2_rate. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
waste_heat_input_mmbtu_per_hour |
number |
Design waste-heat input rate at maximum continuous steam flow, in MMBTU per hour. A waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process. |
wet_dry_bottom |
string |
Whether the boiler is wet bottom or dry bottom. A wet bottom boiler has slag tanks installed at the furnace throat to contain and remove molten ash from the furnace. A dry bottom boiler has no slag tanks at the furnace throat: the throat area is clear, and bottom ash drops through the throat to bottom ash water hoppers. |
zip_code |
string |
Five-digit US ZIP code. |
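Several columns in the table above come in numbered sets (for example nox_control_proposed_strategy_1 through _3) that share a single description. For analysis it can help to stack such a set into one long column. The sketch below does this with pandas.DataFrame.melt; the helper name stack_numbered_columns and the strategy values "A", "B", "C" are hypothetical placeholders, and which identifier columns you pass in depends on how you use the table.

```python
import pandas as pd


def stack_numbered_columns(df: pd.DataFrame, prefix: str, id_cols: list[str]) -> pd.DataFrame:
    """Stack <prefix>_1, <prefix>_2, ... into a long table with one value per row.

    Hypothetical helper: the rank column keeps the numeric suffix so the
    reported order (first, second, third strategy) is not lost.
    """
    value_cols = [
        c for c in df.columns
        if c.startswith(prefix + "_") and c[len(prefix) + 1:].isdigit()
    ]
    long = df.melt(id_vars=id_cols, value_vars=value_cols, var_name="rank", value_name=prefix)
    # "nox_control_proposed_strategy_2" -> 2
    long["rank"] = long["rank"].str.rsplit("_", n=1).str[-1].astype(int)
    # Empty slots carry no information, so drop them.
    long = long.dropna(subset=[prefix])
    return long.sort_values(id_cols + ["rank"]).reset_index(drop=True)


# Placeholder data: "A", "B", "C" stand in for real strategy codes.
boilers = pd.DataFrame({
    "plant_id_eia": [1, 2],
    "report_date": pd.to_datetime(["2022-01-01", "2022-01-01"]),
    "nox_control_proposed_strategy_1": ["A", "B"],
    "nox_control_proposed_strategy_2": ["C", None],
    "nox_control_proposed_strategy_3": [None, None],
})
stacked = stack_numbered_columns(
    boilers, "nox_control_proposed_strategy", ["plant_id_eia", "report_date"]
)
print(stacked)
```

The same helper should work for any of the numbered strategy sets, since it only relies on the shared prefix and a trailing number.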
out_eia__yearly_generators
Annual time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
report_date, plant_id_eia, generator_id
Usage Warnings
Contains estimated values.
Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.
Not all columns are originally reported in or calculable from the input tables. Expect nulls.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details
This table includes all yearly attributes for all generators reported to EIA-860 and EIA-923.
To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed, and calculated columns. We suggest using this table if you want to explore yearly generator attributes in a more complete, denormalized table. If you are more interested in the originally reported values, we recommend looking for the core_eia tables that contain the column you are most interested in.
The calculations of capacity factor, heat rate, fuel cost per MMBTU, and fuel cost per MWh are based on allocating net generation, which EIA reports by plant, prime mover, and energy source, to individual generators. This allocation comes from the out_eia923__yearly_generation_fuel_by_generator table.
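As a rough illustration of what allocating a plant, prime mover, and energy source total to individual generators means, here is a capacity-weighted split. This is a simplified stand-in and not PUDL's actual allocation method, which is more involved; the function name and example numbers are hypothetical.

```python
def allocate_by_capacity(net_generation_mwh: float, capacity_mw: dict[str, float]) -> dict[str, float]:
    """Split one plant/prime mover/energy source net generation total across generators.

    Simplified illustration only: each generator receives a share proportional
    to its nameplate capacity. This is not PUDL's allocation method.
    """
    total_capacity = sum(capacity_mw.values())
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive to allocate by capacity")
    return {
        generator_id: net_generation_mwh * cap / total_capacity
        for generator_id, cap in capacity_mw.items()
    }


# Hypothetical example: two generators sharing 1,000 MWh of reported net generation.
shares = allocate_by_capacity(1_000.0, {"1": 300.0, "2": 100.0})
print(shares)  # {'1': 750.0, '2': 250.0}
```

Whatever weighting is used, the allocated shares should add back up to the reported total, which is a useful sanity check on any allocation.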
Yearly generator heat rates are estimated using the allocated per-generator net generation and fuel consumption, along with the core_eia923__monthly_boiler_fuel table, which reports the fuel consumed by each boiler. Heat rates are needed to estimate the amount of fuel consumed by a generation unit, and therefore its fuel cost per MWh generated.
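The relationship between the heat rate, fuel price, and fuel cost per MWh columns can be written out as plain arithmetic. The sketch below follows the column units described in this table (heat rate in MMBTU per MWh, fuel price in USD per MMBTU); it is an illustration of that reading, not the code PUDL runs, and the function names are hypothetical.

```python
from typing import Optional


def heat_rate_mmbtu_per_mwh(total_mmbtu: float, net_generation_mwh: float) -> Optional[float]:
    """Heat content of fuel burned per MWh of net generation.

    Returns None when net generation is zero or negative, since a heat rate is
    not meaningful there (an assumption of this sketch).
    """
    if net_generation_mwh <= 0:
        return None
    return total_mmbtu / net_generation_mwh


def fuel_cost_per_mwh(heat_rate: Optional[float], fuel_cost_per_mmbtu: Optional[float]) -> Optional[float]:
    """MMBTU per MWh times USD per MMBTU gives USD per MWh."""
    if heat_rate is None or fuel_cost_per_mmbtu is None:
        # A heat rate without a fuel price cannot produce a cost estimate.
        return None
    return heat_rate * fuel_cost_per_mmbtu


hr = heat_rate_mmbtu_per_mwh(total_mmbtu=10_000_000, net_generation_mwh=1_000_000)
print(hr)  # 10.0
print(fuel_cost_per_mwh(hr, 2.5))  # 25.0
```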
Plant-specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which has only about 70% coverage, so some generators with heat rate estimates still lack fuel cost estimates.
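To see which generator-years are affected by this coverage gap, you can filter for rows that have a heat rate but no fuel cost. A minimal pandas sketch, using the unit_heat_rate_mmbtu_per_mwh and fuel_cost_per_mwh columns of this table; the helper name and the sample rows are made up for illustration.

```python
import pandas as pd


def heat_rate_without_fuel_cost(gens: pd.DataFrame) -> pd.DataFrame:
    """Return generator-years that have a heat rate estimate but no fuel cost estimate."""
    mask = gens["unit_heat_rate_mmbtu_per_mwh"].notna() & gens["fuel_cost_per_mwh"].isna()
    return gens.loc[mask, ["report_date", "plant_id_eia", "generator_id"]]


# Small made-up sample standing in for out_eia__yearly_generators.
gens = pd.DataFrame({
    "report_date": pd.to_datetime(["2022-01-01"] * 3),
    "plant_id_eia": [1, 1, 2],
    "generator_id": ["1", "2", "A"],
    "unit_heat_rate_mmbtu_per_mwh": [10.5, 9.8, None],
    "fuel_cost_per_mwh": [30.0, None, None],
})
gaps = heat_rate_without_fuel_cost(gens)
print(gaps)
```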
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
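The default harvesting rule described above can be sketched in a few lines. This is a simplified reading of the text and not PUDL's harvesting code: it treats the 70% threshold as a share of the non-null reported instances of one attribute for one entity, and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Optional


def choose_canonical(values: Iterable[Any], threshold: float = 0.7) -> Optional[Any]:
    """Pick the most frequently reported value if it makes up at least `threshold`
    of the reported (non-null) instances; otherwise return None (a null)."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


print(choose_canonical(["ST", "ST", "ST", "GT"]))  # ST (75% agreement)
print(choose_canonical(["ST", "GT"]))              # None (only 50% agreement)
```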
Access methods
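import pandas as pd
# Read the table into a pandas DataFrame that uses PyArrow-backed data types.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
    dtype_backend="pyarrow",
)
import polars as pl
# Read the table with polars; aws_skip_signature sends unsigned (anonymous) requests to the public bucket.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Query the remote Parquet file directly (DuckDB syntax).
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet';
# Read the table in R with the arrow package.
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet"
)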
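Once the table is loaded with the pandas snippet above, a common first step is to narrow it to one plant and year. The sketch below does that and checks that the documented primary key (report_date, plant_id_eia, generator_id) is unique in the result. The helper name is hypothetical, it assumes report_date can be converted with pandas.to_datetime, and the usage lines use a tiny made-up frame rather than the real file.

```python
import pandas as pd

PRIMARY_KEY = ["report_date", "plant_id_eia", "generator_id"]


def generators_for_plant_year(df: pd.DataFrame, plant_id_eia: int, year: int) -> pd.DataFrame:
    """Return one plant's generator records for a single report year."""
    years = pd.to_datetime(df["report_date"]).dt.year
    subset = df.loc[(df["plant_id_eia"] == plant_id_eia) & (years == year)]
    if subset.duplicated(subset=PRIMARY_KEY).any():
        raise ValueError("primary key is not unique in the selected rows")
    # generator_id is a string, so this sorts lexically ("10" before "2").
    return subset.sort_values("generator_id")


sample = pd.DataFrame({
    "report_date": pd.to_datetime(["2021-01-01", "2022-01-01", "2022-01-01", "2022-01-01"]),
    "plant_id_eia": [3, 3, 3, 7],
    "generator_id": ["2", "2", "1", "1"],
    "capacity_mw": [50.0, 50.0, 75.0, 10.0],
})
print(generators_for_plant_year(sample, plant_id_eia=3, year=2022))
```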
Columns
Field Name |
Type |
Description |
|---|---|---|
associated_combined_heat_power |
boolean |
Indicates whether the generator is associated with a combined heat and power system |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
bga_source |
string |
The source from which the unit_id_pudl was compiled: directly from EIA-860; from string association, which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators; or from a unit connection, where the unit_id_eia is used to find additional boiler-generator connections. |
bypass_heat_recovery |
boolean |
Can this generator operate while bypassing the heat recovery steam generator? |
can_burn_multiple_fuels |
boolean |
Whether the generator can burn multiple fuels. |
can_cofire_fuels |
boolean |
Whether the generator can co-fire fuels. |
can_switch_oil_gas |
boolean |
Whether the generator can switch between oil and natural gas. |
can_switch_when_operating |
boolean |
Indicates whether a fuel switching generator can switch fuels while operating. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of that time period), in consistent units (e.g. MWh / (MW * hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
carbon_capture |
boolean |
Indicates whether the generator uses carbon capture technology. |
city |
string |
Name of the city. |
county |
string |
County name. |
current_planned_generator_operating_date |
date |
The most recently updated effective date on which the generator is scheduled to start operation |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
deliver_power_transgrid |
boolean |
Indicates whether the generator can deliver power to the transmission grid. |
distributed_generation |
boolean |
Whether the generator is considered distributed generation |
duct_burners |
boolean |
Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas |
energy_source_1_transport_1 |
string |
Primary mode of transport for energy source 1. |
energy_source_1_transport_2 |
string |
Secondary mode of transport for energy source 1. |
energy_source_1_transport_3 |
string |
Tertiary mode of transport for energy source 1. |
energy_source_2_transport_1 |
string |
Primary mode of transport for energy source 2. |
energy_source_2_transport_2 |
string |
Secondary mode of transport for energy source 2. |
energy_source_2_transport_3 |
string |
Tertiary mode of transport for energy source 2. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
energy_source_code_2 |
string |
The code representing the second most predominant type of energy that fuels the generator |
energy_source_code_3 |
string |
The code representing the third most predominant type of energy that fuels the generator |
energy_source_code_4 |
string |
The code representing the fourth most predominant type of energy that fuels the generator |
energy_source_code_5 |
string |
The code representing the fifth most predominant type of energy that fuels the generator |
energy_source_code_6 |
string |
The code representing the sixth most predominant type of energy that fuels the generator |
energy_storage_capacity_mwh |
number |
Energy storage capacity in MWh (e.g. for batteries). |
ferc_qualifying_facility |
boolean |
Indicates whether or not a generator is a qualifying FERC cogeneration facility. |
fluidized_bed_tech |
boolean |
Indicates whether the generator uses fluidized bed technology |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mmbtu_source |
string |
Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or comes from aggregated records that combine multiple sources (mixed). |
fuel_cost_per_mwh |
number |
Derived from MCOE, a unit-level value. Average fuel cost per MWh of net electricity generated, in nominal USD. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
fuel_type_count |
integer |
A count of how many different simple energy sources there are associated with a generator. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_date |
date |
Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date. |
generator_retirement_date |
date |
Date of the scheduled or effected retirement of the generator. |
latitude |
number |
Latitude of the plant’s location, in degrees. |
longitude |
number |
Longitude of the plant’s location, in degrees. |
minimum_load_mw |
number |
The minimum load at which the generator can operate continuously. |
nameplate_power_factor |
number |
The nameplate power factor of the generator. |
net_capacity_mwdc |
number |
Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
operational_status |
string |
The operating status of the asset. For generators this is based on which tab the generator was listed in in EIA-860. |
operational_status_code |
string |
The operating status of the asset. |
original_planned_generator_operating_date |
date |
The date the generator was originally scheduled to be operational |
other_combustion_tech |
boolean |
Indicates whether the generator uses other combustion technologies |
other_modifications_date |
date |
Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete. |
other_planned_modifications |
boolean |
Indicates whether there are other modifications planned for the generator. |
owned_by_non_utility |
boolean |
Whether any part of the generator is owned by a non-utility. |
ownership_code |
string |
Identifies the ownership for each generator. |
planned_derate_date |
date |
Planned effective month that the generator is scheduled to enter operation after the derate modification. |
planned_energy_source_code_1 |
string |
New energy source code for the planned repowered generator. |
planned_generator_retirement_date |
date |
Planned effective date of the scheduled retirement of the generator. |
planned_modifications |
boolean |
Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years. |
planned_net_summer_capacity_derate_mw |
number |
Decrease in summer capacity expected to be realized from the derate modification to the equipment. |
planned_net_summer_capacity_uprate_mw |
number |
Increase in summer capacity expected to be realized from the uprate modification to the equipment. |
planned_net_winter_capacity_derate_mw |
number |
Decrease in winter capacity expected to be realized from the derate modification to the equipment. |
planned_net_winter_capacity_uprate_mw |
number |
Increase in winter capacity expected to be realized from the uprate modification to the equipment. |
planned_new_capacity_mw |
number |
The expected new nameplate capacity for the generator. |
planned_new_prime_mover_code |
string |
New prime mover for the planned repowered generator. |
planned_repower_date |
date |
Planned effective date that the generator is scheduled to enter operation after the repowering is complete. |
planned_uprate_date |
date |
Planned effective date that the generator is scheduled to enter operation after the uprate modification. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
previously_canceled |
boolean |
Indicates whether the generator was previously reported as indefinitely postponed or canceled |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
pulverized_coal_tech |
boolean |
Indicates whether the generator uses pulverized coal technology |
reactive_power_output_mvar |
number |
Reactive Power Output (MVAr) |
report_date |
date |
Date reported. |
rto_iso_lmp_node_id |
string |
The designation used to identify the price node in RTO/ISO Locational Marginal Price reports |
rto_iso_location_wholesale_reporting_id |
string |
The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report |
solid_fuel_gasification |
boolean |
Indicates whether the generator is part of a solid fuel gasification system |
startup_source_code_1 |
string |
The code representing the first start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_2 |
string |
The code representing the second start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_3 |
string |
The code representing the third start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_4 |
string |
The code representing the fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
state |
string |
Two letter US state abbreviation. |
stoker_tech |
boolean |
Indicates whether the generator uses stoker technology |
street_address |
string |
Physical street address. |
subcritical_tech |
boolean |
Indicates whether the generator uses subcritical technology |
summer_capacity_estimate |
boolean |
Whether the summer capacity value was an estimate |
summer_capacity_mw |
number |
The net summer capacity. |
summer_estimated_capability_mw |
number |
EIA-estimated summer capability, in megawatts (MW). |
supercritical_tech |
boolean |
Indicates whether the generator uses supercritical technology |
synchronized_transmission_grid |
boolean |
Indicates whether standby generators (SB status) can be synchronized to the grid. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
time_cold_shutdown_full_load_code |
string |
The minimum amount of time required to bring the unit to full load from shutdown. |
timezone |
string |
IANA timezone name |
topping_bottoming_code |
string |
If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle |
total_fuel_cost |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_mmbtu |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
turbines_inverters_hydrokinetics |
integer |
Number of wind turbines, inverters, or hydrokinetic buoys. |
turbines_num |
integer |
Number of wind turbines, or hydrokinetic buoys. |
ultrasupercritical_tech |
boolean |
Indicates whether the generator uses ultra-supercritical technology |
unit_heat_rate_mmbtu_per_mwh |
number |
Heat content of the fuel consumed per unit of net electricity generated, in MMBTU per MWh. Derived from the MCOE calculation. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
uprate_derate_completed_date |
date |
The date when the uprate or derate was completed. |
uprate_derate_during_year |
boolean |
Was an uprate or derate completed on this generator during the reporting year? |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
winter_capacity_estimate |
boolean |
Whether the winter capacity value was an estimate |
winter_capacity_mw |
number |
The net winter capacity. |
winter_estimated_capability_mw |
number |
EIA-estimated winter capability, in megawatts (MW). |
zip_code |
string |
Five-digit US ZIP code. |
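The capacity_factor column is described above as energy generated divided by nameplate capacity times the length of the period. A minimal sketch of that formula for a yearly record follows, assuming MWh for energy, MW for capacity, and a period length counted in hours; the helper names are hypothetical, and PUDL's own handling of partial years or missing data is not reproduced here.

```python
import calendar


def hours_in_year(year: int) -> int:
    """8784 hours in a leap year, 8760 otherwise."""
    return 8784 if calendar.isleap(year) else 8760


def capacity_factor(net_generation_mwh: float, capacity_mw: float, hours: float) -> float:
    """Energy generated / (nameplate capacity * hours in the period)."""
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity and period length must be positive")
    return net_generation_mwh / (capacity_mw * hours)


# A hypothetical 100 MW generator producing 438,000 MWh in 2023 (not a leap year).
cf = capacity_factor(438_000, 100, hours_in_year(2023))
print(cf)  # 0.5
```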
out_eia__yearly_generators_by_ownership
Annual time series of all EIA generators with ownership integrated.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
This table has no primary key. It would have a primary key of the following columns, except that about 200 records lack utility IDs: plant_id_eia, generator_id, report_date, utility_id_eia, ownership_record_type.
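If you want to rely on those columns as a key, one approach is to set the rows without utility IDs aside and confirm that the remaining rows are unique. A minimal pandas sketch under that assumption; the helper name and sample rows are hypothetical.

```python
import pandas as pd

WOULD_BE_KEY = ["plant_id_eia", "generator_id", "report_date", "utility_id_eia", "ownership_record_type"]


def check_would_be_key(df: pd.DataFrame) -> tuple[int, int]:
    """Return (rows missing utility_id_eia, duplicated key rows among the rest)."""
    missing_utility = df["utility_id_eia"].isna()
    keyed = df.loc[~missing_utility]
    # keep=False flags every member of a duplicated group, not just the repeats.
    duplicated = keyed.duplicated(subset=WOULD_BE_KEY, keep=False)
    return int(missing_utility.sum()), int(duplicated.sum())


sample = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["1", "1", "1"],
    "report_date": pd.to_datetime(["2022-01-01"] * 3),
    "utility_id_eia": [10, 10, None],
    "ownership_record_type": ["total", "owned", "total"],
})
print(check_would_be_key(sample))  # (1, 0)
```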
Usage Warnings
Some columns contain subtotals; use caution when choosing columns to aggregate.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
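Because each owner appears once with ownership_record_type "owned" and once as "total", summing a column such as capacity_mw over every row counts capacity more than once. A minimal sketch, assuming you want capacity attributed to each owner, is to keep only the "owned" records before aggregating; the helper name and sample values are hypothetical.

```python
import pandas as pd


def owned_capacity_by_utility(df: pd.DataFrame) -> pd.Series:
    """Sum each utility's proportional (owned) capacity, ignoring the "total" rows."""
    owned = df[df["ownership_record_type"] == "owned"]
    return owned.groupby("utility_id_eia")["capacity_mw"].sum()


# One hypothetical 200 MW generator split 75/25 between two utilities.
sample = pd.DataFrame({
    "utility_id_eia": [10, 10, 20, 20],
    "ownership_record_type": ["owned", "total", "owned", "total"],
    "capacity_mw": [150.0, 200.0, 50.0, 200.0],
})
by_utility = owned_capacity_by_utility(sample)
print(by_utility.sum())  # 200.0, whereas summing every row would give 600.0
```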
Additional Details
Generators that report to EIA can have multiple utility owners. EIA reports the ownership fraction for each owner of a generator (see core_eia860__scd_ownership). This table includes two records for each generator’s owner: one represents the total ownership stake and the other represents the proportional ownership stake, labeled “total” and “owned”, respectively, in the ownership_record_type column. All of the values reported for these generator-owner records are scaled to the ownership stake, which is represented by fraction_owned.
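The two-records-per-owner layout can be illustrated by expanding a single generator by hand. This sketch assumes that only additive columns (here capacity_mw and net_generation_mwh) are scaled and that "total" records carry fraction_owned = 1.0; both are assumptions for illustration, not a description of PUDL's code, and the function name is hypothetical.

```python
def ownership_records(generator: dict, fractions: dict[int, float]) -> list[dict]:
    """Expand one generator into an "owned" and a "total" record per owner.

    Assumptions of this sketch: only capacity_mw and net_generation_mwh are
    scaled, and "total" records carry fraction_owned = 1.0.
    """
    scaled_cols = ["capacity_mw", "net_generation_mwh"]
    records = []
    for utility_id, fraction in fractions.items():
        for record_type, share in (("owned", fraction), ("total", 1.0)):
            # dict(mapping, **kwargs) copies the generator so it is not mutated.
            record = dict(generator, utility_id_eia=utility_id,
                          ownership_record_type=record_type, fraction_owned=share)
            for col in scaled_cols:
                record[col] = generator[col] * share
            records.append(record)
    return records


gen = {"plant_id_eia": 1, "generator_id": "1", "capacity_mw": 200.0, "net_generation_mwh": 800.0}
for r in ownership_records(gen, {10: 0.75, 20: 0.25}):
    print(r["utility_id_eia"], r["ownership_record_type"], r["capacity_mw"], r["net_generation_mwh"])
```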
This table was created to represent every possible ownership slice of the EIA generators. It is primarily used as an input to out_eia__yearly_plant_parts.
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
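import pandas as pd
# Read the table into a pandas DataFrame that uses PyArrow-backed data types.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
    dtype_backend="pyarrow",
)
import polars as pl
# Read the table with polars; aws_skip_signature sends unsigned (anonymous) requests to the public bucket.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Query the remote Parquet file directly (DuckDB syntax).
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet';
# Read the table in R with the arrow package.
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet"
)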
Columns
Field Name |
Type |
Description |
|---|---|---|
capacity_eoy_mw |
number |
Total end of year installed (nameplate) capacity for a plant part, in megawatts. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of that time period), in consistent units (e.g. MWh / (MW * hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
ferc_acct_name |
string |
Name of FERC account, derived from technology description and prime mover code. |
fraction_owned |
number |
Proportion of generator ownership attributable to this utility. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mmbtu_source |
string |
Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or comes from aggregated records that combine multiple sources (mixed). |
fuel_cost_per_mwh |
number |
Derived from MCOE, a unit-level value. Average fuel cost per MWh of net electricity generated, in nominal USD. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_date |
date |
Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date. |
generator_operating_year |
integer |
Year a generator went into service. |
generator_retirement_date |
date |
Date of the scheduled or effected retirement of the generator. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
operational_status |
string |
The operating status of the asset. For generators this is based on which tab the generator was listed in in EIA-860. |
operational_status_pudl |
string |
The operating status of the asset using PUDL categories. |
ownership_record_type |
string |
Whether each generator record is for one owner or represents a total of all ownerships. |
planned_generator_retirement_date |
date |
Planned effective date of the scheduled retirement of the generator. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
total_fuel_cost |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_mmbtu |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
unit_heat_rate_mmbtu_per_mwh |
number |
Heat content of the fuel consumed per unit of net electricity generated, in MMBTU per MWh. Derived from the MCOE calculation. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
out_eia__yearly_plant_parts
Annual time series of all EIA plant parts, for use in matching to FERC 1.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
record_id_eia
Usage Warnings
Some columns contain subtotals; use caution when choosing columns to aggregate.
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details
Practically speaking, a plant is a collection of generator(s). Generators have many attributes (e.g. prime mover, primary fuel source, technology type). We can use these generator attributes to group generator records into larger aggregate records, which we call “plant parts”. A plant part is a record that corresponds to a particular collection of generators that all share an identical attribute and utility owner, e.g. all of the generators with unit_id=2, or all of the generators with coal as their primary fuel source.
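The grouping idea behind plant parts can be sketched as a pandas groupby over the shared attribute plus the plant and owner. This is a simplified illustration, not PUDL's plant parts code: it skips ownership scaling and the other steps that build this table, and it only sums additive columns (ratios such as heat rate would need weighted averages instead). The helper name and sample rows are hypothetical.

```python
import pandas as pd


def aggregate_plant_part(gens: pd.DataFrame, part_cols: list[str]) -> pd.DataFrame:
    """Roll generator records up to one record per plant part and owner.

    Hypothetical helper: part_cols names the shared attribute(s) that define
    the part, e.g. ["fuel_type_code_pudl"] or ["unit_id_pudl"].
    """
    keys = ["report_date", "plant_id_eia", "utility_id_eia", *part_cols]
    # dropna=False keeps generators whose grouping attribute is null.
    return gens.groupby(keys, as_index=False, dropna=False)[["capacity_mw", "net_generation_mwh"]].sum()


gens = pd.DataFrame({
    "report_date": pd.to_datetime(["2022-01-01"] * 3),
    "plant_id_eia": [1, 1, 1],
    "utility_id_eia": [10, 10, 10],
    "generator_id": ["1", "2", "3"],
    "fuel_type_code_pudl": ["coal", "coal", "gas"],
    "capacity_mw": [100.0, 150.0, 50.0],
    "net_generation_mwh": [400.0, 500.0, 100.0],
})
print(aggregate_plant_part(gens, ["fuel_type_code_pudl"]))
```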
The EIA data about power plants (from EIA 923 and 860) is reported in tables with records that mostly correspond to generators and plants. Other datasets (cough cough FERC1) are less well organized and include plants, generators and other plant parts all in the same table without any clear labels. This plant part table is an attempt to create records corresponding to many different plant parts in order to connect specific slices of EIA plants to other datasets.
Because generators are often owned by multiple utilities, another dimension of this plant part table involves generating two records for each owner: one for the portion of the plant part they own and one for the plant part as a whole. The portion records are labeled owned in the ownership_record_type column, and the total records are labeled total.
This table includes A LOT of duplicative information about EIA plants. It is primarily meant for use as an input into the record linkage between FERC1 plants and EIA.
This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods
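import pandas as pd
# Read the table into a pandas DataFrame that uses PyArrow-backed data types.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
    dtype_backend="pyarrow",
)
import polars as pl
# Read the table with polars; aws_skip_signature sends unsigned (anonymous) requests to the public bucket.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Query the remote Parquet file directly (DuckDB syntax).
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet';
# Read the table in R with the arrow package.
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet"
)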
Columns#
Field Name |
Type |
Description |
|---|---|---|
appro_part_label |
string |
Plant part of the associated true granularity record. |
appro_record_id_eia |
string |
EIA record ID of the associated true granularity record. |
capacity_eoy_mw |
number |
Total end of year installed (nameplate) capacity for a plant part, in megawatts. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period in hours).
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
ferc1_generator_agg_id |
integer |
ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. The ID is manually assigned and has not been updated since 2020, but only affects a couple hundred records total across all years. |
ferc_acct_name |
string |
Name of FERC account, derived from technology description and prime mover code. |
fraction_owned |
number |
Proportion of generator ownership attributable to this utility. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mwh |
number |
Derived from MCOE, a unit-level value. Average fuel cost per MWh of electricity generated, in nominal USD.
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL.
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_year |
integer |
Year a generator went into service. |
generator_retirement_date |
date |
Date of the scheduled or effected retirement of the generator. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
operational_status |
string |
The operating status of the asset. For generators, this is based on the EIA-860 tab in which the generator was listed.
operational_status_pudl |
string |
The operating status of the asset using PUDL categories. |
ownership_dupe |
boolean |
Whether a plant part record has a duplicate record with different ownership status. |
ownership_record_type |
string |
Whether each generator record is for one owner or represents a total of all ownerships. |
planned_generator_retirement_date |
date |
Planned effective date of the scheduled retirement of the generator. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_id_report_year |
string |
PUDL plant ID and report year of the record. |
plant_name_eia |
string |
Plant name. |
plant_name_ppe |
string |
Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part. |
plant_part |
string |
The part of the plant a record corresponds to. |
plant_part_id_eia |
string |
Contains EIA plant ID, plant part, ownership, and EIA utility ID.
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG).
record_count |
integer |
Number of distinct generator IDs that participated in the aggregation for a plant part list record. |
record_id_eia |
string |
Identifier for EIA plant parts analysis records. |
report_date |
date |
Date reported. |
report_year |
integer |
Four-digit year in which the data was reported. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
total_fuel_cost |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_mmbtu |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
true_gran |
boolean |
Indicates whether a plant part list record is associated with the highest priority plant part for all identical records. |
unit_heat_rate_mmbtu_per_mwh |
number |
Heat content of fuel consumed per unit of electricity generated. Derived from the MCOE calculation.
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
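The capacity_factor definition above can be sketched for an annual record as follows. This assumes the time period is one report year (8,760 hours, or 8,784 in a leap year); the function names are hypothetical, and PUDL's own calculation may differ in details such as null handling.

```python
import calendar


def hours_in_year(year):
    """Number of hours in a calendar year, accounting for leap years."""
    return 8784 if calendar.isleap(year) else 8760


def capacity_factor(net_generation_mwh, capacity_mw, report_year):
    """Energy generated / (nameplate capacity * hours in the period)."""
    if net_generation_mwh is None or not capacity_mw:
        return None  # undefined without generation data or with zero/missing capacity
    return net_generation_mwh / (capacity_mw * hours_in_year(report_year))


capacity_factor(438_000.0, 100.0, 2023)  # 0.5
```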
out_eia__yearly_plants#
Noticed any irregularities with this table? File a report.
Annual time series of all EIA plant attributes.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
plant_id_eia, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This table inherits canonicalized values for plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ash_impoundment |
boolean |
Is there an ash impoundment (e.g. pond, reservoir) at the plant? |
ash_impoundment_lined |
boolean |
If there is an ash impoundment at the plant, is the impoundment lined? |
ash_impoundment_status |
string |
If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year. |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_code_eia_consistent_rate |
number |
Percentage consistency of balancing authority code across entity records. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
city |
string |
Name of the city. |
county |
string |
County name. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
datum |
string |
Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84). |
energy_storage |
boolean |
Indicates if the facility has energy storage capabilities. |
ferc_cogen_docket_no |
string |
The docket number relating to the FERC cogenerator status. See FERC Form 556. |
ferc_cogen_status |
boolean |
Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556. |
ferc_exempt_wholesale_generator |
boolean |
Indicates whether the plant has FERC qualifying facility exempt wholesale generator status.
ferc_exempt_wholesale_generator_docket_no |
string |
The docket number relating to the FERC qualifying facility exempt wholesale generator status. |
ferc_qualifying_facility_docket_no |
string |
The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556. |
ferc_small_power_producer |
boolean |
Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556. |
ferc_small_power_producer_docket_no |
string |
The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556. |
grid_voltage_1_kv |
number |
Plant’s grid voltage, in kilovolts, at the point of interconnection to transmission or distribution facilities.
grid_voltage_2_kv |
number |
Plant’s grid voltage, in kilovolts, at the point of interconnection to transmission or distribution facilities.
grid_voltage_3_kv |
number |
Plant’s grid voltage, in kilovolts, at the point of interconnection to transmission or distribution facilities.
has_net_metering |
boolean |
Whether the plant has a net metering agreement in effect during the reporting year. Only displayed for facilities that report the sun or wind as an energy source. This field was only reported through 2015.
iso_rto_code |
string |
The code of the plant’s ISO or RTO. NA if not reported in that year. |
latitude |
number |
Latitude of the plant’s location, in degrees. |
liquefied_natural_gas_storage |
boolean |
Indicates if the facility has the capability to store natural gas in the form of liquefied natural gas.
longitude |
number |
Longitude of the plant’s location, in degrees. |
natural_gas_local_distribution_company |
string |
Name(s) of the local distribution company (LDC) connected to natural gas-burning power plants.
natural_gas_pipeline_name_1 |
string |
The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
natural_gas_pipeline_name_2 |
string |
The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
natural_gas_pipeline_name_3 |
string |
The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
natural_gas_storage |
boolean |
Indicates if the facility has on-site storage of natural gas.
nerc_region |
string |
NERC region in which the plant is located.
pipeline_notes |
string |
Additional owner or operator of natural gas pipeline. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
primary_purpose_id_naics |
integer |
North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant.
regulatory_status_code |
string |
Indicates whether the plant is regulated or non-regulated. |
report_date |
date |
Date reported. |
reporting_frequency_code |
string |
Code that specifies the time period for which data must be reported (e.g. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details.
sector_id_eia |
integer |
EIA-assigned sector ID, corresponding to the high-level NAICS sector, designated by the primary purpose, regulatory status, and plant-level combined heat and power status.
sector_name_eia |
string |
EIA-assigned sector name, corresponding to the high-level NAICS sector, designated by the primary purpose, regulatory status, and plant-level combined heat and power status.
service_area |
string |
Service area in which the plant is located; for unregulated companies, the electric utility with which the plant is interconnected.
state |
string |
Two letter US state abbreviation. |
street_address |
string |
Physical street address. |
timezone |
string |
IANA timezone name.
transmission_distribution_owner_id |
integer |
EIA-assigned code for owner of transmission/distribution system to which the plant is interconnected. |
transmission_distribution_owner_name |
string |
Name of the owner of the transmission or distribution system to which the plant is interconnected. |
transmission_distribution_owner_state |
string |
State location for owner of transmission/distribution system to which the plant is interconnected. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
water_source |
string |
Name of water source associated with the plant. |
zip_code |
string |
Five-digit US ZIP code.
out_eia__yearly_utilities#
Noticed any irregularities with this table? File a report.
Annual time series of all EIA utility attributes.
- Most-recent data:
2026
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
utility_id_eia, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This table inherits canonicalized values for utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
address_2 |
string |
Second line of the address. |
attention_line |
string |
Mail attention name of the operator/owner. |
city |
string |
Name of the city. |
contact_firstname |
string |
First name of utility contact 1. |
contact_firstname_2 |
string |
First name of utility contact 2. |
contact_lastname |
string |
Last name of utility contact 1. |
contact_lastname_2 |
string |
Last name of utility contact 2. |
contact_title |
string |
Title of utility contact 1.
contact_title_2 |
string |
Title of utility contact 2. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.
entity_type |
string |
Entity type of principal owner. |
phone_extension |
string |
Phone extension for utility contact 1.
phone_extension_2 |
string |
Phone extension for utility contact 2.
phone_number |
string |
Phone number for utility contact 1. |
phone_number_2 |
string |
Phone number for utility contact 2. |
plants_reported_asset_manager |
boolean |
Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form? |
plants_reported_operator |
boolean |
Is the reporting entity an operator of power plants reported on Schedule 2 of the form? |
plants_reported_other_relationship |
boolean |
Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form? |
plants_reported_owner |
boolean |
Is the reporting entity an owner of power plants reported on Schedule 2 of the form? |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
street_address |
string |
Physical street address. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
zip_code |
string |
Five-digit US ZIP code.
zip_code_4 |
string |
Four-digit US ZIP code suffix.
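Because zip_code and zip_code_4 are stored as strings, leading zeros survive (many Northeastern ZIP codes begin with 0). If the columns are ever parsed as integers, for instance after a round trip through a spreadsheet, a sketch like the hypothetical `format_zip` helper below can restore the zeros and build a ZIP+4 string.

```python
def format_zip(zip_code, zip_code_4=None):
    """Build a ZIP or ZIP+4 string, restoring leading zeros lost to numeric parsing."""
    if zip_code is None:
        return None
    base = str(zip_code).zfill(5)  # e.g. 2139 -> "02139"
    if zip_code_4:
        return f"{base}-{str(zip_code_4).zfill(4)}"
    return base


format_zip("02139", "4307")  # "02139-4307"
```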
out_ferc1__yearly_all_plants#
Noticed any irregularities with this table? File a report.
Annual time series of plants reported in the steam, small generators, hydro, and pumped storage tables.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 402, 406, 408, and 410)
- Primary key:
record_id
Usage Warnings#
Not all columns are originally reported in all of the input plant tables. Expect nulls.
FERC does not require respondents to report unique, non-duplicative plant records. Some respondents sporadically report portions of a plant and then the total plant (e.g. unit 1, unit 2, and the total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
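One heuristic for spotting the unit/unit/total pattern described above is to flag any record whose capacity equals the sum of two or more other records reported by the same utility in the same year. This is a hypothetical sketch, not a PUDL feature: it brute-forces subsets, so it only suits the small groups of rows one utility reports per year, and a match is a prompt for manual review rather than proof of duplication. The record IDs are placeholders.

```python
from itertools import combinations


def flag_possible_totals(records, tol=0.01):
    """Return record_ids whose capacity_mw matches the sum of 2+ other records.

    Intended for the rows of a single utility in a single report_year.
    Records with a null capacity_mw are ignored.
    """
    rows = [r for r in records if r.get("capacity_mw") is not None]
    flagged = []
    for rec in rows:
        others = [r["capacity_mw"] for r in rows if r is not rec]
        # Try every subset of two or more other records (exponential, so keep groups small).
        for size in range(2, len(others) + 1):
            if any(abs(sum(combo) - rec["capacity_mw"]) <= tol
                   for combo in combinations(others, size)):
                flagged.append(rec["record_id"])
                break
    return flagged


# Hypothetical rows: two units, their unlabeled total, and an unrelated plant.
utility_year = [
    {"record_id": "example_unit_1", "capacity_mw": 200.0},
    {"record_id": "example_unit_2", "capacity_mw": 300.0},
    {"record_id": "example_total", "capacity_mw": 500.0},
    {"record_id": "example_other_plant", "capacity_mw": 80.0},
]
print(flag_possible_totals(utility_year))  # ['example_total']
```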
Additional Details#
This table is a concatenation of the following plant tables:
core_ferc1__yearly_steam_plants_sched402
core_ferc1__yearly_hydroelectric_plants_sched406
core_ferc1__yearly_small_plants_sched410
core_ferc1__yearly_pumped_storage_plants_sched408
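A sketch of why stacking these tables produces the nulls mentioned in the usage warnings: each output row gets the union of all columns, and columns a source table does not report are filled with None. The `source_table` column and the `concat_plant_tables` helper are hypothetical illustrations; the published table records plant categories in its own plant_type column.

```python
def concat_plant_tables(tables):
    """Stack rows from several plant tables into one list of records.

    `tables` maps a table name to a list of row dicts (hypothetical input shape).
    """
    # Union of every column reported by any table, plus a provenance column.
    columns = sorted(
        {col for rows in tables.values() for row in rows for col in row} | {"source_table"}
    )
    combined = []
    for table_name, rows in tables.items():
        for row in rows:
            out = {col: row.get(col) for col in columns}  # missing columns become None
            out["source_table"] = table_name
            combined.append(out)
    return combined


combined = concat_plant_tables({
    "core_ferc1__yearly_steam_plants_sched402": [
        {"record_id": "s1", "capacity_mw": 500.0, "opex_fuel": 1_000_000.0},
    ],
    "core_ferc1__yearly_hydroelectric_plants_sched406": [
        {"record_id": "h1", "capacity_mw": 50.0, "opex_hydraulic": 20_000.0},
    ],
})
```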
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period in hours).
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_annual_addition |
number |
Annual capital addition into capex_total. |
capex_annual_addition_rolling |
number |
Year-to-date capital addition into capex_total. |
capex_annual_per_kw |
number |
Annual capital addition into capex_total per kW.
capex_annual_per_mw |
number |
Annual capital addition into capex_total per MW. |
capex_annual_per_mw_rolling |
number |
Year-to-date capital addition into capex_total per MW. |
capex_annual_per_mwh |
number |
Annual capital addition into capex_total per MWh. |
capex_annual_per_mwh_rolling |
number |
Year-to-date capital addition into capex_total per MWh. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_equipment_electric |
number |
Cost of plant: accessory electric equipment (USD). |
capex_equipment_misc |
number |
Cost of plant: miscellaneous power plant equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
capex_wheels_turbines_generators |
number |
Cost of plant: water wheels, turbines, and generators (USD). |
capex_wo_retirement_total |
number |
Total cost of plant (USD) without retirements. |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
energy_used_for_pumping_mwh |
number |
Energy used for pumping, in megawatt-hours. |
ferc_license_id |
string |
The FERC license ID of a project. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mwh |
number |
Derived from MCOE, a unit-level value. Average fuel cost per MWh of electricity generated, in nominal USD.
fuel_type |
string |
Type of fuel. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
license_id_ferc1 |
integer |
FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible. |
net_capacity_adverse_conditions_mw |
number |
Net plant capability under the least favorable operating conditions, in megawatts. |
net_capacity_favorable_conditions_mw |
number |
Net plant capability under the most favorable operating conditions, in megawatts. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_load_mwh |
number |
Net output for load (net generation - energy used for pumping) in megawatt-hours. |
not_water_limited_capacity_mw |
number |
Plant capacity in MW when not limited by condenser water. |
opex_allowances |
number |
Allowances. |
opex_boiler |
number |
Maintenance of boiler (or reactor) plant. |
opex_coolants |
number |
Cost of coolants and water (nuclear plants only).
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_hydraulic |
number |
Production expenses: hydraulic expenses (USD). |
opex_maintenance |
number |
Production expenses: maintenance (USD).
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_misc_power |
number |
Miscellaneous steam (or nuclear) expenses. |
opex_misc_steam |
number |
Maintenance of miscellaneous steam (or nuclear) plant. |
opex_nonfuel_per_mwh |
number |
Non-fuel production expenses per MWh.
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_production_before_pumping |
number |
Total production expenses before pumping (USD). |
opex_production_total |
number |
Total operating expenses. |
opex_pumped_storage |
number |
Production expenses: pumped storage (USD). |
opex_pumping |
number |
Production expenses: pumping (USD).
opex_rents |
number |
Production expenses: rents (USD). |
opex_steam |
number |
Steam expenses. |
opex_steam_other |
number |
Steam from other sources. |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
opex_transfer |
number |
Steam transferred (Credit). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_id_ferc1 |
integer |
Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
water_limited_capacity_mw |
number |
Plant capacity in MW when limited by condenser water. |
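Several columns above are ratios or differences of other columns. The sketch below recomputes three of them from a single row. net_load_mwh follows the definition given in this table (net generation minus energy used for pumping); the formulas for capex_per_mw and opex_per_mwh, in particular which total is used as the numerator, are assumptions, so compare against the published values before relying on them. The helper names are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning None when an input is missing or the denominator is zero."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def derived_metrics(row):
    gen = row.get("net_generation_mwh")
    pumping = row.get("energy_used_for_pumping_mwh")
    return {
        # Defined in this table: net generation minus energy used for pumping.
        "net_load_mwh": None if gen is None or pumping is None else gen - pumping,
        # Assumed: total cost of plant divided by nameplate capacity.
        "capex_per_mw": safe_ratio(row.get("capex_total"), row.get("capacity_mw")),
        # Assumed: total production expenses divided by net generation.
        "opex_per_mwh": safe_ratio(row.get("opex_production_total"), gen),
    }


# Hypothetical pumped storage row: pumping uses more energy than it generates.
metrics = derived_metrics({
    "net_generation_mwh": 1000.0,
    "energy_used_for_pumping_mwh": 1300.0,
    "capex_total": 5e8,
    "capacity_mw": 250.0,
    "opex_production_total": 4e6,
})
```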
out_ferc1__yearly_balance_sheet_assets_sched110#
Noticed any irregularities with this table? File a report.
Annual time series of utility assets and other debits.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
- Primary key:
utility_id_ferc1, report_year, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table. |
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
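Because some rows are subtotals, summing ending_balance across every asset_type double counts. A minimal sketch, assuming subtotal rows are marked with row_type_xbrl == "calculated_value" (check the distinct values in your copy of the data before relying on this); the helper name and the asset_type labels are placeholders.

```python
def sum_reported_balances(rows, field="ending_balance"):
    """Sum a balance field, skipping rows flagged as calculated subtotals.

    Assumes row_type_xbrl == "calculated_value" marks subtotal rows.
    Null balances are skipped.
    """
    total = 0.0
    for row in rows:
        if row.get("row_type_xbrl") == "calculated_value":
            continue  # subtotal: its components are already counted
        value = row.get(field)
        if value is not None:
            total += value
    return total


# Hypothetical rows for one utility and report year.
rows = [
    {"asset_type": "component_a", "row_type_xbrl": "reported_value",
     "starting_balance": 90.0, "ending_balance": 100.0},
    {"asset_type": "component_b", "row_type_xbrl": "reported_value",
     "starting_balance": 15.0, "ending_balance": 20.0},
    {"asset_type": "subtotal_a_b", "row_type_xbrl": "calculated_value",
     "starting_balance": 105.0, "ending_balance": 120.0},
]
```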
out_ferc1__yearly_balance_sheet_liabilities_sched110#
Noticed any irregularities with this table? File a report.
Annual time series of utility liabilities and other credits.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
- Primary key:
utility_id_ferc1, report_year, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
liability_type |
string |
Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
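The record_id format documented above can be unpacked programmatically. The following is a minimal sketch, assuming a record_id follows the {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number} pattern exactly. Because table_name can itself contain underscores, the sketch splits from the right. The helper name parse_record_id and the example ID are hypothetical, and IDs that deviate from the pattern raise a ValueError rather than being guessed at.

```python
# Components that follow table_name in the documented record_id format.
RECORD_ID_FIELDS = ("report_year", "report_prd", "respondent_id", "spplmnt_num", "row_number")


def parse_record_id(record_id: str) -> dict:
    """Split a record_id into its documented components (hypothetical helper).

    table_name may contain underscores, so split from the right and treat
    everything before the last five fields as the table name.
    """
    parts = record_id.rsplit("_", maxsplit=len(RECORD_ID_FIELDS))
    if len(parts) != len(RECORD_ID_FIELDS) + 1:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    table_name, *rest = parts
    parsed = {"table_name": table_name}
    # Values are kept as strings; convert to int yourself if needed.
    parsed.update(dict(zip(RECORD_ID_FIELDS, rest)))
    return parsed


# Invented record_id, built only to match the documented pattern.
print(parse_record_id("f1_comp_balance_db_2019_12_123_0_45"))
```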
out_ferc1__yearly_cash_flows_sched120#
Noticed any irregularities with this table? File a report.
Annual time series of utility cash flow.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)
- Primary key:
utility_id_ferc1, report_year, amount_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
amount |
number |
Reported amount of dollars. This could be a balance or a change in value. |
amount_type |
string |
Label describing the type of amount being reported. This could be a balance or a change in value. |
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
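Because out_ferc1__yearly_cash_flows_sched120 stores one row per amount_type and includes subtotals, a common step is to drop calculated rows before reshaping to one column per amount_type. The sketch below assumes that calculated subtotal rows carry the value "calculated_value" in row_type_xbrl; check the distinct values in your copy of the table before relying on that. The amount_type labels and dollar figures are invented.

```python
import pandas as pd

# Invented rows shaped like out_ferc1__yearly_cash_flows_sched120.
cash_flows = pd.DataFrame({
    "utility_id_ferc1": [1, 1, 1],
    "report_year": [2023, 2023, 2023],
    "amount_type": [  # hypothetical labels
        "net_income_loss",
        "depreciation",
        "net_cash_flow_from_operating_activities",
    ],
    "amount": [100.0, 40.0, 140.0],
    "row_type_xbrl": ["reported_value", "reported_value", "calculated_value"],
})

# Assumption: subtotal rows are flagged as "calculated_value" in row_type_xbrl.
reported = cash_flows[cash_flows["row_type_xbrl"] != "calculated_value"]

# Reshape to one row per utility-year and one column per amount_type.
wide = reported.pivot_table(
    index=["utility_id_ferc1", "report_year"],
    columns="amount_type",
    values="amount",
    aggfunc="sum",
)
print(wide)
```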
out_ferc1__yearly_depreciation_by_function_sched219#
Noticed any irregularities with this table? File a report.
Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)
- Primary key:
utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
depreciation_type |
string |
Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc. |
ending_balance |
number |
Account balance at end of year. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
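In out_ferc1__yearly_depreciation_by_function_sched219, plant_status includes a total category alongside the detailed statuses, so summing every row double-counts. A minimal pandas sketch, assuming the subtotal rows are labelled exactly "total" and simplifying away the depreciation_type and utility_type key columns (which may carry totals of their own); the rows are invented.

```python
import pandas as pd

# Invented rows shaped like out_ferc1__yearly_depreciation_by_function_sched219.
dep = pd.DataFrame({
    "utility_id_ferc1": [7, 7, 7, 7],
    "report_year": [2022, 2022, 2022, 2022],
    "plant_function": ["distribution", "transmission", "distribution", "transmission"],
    "plant_status": ["in_service", "in_service", "total", "total"],
    "ending_balance": [500.0, 300.0, 500.0, 300.0],
})

# Summing everything counts each balance twice (detail row + total row).
naive_total = dep["ending_balance"].sum()

# Assumption: subtotal rows use the plant_status value "total".
detail = dep[dep["plant_status"] != "total"]
by_function = (
    detail.groupby(["utility_id_ferc1", "report_year", "plant_function"])["ending_balance"]
    .sum()
)
print(naive_total)
print(by_function)
```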
out_ferc1__yearly_depreciation_changes_sched219#
Noticed any irregularities with this table? File a report.
Annual time series of changes in accumulated provision for depreciation of electric utility plant.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)
- Primary key:
utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
Electric utility plant refers to FERC Account 108.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
depreciation_type |
string |
Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
out_ferc1__yearly_depreciation_factors_sched336#
Noticed any irregularities with this table? File a report.
Annual time series of factors used in estimating depreciation charges.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)
- Primary key:
This table has no primary key; it is too unstructured to have one.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains columns which may appear categorical, but are actually free text.
The rate and percentage (pct) columns are reported either as values between 0 and 1 or as values between 0 and 100. The scale is often consistent within a given utility-year, but use caution when comparing values across different utilities, or even for the same utility in different years.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
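One way to put the rate and percentage columns on a common scale is a per-utility-year heuristic: if any absolute value in a utility-year exceeds 1, treat that whole utility-year as 0-100 and divide by 100. This is a hedged sketch, not PUDL's method. It will misclassify a utility-year reported on a 0-100 scale whose values all happen to be 1 or below, so spot-check the results. depreciation_rate is stored as a string, so it is coerced to numbers first (unparseable text becomes NaN). The function name normalize_to_fraction and the toy values are hypothetical.

```python
import pandas as pd

# Invented rows shaped like out_ferc1__yearly_depreciation_factors_sched336.
factors = pd.DataFrame({
    "utility_id_ferc1": [1, 1, 2, 2],
    "report_year": [2021, 2021, 2021, 2021],
    "depreciation_rate": ["2.5", "3.1", "0.025", "0.031"],  # strings, as in the table
    "net_salvage_pct": [-10.0, -5.0, -0.10, -0.05],
})


def normalize_to_fraction(df, col, keys=("utility_id_ferc1", "report_year")):
    """Heuristic (hypothetical helper): rescale 0-100 utility-years to 0-1.

    If any |value| within a utility-year exceeds 1, assume the whole
    utility-year is on a 0-100 scale and divide it by 100.
    """
    values = pd.to_numeric(df[col], errors="coerce")
    group_max = values.abs().groupby([df[k] for k in keys]).transform("max")
    is_pct_scale = group_max > 1
    # Keep values where the group is already 0-1, otherwise divide by 100.
    return values.where(~is_pct_scale, values / 100)


factors["depreciation_rate_frac"] = normalize_to_fraction(factors, "depreciation_rate")
factors["net_salvage_frac"] = normalize_to_fraction(factors, "net_salvage_pct")
print(factors)
```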
Additional Details#
This table contains details at a variety of levels of granularity. It has many free-form text fields, so respondents fill it out very differently from each other and from year to year. We recommend using this table carefully, one utility-year at a time, rather than attempting analysis across long time series or across utilities without careful cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_num |
string |
Account number(s) in connection with factors used in estimating depreciation charges. |
depreciable_plant_base |
number |
Depreciable plant balance (depreciable base) to which rates are applied. |
depreciation_factors |
string |
Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers that mark the beginning of a new section, such as a plant or asset type header followed by sub-components like FERC account IDs or plant names, depending on the section. |
depreciation_rate |
string |
Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard; expect some values between 0-1 and some between 0-100. |
mortality_curve_type |
string |
Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives. |
net_salvage_pct |
number |
Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100. |
order_num |
number |
This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer, although there are many instances of floating point values that seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
remaining_life_avg |
number |
Estimated weighted average remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years. |
report_year |
integer |
Four-digit year in which the data was reported. |
service_life_avg |
number |
Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
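For reference, the conversion of PnYnMnDTnHnMnS duration strings into years described for remaining_life_avg and service_life_avg can be approximated as follows. This is a sketch, not PUDL's implementation: it assumes 12 months per year and 365.25 days per year, handles only non-negative durations, and the function name duration_to_years is hypothetical.

```python
import re

# PnYnMnDTnHnMnS, where every component is optional (e.g. "P35Y6M").
_NUM = r"(\d+(?:\.\d+)?)"
_DURATION = re.compile(
    rf"^P(?:{_NUM}Y)?(?:{_NUM}M)?(?:{_NUM}D)?"
    rf"(?:T(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?)?$"
)

DAYS_PER_YEAR = 365.25  # assumption; PUDL's exact factor may differ


def duration_to_years(text: str) -> float:
    """Approximate an ISO 8601 style duration string as a number of years."""
    text = text.strip()
    match = _DURATION.match(text)
    # "P" or "PT" alone match the pattern but contain no components.
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a PnYnMnDTnHnMnS duration: {text!r}")
    years, months, days, hours, minutes, seconds = (
        float(g) if g else 0.0 for g in match.groups()
    )
    total_days = days + (hours + (minutes + seconds / 60) / 60) / 24
    return years + months / 12 + total_days / DAYS_PER_YEAR


print(duration_to_years("P35Y6M"))
print(duration_to_years("P365DT6H"))
```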
out_ferc1__yearly_depreciation_summary_sched336#
Noticed any irregularities with this table? File a report.
Annual time series of depreciation and amortization of electric plant.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)
- Primary key:
utility_id_ferc1, report_year, plant_function, ferc_account_label
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
Depreciation and amortization of electric plant is reported in FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of Depreciation and Amortization Charges.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
ferc_account_label |
string |
Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
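The primary key listed for out_ferc1__yearly_depreciation_summary_sched336 should identify each row uniquely, which makes it a useful sanity check after filtering or combining data. A small pandas sketch; the helper name duplicate_key_rows, the ferc_account_label values, and the dollar figures are invented.

```python
import pandas as pd

# Primary key as listed for out_ferc1__yearly_depreciation_summary_sched336.
PRIMARY_KEY = ["utility_id_ferc1", "report_year", "plant_function", "ferc_account_label"]


def duplicate_key_rows(df: pd.DataFrame, key: list) -> pd.DataFrame:
    """Return every row whose key values appear more than once (hypothetical helper)."""
    return df[df.duplicated(subset=key, keep=False)]


# Invented rows; label values are hypothetical.
summary = pd.DataFrame({
    "utility_id_ferc1": [5, 5, 5],
    "report_year": [2024, 2024, 2024],
    "plant_function": ["steam_production", "steam_production", "transmission"],
    "ferc_account_label": [
        "depreciation_expense",
        "amortization_limited_term_plant",
        "depreciation_expense",
    ],
    "dollar_value": [10.0, 2.0, 4.0],
})

# A clean table has no duplicated keys.
print(len(duplicate_key_rows(summary, PRIMARY_KEY)))

# Accidentally stacking the same rows twice breaks uniqueness.
doubled = pd.concat([summary, summary], ignore_index=True)
print(len(duplicate_key_rows(doubled, PRIMARY_KEY)))
```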
out_ferc1__yearly_detailed_balance_sheet_assets#
Noticed any irregularities with this table? File a report.
Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet assets.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 200, 204 and 219)
- Primary key:
This table has no primary key.
Usage Warnings#
The data from these xbrl_factoid tables contains nested totals and subtotals, which makes aggregation difficult. We used FERC Form 1’s reported calculations to identify the most granular, non-duplicative records. The reported calculations are not always correct; we’ve corrected many of them, but errors may remain.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
This table is derived from four FERC Form 1 accounting tables with nested calculations:
core_ferc1__yearly_balance_sheet_assets_sched110
core_ferc1__yearly_utility_plant_summary_sched200
core_ferc1__yearly_plant_in_service_sched204
core_ferc1__yearly_depreciation_by_function_sched219
We reconciled the nested calculations within these tables and then identified the most granular data across the tables.
We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications either provide more specificity (e.g. we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage).
See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
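The plant_function condensation described above can be illustrated on core-table-style data. This sketch covers only the hydraulic_production example given in the text. It is not the code in pudl.output.ferc1.Exploder, and the other modifications (such as reassigning total utility_type values) are not reproduced. The rows are invented.

```python
import pandas as pd

# Mapping taken from the example in the text: both hydraulic sub-functions
# are condensed into a single hydraulic_production category.
PLANT_FUNCTION_MAP = {
    "hydraulic_production_conventional": "hydraulic_production",
    "hydraulic_production_pumped_storage": "hydraulic_production",
}

# Invented rows with originally reported plant_function values.
core = pd.DataFrame({
    "plant_function": [
        "hydraulic_production_conventional",
        "hydraulic_production_pumped_storage",
        "transmission",
    ],
    "ending_balance": [10.0, 5.0, 20.0],
})

# Values not in the mapping (e.g. "transmission") are left unchanged.
core["plant_function"] = core["plant_function"].replace(PLANT_FUNCTION_MAP)
totals = core.groupby("plant_function")["ending_balance"].sum()
print(totals)
```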
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ending_balance |
number |
Account balance at end of year. |
in_rate_base |
boolean |
Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
rate_base_category |
string |
A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well. |
report_year |
integer |
Four-digit year in which the data was reported. |
table_name |
string |
The name of the PUDL database table from which a given record originated. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
utility_type_other |
string |
Free-form description of the type of utility reported in one of the three ‘other’ utility_type sections of the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020). |
xbrl_factoid |
string |
The name of the type of value, derived from the XBRL fact name. |
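Since these detailed tables are intended to be de-duplicated, the in_rate_base flag and rate_base_category can be used to sum balances into a rough rate base. The following is a sketch, assuming out_ferc1__yearly_detailed_balance_sheet_assets has been loaded into a DataFrame. The rows, factoid names and category labels are invented, a missing in_rate_base flag is treated as False (an assumption), and the result inherits any remaining errors in the reported calculations mentioned in the usage warnings.

```python
import pandas as pd

# Invented rows shaped like out_ferc1__yearly_detailed_balance_sheet_assets.
assets = pd.DataFrame({
    "utility_id_ferc1": [3, 3, 3],
    "report_year": [2023, 2023, 2023],
    "xbrl_factoid": ["factoid_a", "factoid_b", "factoid_c"],  # hypothetical names
    "rate_base_category": ["net_plant", "net_plant", "other_noncurrent_assets"],  # hypothetical
    "in_rate_base": [True, True, False],
    "ending_balance": [1000.0, -300.0, 50.0],
})

# Treat a missing in_rate_base flag as "not in rate base" (an assumption).
in_rate_base = assets["in_rate_base"].fillna(False).astype(bool)

# Sum flagged balances per utility-year and RMI rate base category.
rate_base_by_category = (
    assets[in_rate_base]
    .groupby(["utility_id_ferc1", "report_year", "rate_base_category"])["ending_balance"]
    .sum()
)

# Roll the categories up to one figure per utility-year.
rate_base_total = rate_base_by_category.groupby(
    level=["utility_id_ferc1", "report_year"]
).sum()
print(rate_base_by_category)
print(rate_base_total)
```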
out_ferc1__yearly_detailed_balance_sheet_liabilities#
Noticed any irregularities with this table? File a report.
Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet liabilities.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110 and 118)
- Primary key:
This table has no primary key.
Usage Warnings#
The data from these xbrl_factoid tables contains nested totals and subtotals, which makes aggregation difficult. We used FERC Form 1’s reported calculations to identify the most granular, non-duplicative records. The reported calculations are not always correct; we’ve corrected many of them, but errors may remain.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
This table is derived from two FERC Form 1 accounting tables with nested calculations:
core_ferc1__yearly_balance_sheet_liabilities_sched110
core_ferc1__yearly_retained_earnings_sched118
We reconciled the nested calculations within these tables and then identified the most granular data across the tables.
We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications either provide more specificity (e.g. we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage).
See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ending_balance |
number |
Account balance at end of year. |
in_rate_base |
boolean |
Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function. |
rate_base_category |
string |
A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well. |
report_year |
integer |
Four-digit year in which the data was reported. |
table_name |
string |
The name of the PUDL database table from which a given record originated. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
xbrl_factoid |
string |
The name of the type of value, derived from the XBRL fact name. |
out_ferc1__yearly_detailed_income_statements#
Noticed any irregularities with this table? File a report.
Annual time series of granular, de-duplicated accounting data of utilities’ income statements.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 114, 300, 320 and 336)
- Primary key:
This table has no primary key.
Usage Warnings#
The data from these xbrl_factoid tables contains nested totals and subtotals, which makes aggregation difficult. We used FERC Form 1’s reported calculations to identify the most granular, non-duplicative records. The reported calculations are not always correct; we’ve corrected many of them, but errors may remain.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
This table is derived from four FERC Form 1 accounting tables with nested calculations, drawn from Schedules 114, 300, 320 and 336.
We reconciled the nested calculations within these tables and then identified the most granular data across the tables.
We applied slight modifications to two columns (utility_type & plant_function) as compared to the originally reported values in our core tables. The modifications either provide more specificity (e.g. we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage).
See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
in_revenue_requirement |
boolean |
Whether or not a record from the detailed income statement data is typically included in a utility’s revenue requirement. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type or plant_function. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue_requirement_category |
string |
The category of revenue requirement associated with each component of a utility’s income statements. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function. This column is intended to be used to aggregate this table. |
revenue_requirement_technology |
string |
The technology type associated with components of a utility’s revenue requirement. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function as well. This column is intended to be used to aggregate this table. |
table_name |
string |
The name of the PUDL database table from which a given record originated. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
xbrl_factoid |
string |
The name of the type of value, derived from the XBRL fact name.
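The revenue_requirement_category column is described as the intended way to aggregate this table. A minimal pandas sketch of that aggregation, assuming the table is already loaded into a DataFrame (for example with one of the access methods above). It keeps only rows flagged in_revenue_requirement and treats a null flag as "not included"; the function name and the category labels in the sample are hypothetical.

```python
import pandas as pd


def revenue_requirement_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Sum dollar_value per utility, year, and revenue_requirement_category.

    Hypothetical helper: rows whose in_revenue_requirement flag is False or
    null are excluded.
    """
    # eq(True) yields null for nullable/pyarrow dtypes and False for object
    # dtype; fill any nulls with False so the mask is a plain boolean array.
    in_rr = df["in_revenue_requirement"].eq(True).fillna(False).astype(bool)
    return (
        df.loc[in_rr]
        # groupby drops rows with a null category (dropna=True by default).
        .groupby(
            ["utility_id_ferc1", "report_year", "revenue_requirement_category"],
            as_index=False,
        )["dollar_value"]
        .sum()
    )


# Usage with made-up values:
sample = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1, 2],
        "report_year": [2023, 2023, 2023, 2023],
        "revenue_requirement_category": ["depreciation", "depreciation", "fuel", "fuel"],
        "in_revenue_requirement": [True, True, False, True],
        "dollar_value": [10.0, 5.0, 7.0, 3.0],
    }
)
summary = revenue_requirement_by_category(sample)
```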
out_ferc1__yearly_energy_dispositions_sched401#
Noticed any irregularities with this table? File a report.
Annual time series of dispositions of electric energy sold, exchanged, or stored.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
- Primary key:
utility_id_ferc1, report_year, energy_disposition_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Additional Details#
Electric Energy Account, dispositions only (Schedule 401a): electricity that utilities delivered to end users, internal losses, etc.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
energy_disposition_type |
string |
Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401 table. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc. |
energy_mwh |
number |
Sources and uses of energy in MWh. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
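Because some rows are subtotals, summing energy_mwh across every energy_disposition_type can double count. One hedged approach is to drop rows whose row_type_xbrl marks them as calculated. This sketch assumes calculated rows carry the label "calculated_value"; check the actual labels with df["row_type_xbrl"].unique() before relying on it. Rows with a null row_type_xbrl are kept, so data without that label may still include subtotals. The function name is hypothetical.

```python
import pandas as pd

# Assumed label for rows FERC computes from other rows in the same table.
CALCULATED_LABEL = "calculated_value"


def total_reported_energy(df: pd.DataFrame) -> pd.Series:
    """Sum energy_mwh per utility-year, excluding calculated (subtotal) rows."""
    # A null row_type_xbrl is treated as "not calculated" and kept.
    keep = (df["row_type_xbrl"] != CALCULATED_LABEL).fillna(True).astype(bool)
    return df.loc[keep].groupby(["utility_id_ferc1", "report_year"])["energy_mwh"].sum()


sample = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1],
        "report_year": [2023, 2023, 2023],
        "energy_disposition_type": ["sales", "losses", "total"],  # made-up labels
        "row_type_xbrl": ["reported_value", "reported_value", "calculated_value"],
        "energy_mwh": [100.0, 10.0, 110.0],
    }
)
totals = total_reported_energy(sample)
```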
out_ferc1__yearly_energy_sources_sched401#
Noticed any irregularities with this table? File a report.
Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
- Primary key:
utility_id_ferc1, report_year, energy_source_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Additional Details#
Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
energy_mwh |
number |
Sources and uses of energy in MWh. |
energy_source_type |
string |
Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
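Because the primary key is utility_id_ferc1, report_year, energy_source_type, the table can be reshaped to one row per utility-year with one column per energy source. A minimal pandas sketch; DataFrame.pivot raises a ValueError if any key combination repeats, which doubles as a uniqueness check. The function name and the energy_source_type labels in the sample are hypothetical. Subtotal rows, if present, become their own columns, so avoid summing across columns blindly.

```python
import pandas as pd


def sources_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape to one row per utility-year and one column per energy_source_type."""
    return df.pivot(
        index=["utility_id_ferc1", "report_year"],
        columns="energy_source_type",
        values="energy_mwh",
    )


sample = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 2],
        "report_year": [2023, 2023, 2023],
        "energy_source_type": ["net_generation", "purchased_power", "net_generation"],
        "energy_mwh": [500.0, 200.0, 50.0],
    }
)
wide = sources_wide(sample)
```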
out_ferc1__yearly_hydroelectric_plants_sched406#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for large hydroelectric generating plants.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)
- Primary key:
record_id. The best approximation for a natural primary key for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.
Usage Warnings#
The plant_type and construction_type columns are standardized into categorical values from free-form strings.
FERC does not require respondents to report unique, non-duplicative plant records. Respondents sporadically report portions of a plant and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
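Since plant_name_ferc1 is free-form and duplicate records exist, it can help to list the rows that collide on the approximate key report_year, utility_id_ferc1, plant_name_ferc1 before aggregating. This sketch lower-cases and trims plant names first; that normalization is a simple assumed heuristic, not PUDL's own matching logic, and the function name is hypothetical.

```python
import pandas as pd


def find_duplicate_plant_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows sharing year, utility, and normalized plant name with another row."""
    # Heuristic normalization of the free-form plant name.
    plant_key = df["plant_name_ferc1"].str.strip().str.lower()
    keyed = df.assign(_plant_key=plant_key)
    # keep=False marks every member of a duplicated group, not just the repeats.
    is_dup = keyed.duplicated(
        subset=["report_year", "utility_id_ferc1", "_plant_key"], keep=False
    )
    return df.loc[is_dup]


sample = pd.DataFrame(
    {
        "record_id": ["a", "b", "c", "d"],
        "report_year": [2023, 2023, 2023, 2022],
        "utility_id_ferc1": [1, 1, 1, 1],
        "plant_name_ferc1": ["Big Falls", " big falls ", "Cedar Dam", "Big Falls"],
    }
)
dupes = find_duplicate_plant_records(sample)
```

Note that duplicated() treats null keys as equal, so rows with a missing plant name in the same utility-year are also flagged.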
Additional Details#
Large plants have an installed nameplate capacity of more than 10 MW.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of that time period), using consistent units (e.g., MWh / (MW * hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_capacity_adverse_conditions_mw |
number |
Net plant capability under the least favorable operating conditions, in megawatts. |
net_capacity_favorable_conditions_mw |
number |
Net plant capability under the most favorable operating conditions, in megawatts. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_hydraulic |
number |
Production expenses: hydraulic expenses (USD). |
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
project_num |
integer |
FERC Licensed Project Number. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
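The capacity_factor, capex_per_mw, and opex_per_mwh descriptions imply simple ratios. This sketch recomputes them from other columns as a sanity check, assuming a full calendar year of operation (8,760 or 8,784 hours) and that opex_total is the numerator for opex_per_mwh; PUDL's exact computation may differ. Results go into hypothetical *_calc columns so the reported values are not overwritten, and zero or missing denominators yield NaN.

```python
import calendar

import pandas as pd


def add_hydro_ratio_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute capacity factor and unit costs into hypothetical *_calc columns."""
    out = df.copy()
    # Hours in the report year (assumes the plant operated the whole year).
    hours = out["report_year"].map(lambda y: 8784 if calendar.isleap(int(y)) else 8760)
    # Non-positive or missing denominators become NaN to avoid inf results.
    capacity = out["capacity_mw"].where((out["capacity_mw"] > 0).fillna(False))
    generation = out["net_generation_mwh"].where(
        (out["net_generation_mwh"] > 0).fillna(False)
    )
    out["capacity_factor_calc"] = out["net_generation_mwh"] / (capacity * hours)
    out["capex_per_mw_calc"] = out["capex_total"] / capacity
    out["opex_per_mwh_calc"] = out["opex_total"] / generation
    return out


sample = pd.DataFrame(
    {
        "report_year": [2023, 2023],
        "capacity_mw": [100.0, 0.0],
        "net_generation_mwh": [438000.0, 0.0],
        "capex_total": [2.0e8, 5.0e6],
        "opex_total": [4.38e6, 1.0e5],
    }
)
checked = add_hydro_ratio_checks(sample)
```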
out_ferc1__yearly_income_statements_sched114#
Noticed any irregularities with this table? File a report.
Annual time series of utility income statements.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)
- Primary key:
utility_id_ferc1, report_year, utility_type, income_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL-assigned ID when FERC accounts have been split or combined in reporting. |
income_type |
string |
Type of income reported in the core_ferc1__yearly_income_statements_sched114 table. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
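The primary key listed for this table (utility_id_ferc1, report_year, utility_type, income_type) can be checked after loading, for example before joining against other tables. A small sketch with a hypothetical helper name; it only confirms that the key is non-null and unique, not that the reported values are correct.

```python
import pandas as pd

PRIMARY_KEY = ["utility_id_ferc1", "report_year", "utility_type", "income_type"]


def check_primary_key(df: pd.DataFrame, key: list[str] = PRIMARY_KEY) -> None:
    """Raise ValueError if any key column has nulls or any key combination repeats."""
    null_counts = df[key].isna().sum()
    if null_counts.any():
        bad = null_counts[null_counts > 0].index.tolist()
        raise ValueError(f"Null values in key columns: {bad}")
    n_dupes = int(df.duplicated(subset=key).sum())
    if n_dupes:
        raise ValueError(f"{n_dupes} rows repeat an earlier primary key value")


good = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1],
        "report_year": [2023, 2023],
        "utility_type": ["electric", "electric"],
        "income_type": ["operating_revenues", "net_income"],  # made-up labels
        "dollar_value": [100.0, 20.0],
    }
)
check_primary_key(good)  # passes silently
```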
out_ferc1__yearly_operating_expenses_sched320#
Noticed any irregularities with this table? File a report.
Annual time series of operating and maintenance costs associated with producing electricity.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)
- Primary key:
utility_id_ferc1, report_year, expense_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
expense_type |
string |
The type of expense. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL-assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
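Because the primary key is utility_id_ferc1, report_year, expense_type, each utility has at most one value per expense type per year, which makes year-over-year comparisons straightforward. This sketch computes the fractional change only when the previous row is exactly the prior year, so reporting gaps yield NaN rather than a multi-year change. The function name and the yoy_change column are hypothetical, and calculated subtotal rows are included unless filtered out first.

```python
import pandas as pd

GROUP_KEYS = ["utility_id_ferc1", "expense_type"]


def add_year_over_year_change(df: pd.DataFrame) -> pd.DataFrame:
    """Add yoy_change: fractional change in dollar_value from the prior report_year."""
    out = df.sort_values(GROUP_KEYS + ["report_year"]).copy()
    grouped = out.groupby(GROUP_KEYS)
    prev_value = grouped["dollar_value"].shift(1)
    prev_year = grouped["report_year"].shift(1)
    # Compare only consecutive years with a nonzero base value.
    valid = ((prev_year == out["report_year"] - 1) & (prev_value != 0)).fillna(False)
    out["yoy_change"] = ((out["dollar_value"] - prev_value) / prev_value).where(valid)
    return out


sample = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1],
        "expense_type": ["steam_power_generation"] * 3,  # made-up label
        "report_year": [2021, 2022, 2024],
        "dollar_value": [100.0, 110.0, 121.0],
    }
)
changes = add_year_over_year_change(sample).set_index("report_year")["yoy_change"]
```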
out_ferc1__yearly_operating_revenues_sched300#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ electric operating revenues.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)
- Primary key:
utility_id_ferc1, report_year, revenue_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Additional Details#
This table includes only the structured part of Schedule 300. A number of revenue_type values have no sales_mwh or avg_customers_per_month reported; in those cases these columns are NULL.
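Where sales_mwh is reported, an average revenue per MWh can be derived from dollar_value. This sketch leaves the result NaN when sales_mwh is NULL or not positive, consistent with the note that many revenue_type values lack it. The revenue_per_mwh column and function names are hypothetical, and the values are nominal dollars.

```python
import pandas as pd


def add_revenue_per_mwh(df: pd.DataFrame) -> pd.DataFrame:
    """Add revenue_per_mwh = dollar_value / sales_mwh (nominal USD per MWh)."""
    out = df.copy()
    # Missing or non-positive sales become NaN, so the ratio is NaN too.
    sales = out["sales_mwh"].where((out["sales_mwh"] > 0).fillna(False))
    out["revenue_per_mwh"] = out["dollar_value"] / sales
    return out


sample = pd.DataFrame(
    {
        "revenue_type": ["residential_sales", "other_operating_revenues"],  # made-up labels
        "dollar_value": [50000.0, 1200.0],
        "sales_mwh": [1000.0, None],
    }
)
result = add_revenue_per_mwh(sample)
```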
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
avg_customers_per_month |
number |
Average number of customers per month. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL-assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue_type |
string |
Label describing types of revenues. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
out_ferc1__yearly_other_regulatory_assets_sched232#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ other regulatory assets, including rate order docket number.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)
- Primary key:
This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
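Because columns such as description and account_detail look categorical but are free text, grouping on the raw strings splits near-identical entries. A minimal normalization sketch: lower-case, replace runs of non-alphanumeric characters with a single space, and trim. This is an assumed heuristic, not PUDL's cleaning logic; it will not merge genuinely different wordings, and the function name is hypothetical.

```python
import pandas as pd


def normalize_free_text(values: pd.Series) -> pd.Series:
    """Lower-case, collapse punctuation and whitespace runs to one space, and trim."""
    return (
        values.str.lower()
        # Anything outside a-z and 0-9 (including accented letters) becomes a space.
        .str.replace(r"[^a-z0-9]+", " ", regex=True)
        .str.strip()
    )


descriptions = pd.Series(["Deferred  Fuel-Costs", "deferred fuel costs ", None])
normalized = normalize_free_text(descriptions)
```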
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_detail |
string |
Description of the account number credited when making a debit adjustment to other regulatory liabilities. |
additional_description |
string |
Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but they differ from the standard description often enough that we kept both columns. This field did not exist before FERC began publishing Form 1 as XBRL, and thus is always null prior to 2021. |
credits_written_off_recovered |
number |
Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies. |
debits |
number |
The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies. |
description |
string |
Long human-readable description of the meaning of a code/label. |
ending_balance |
number |
Account balance at end of year. |
report_year |
integer |
Four-digit year in which the data was reported. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
out_ferc1__yearly_other_regulatory_liabilities_sched278#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ other regulatory liabilities, including rate order docket number.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)
- Primary key:
This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_detail |
string |
Description of the account number credited when making a debit adjustment to other regulatory liabilities. |
decrease_in_other_regulatory_liabilities |
number |
The decrease during the reporting period of other regulatory liabilities. |
description |
string |
Long human-readable description of the meaning of a code/label. |
ending_balance |
number |
Account balance at end of year. |
increase_in_other_regulatory_liabilities |
number |
The increase during the reporting period of other regulatory liabilities. |
report_year |
integer |
Four-digit year in which the data was reported. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
out_ferc1__yearly_plant_in_service_sched204#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)
- Primary key:
utility_id_ferc1, report_year, ferc_account_label
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101 (see, e.g., https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
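The balance identity above can be checked row by row. The following is a minimal plain-Python sketch, not part of PUDL. It assumes hypothetical rows that carry this table's balance columns, and it reads the sentence literally: ending_balance = starting_balance + additions + retirements + adjustments + transfers. That only holds if retirements are stored as negative values, which is an assumption to verify against the data. Missing components are treated as zero, also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlantInServiceRow:
    # Hypothetical subset of out_ferc1__yearly_plant_in_service_sched204 columns.
    ferc_account_label: str
    starting_balance: Optional[float]
    additions: Optional[float]
    retirements: Optional[float]
    adjustments: Optional[float]
    transfers: Optional[float]
    ending_balance: Optional[float]


def _or_zero(value: Optional[float]) -> float:
    # Assumption: a missing (None) component contributes nothing to the sum.
    return 0.0 if value is None else value


def balance_residual(row: PlantInServiceRow) -> float:
    """Return ending_balance minus the expected end-of-year balance.

    Expected = starting + additions + retirements + adjustments + transfers,
    taken literally from the text, so retirements are assumed to already
    carry a negative sign.
    """
    expected = (
        _or_zero(row.starting_balance)
        + _or_zero(row.additions)
        + _or_zero(row.retirements)
        + _or_zero(row.adjustments)
        + _or_zero(row.transfers)
    )
    return _or_zero(row.ending_balance) - expected


def balance_is_consistent(row: PlantInServiceRow, tolerance: float = 1.0) -> bool:
    # Allow a small tolerance for rounding in reported dollar amounts.
    return abs(balance_residual(row)) <= tolerance


# Hypothetical example row (not real FERC data).
row = PlantInServiceRow("land", 100.0, 20.0, -5.0, 0.0, 0.0, 115.0)
print(balance_residual(row), balance_is_consistent(row))
```

Rows with a large residual are candidates for closer inspection, keeping in mind the warning that some rows are subtotals.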
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments |
number |
Cost of adjustments to the account. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
ferc_account_label |
string |
Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts. |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
transfers |
number |
Cost of transfers into (out of) the account. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
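The record_id description above gives its layout as {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Because table_name can itself contain underscores, one way to recover the parts is to split from the right. This is a hypothetical helper, not a PUDL function, and the example ID is made up for illustration. Records from XBRL reporting years may not follow this pattern, in which case the sketch raises ValueError.

```python
from typing import NamedTuple


class DbfRecordId(NamedTuple):
    table_name: str
    report_year: int
    report_prd: int
    respondent_id: int
    spplmnt_num: int
    row_number: int


def parse_dbf_record_id(record_id: str) -> DbfRecordId:
    """Split a record_id of the documented form into its components.

    table_name may contain underscores, so split from the right: the last
    five underscore-separated fields are the numeric parts.
    Raises ValueError if the string does not match that shape.
    """
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    table_name, *numeric = parts
    # int() raises ValueError for any non-numeric trailing field.
    year, prd, respondent, supplement, row = (int(p) for p in numeric)
    return DbfRecordId(table_name, year, prd, respondent, supplement, row)


# Hypothetical record_id, for illustration only.
print(parse_dbf_record_id("f1_plant_in_srvce_2019_12_177_0_8"))
```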
out_ferc1__yearly_pumped_storage_plants_sched408#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)
- Primary key:
record_id. The best approximation for primary keys for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.
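Because plant_name_ferc1 is free-form, it can help to see where the approximate key (report_year, utility_id_ferc1, plant_name_ferc1) repeats before joining or aggregating. The following is a minimal sketch over hypothetical rows represented as dicts. Normalizing names by trimming whitespace and lowercasing is an assumption: it may merge names that differ only in case and will still miss true duplicates spelled differently.

```python
from collections import Counter


def normalize_plant_name(name):
    # Assumption: case and surrounding whitespace are not meaningful.
    return (name or "").strip().lower()


def duplicated_approx_keys(rows):
    """Count rows per (report_year, utility_id_ferc1, normalized plant name)
    and return only the keys that occur more than once."""
    counts = Counter(
        (r["report_year"], r["utility_id_ferc1"], normalize_plant_name(r["plant_name_ferc1"]))
        for r in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows, not real FERC data.
rows = [
    {"report_year": 2023, "utility_id_ferc1": 1, "plant_name_ferc1": "Plant A"},
    {"report_year": 2023, "utility_id_ferc1": 1, "plant_name_ferc1": " plant a "},
    {"report_year": 2023, "utility_id_ferc1": 1, "plant_name_ferc1": "Plant B"},
]
print(duplicated_approx_keys(rows))
```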
Usage Warnings#
FERC does not restrict respondents to report unique and non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (ex: unit 1, unit 2 and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
As reported in Schedule 408 of FERC Form 1.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over the time period / (nameplate capacity * length of the time period in hours). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_equipment_electric |
number |
Cost of plant: accessory electric equipment (USD). |
capex_equipment_misc |
number |
Cost of plant: miscellaneous power plant equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
capex_wheels_turbines_generators |
number |
Cost of plant: water wheels, turbines, and generators (USD). |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
energy_used_for_pumping_mwh |
number |
Energy used for pumping, in megawatt-hours. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_load_mwh |
number |
Net output for load (net generation - energy used for pumping) in megawatt-hours. |
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_production_before_pumping |
number |
Total production expenses before pumping (USD). |
opex_pumped_storage |
number |
Production expenses: pumped storage (USD). |
opex_pumping |
number |
Production expenses: pumping (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
project_num |
integer |
FERC Licensed Project Number. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
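The capacity_factor and net_load_mwh descriptions above are simple formulas. The following is a minimal sketch of both, assuming a full calendar year as the time period (8,760 or 8,784 hours). PUDL's own calculation may differ in details, such as which generation measure it uses. Pumped storage plants can report negative net generation, which would yield a negative capacity factor here.

```python
import calendar
from typing import Optional


def hours_in_year(year: int) -> int:
    # 8,784 hours in a leap year, 8,760 otherwise.
    return 8784 if calendar.isleap(year) else 8760


def capacity_factor(net_generation_mwh: float, capacity_mw: float, report_year: int) -> Optional[float]:
    """Energy generated / (nameplate capacity * hours in the period)."""
    if capacity_mw <= 0:
        return None  # undefined without a positive nameplate capacity
    return net_generation_mwh / (capacity_mw * hours_in_year(report_year))


def net_load_mwh(net_generation_mwh: float, energy_used_for_pumping_mwh: float) -> float:
    """Net output for load: net generation minus energy used for pumping."""
    return net_generation_mwh - energy_used_for_pumping_mwh


# Hypothetical values: a 100 MW plant generating 87,600 MWh in 2023.
print(capacity_factor(87_600, 100, 2023))
```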
out_ferc1__yearly_purchased_power_and_exchanges_sched326#
Noticed any irregularities with this table? File a report.
Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)
- Primary key:
This table has no primary key. The primary key for this table would be record_id; however, a small number of these values are duplicated in 2021, 2022, and 2024.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table has data about inter-utility power purchases. This includes how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately, the field describing which other utility the power was bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.
Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
billing_demand_mw |
number |
Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts. |
coincident_peak_demand_mw |
number |
Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts. |
delivered_mwh |
number |
Gross megawatt-hours delivered in power exchanges and used as the basis for settlement. |
demand_charges |
number |
Demand charges (USD). |
energy_charges |
number |
Energy charges (USD). |
non_coincident_peak_demand_mw |
number |
Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts. |
other_charges |
number |
Other charges, including out-of-period adjustments (USD). |
purchase_type_code |
string |
Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction. |
purchased_mwh |
number |
Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021. |
purchased_other_than_storage_mwh |
number |
Number of megawatt-hours purchased during the period for purposes other than energy storage. |
purchased_storage_mwh |
number |
Number of megawatt-hours purchased during the period for energy storage. |
received_mwh |
number |
Gross megawatt-hours received in power exchanges and used as the basis for settlement. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
seller_name |
string |
Name of the seller, or the other party in an exchange transaction. |
tariff |
string |
FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.) |
total_settlement |
number |
Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
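Two column descriptions above lend themselves to simple row checks: purchase_type_code must be one of the listed values, and total_settlement is described as the sum of demand, energy, and other charges. The following is a hypothetical plain-Python sketch over rows represented as dicts. Skipping the sum check for electricity_exchange rows is an assumption based on the note that exchanges settle on the net receipt of energy. Treating missing charges as zero and the $1 tolerance are also assumptions.

```python
PURCHASE_TYPE_CODES = frozenset({
    "requirements", "long_firm", "intermediate_firm", "short_firm",
    "long_unit", "intermediate_unit", "electricity_exchange",
    "other_service", "adjustment",
})

CHARGE_COLUMNS = ("demand_charges", "energy_charges", "other_charges")


def purchase_row_issues(row: dict, tolerance: float = 1.0) -> list:
    """Return human-readable problems found in one purchased-power row."""
    issues = []
    code = row.get("purchase_type_code")
    if code is not None and code not in PURCHASE_TYPE_CODES:
        issues.append(f"unexpected purchase_type_code: {code!r}")
    settlement = row.get("total_settlement")
    # Exchanges settle on the net receipt of energy, so skip the sum check.
    if settlement is not None and code != "electricity_exchange":
        # Assumption: a missing charge column contributes zero.
        charges = sum(row.get(col) or 0.0 for col in CHARGE_COLUMNS)
        if abs(settlement - charges) > tolerance:
            issues.append(f"total_settlement {settlement} != sum of charges {charges}")
    return issues


# Hypothetical row, not real FERC data.
ok_row = {
    "purchase_type_code": "long_firm",
    "demand_charges": 10.0,
    "energy_charges": 5.0,
    "other_charges": None,
    "total_settlement": 15.0,
}
print(purchase_row_issues(ok_row))
```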
out_ferc1__yearly_rate_base#
Noticed any irregularities with this table? File a report.
Annual time series of granular accounting data consisting of what utilities can typically include in their rate bases.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 118, 200, 204, 219, and 320)
- Primary key:
This table has no primary key.
Usage Warnings#
The data from these xbrl_factoid tables contains nested totals and subtotals, which makes aggregation difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect: we’ve corrected many of them, but errors could still exist.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table is derived from seven FERC Form 1 accounting tables with nested calculations. We reconciled these nested calculations and then identified the most granular data across the tables.
There are three direct upstream inputs; the two detailed tables each have several core_ferc1 inputs of their own.
We applied slight modifications to three columns (utility_type, plant_function, and plant_status) compared to the originally reported values in our core tables. The modifications either provide more specificity (e.g. we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage).
See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
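The plant_function condensing described above amounts to a value mapping applied before grouping. The following sketch illustrates the idea using only the two hydraulic categories named in the text. It is not PUDL's implementation (see pudl.output.ferc1.Exploder for that), and the mapping and function names are hypothetical. Summing ending_balance per category assumes the rows are the granular, non-duplicative records this table is meant to contain.

```python
from collections import defaultdict

# Illustrative mapping built only from the example in the text.
PLANT_FUNCTION_CONDENSED = {
    "hydraulic_production_conventional": "hydraulic_production",
    "hydraulic_production_pumped_storage": "hydraulic_production",
}


def condense_plant_function(plant_function):
    # Values not listed in the mapping pass through unchanged.
    return PLANT_FUNCTION_CONDENSED.get(plant_function, plant_function)


def ending_balance_by_function(rows):
    """Sum ending_balance per condensed plant_function (None counts as 0)."""
    totals = defaultdict(float)
    for r in rows:
        totals[condense_plant_function(r["plant_function"])] += r.get("ending_balance") or 0.0
    return dict(totals)


# Hypothetical rows, not real FERC data.
rows = [
    {"plant_function": "hydraulic_production_conventional", "ending_balance": 10.0},
    {"plant_function": "hydraulic_production_pumped_storage", "ending_balance": 5.0},
    {"plant_function": "transmission", "ending_balance": 2.0},
]
print(ending_balance_by_function(rows))
```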
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
is_disaggregated_in_rate_base |
boolean |
Indicates whether or not records with null values in the in_rate_base column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag |
is_disaggregated_utility_type |
boolean |
Indicates whether or not records with null or total values in the utility_type column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
rate_base_category |
string |
A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function, or plant_status as well. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
table_name |
string |
The name of the PUDL database table where a given record originated from. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. |
utility_type_other |
string |
Freeform description of the type of utility reported in one of the three “other” utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020). |
xbrl_factoid |
string |
The name of the type of value, derived from the XBRL fact name. |
out_ferc1__yearly_retained_earnings_sched118#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ statements of retained earnings.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)
- Primary key:
utility_id_ferc1, report_year, earnings_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
earnings_type |
string |
Label describing types of earnings. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
out_ferc1__yearly_sales_by_rate_schedules_sched304#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)
- Primary key:
This table has no primary key.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Values in rate_schedule_description are free-form strings.
Data prior to 2021 does not include information in the rate_schedule_type and billing_status columns.
Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
The pre-2021 data in this table (extracted from FoxProDB rather than XBRL) is extremely unstructured. Whereas the post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total), the pre-2021 data stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse the pre-2021 rate_schedule_description column en masse because it’s just too messy: its contents often include numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded in the rate_schedule_description column. To all who dare, beware.
This table combines one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type, plus totals), hence the increase in data clarity after 2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data, as they can be reliably parsed from each of the XBRL tables and incorporated into columns. The rate_schedule_description column is supposed to contain sub-rate_schedule_type names for charges (e.g. Residential 1, Residential 2, etc.). However, the pre-2021 data contains a little bit of everything (or nothing), and the post-2020 data has some totals or wonky data thrown in. That is to say, even when working with post-2020 data, be wary of aggregating the data; that’s what the “total” rows are for.
The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for post-2020 data (for earlier years, you could try keyword searching for “total” in rate_schedule_description). The totals table contains two types of totals: the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”) and each of the utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).
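The marking scheme above can be used to separate the two kinds of total rows from detail rows in post-2020 data. The following is a minimal sketch over hypothetical rows represented as dicts. Ignoring case and surrounding whitespace when comparing to “total” is an assumption. Pre-2021 rows will mostly land in "detail" because they lack these markers.

```python
def classify_sales_row(row: dict) -> str:
    """Label a post-2020 row as a utility total, a rate schedule total, or detail."""
    # Assumption: comparisons ignore case and surrounding whitespace.
    description = (row.get("rate_schedule_description") or "").strip().lower()
    schedule_type = (row.get("rate_schedule_type") or "").strip().lower()
    if description == "total" and schedule_type == "total":
        return "utility_total"
    if description == "total":
        return "rate_schedule_total"
    return "detail"


# Hypothetical rows, not real FERC data.
rows = [
    {"rate_schedule_description": "total", "rate_schedule_type": "total"},
    {"rate_schedule_description": "total", "rate_schedule_type": "residential"},
    {"rate_schedule_description": "Residential 1", "rate_schedule_type": "residential"},
]
detail_rows = [r for r in rows if classify_sales_row(r) == "detail"]
print([classify_sales_row(r) for r in rows])
```

Aggregating only detail_rows avoids double counting the totals, although, as noted above, post-2020 detail rows can still contain stray totals.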
The rate schedule based XBRL tables only report billed values whereas the total tables report billed, unbilled, and total values. (See the column description for more info on the difference between billed and unbilled). This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.
Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot guarantee which is which and have not put any cleaning mechanisms in place to account for this.
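One possible heuristic, not something PUDL applies, is to compare each reported revenue_per_kwh with an implied dollars-per-kWh value computed from the same row. The sketch below assumes that dollar_value holds the row's revenue and that sales_mwh is the matching quantity sold. Both assumptions should be checked before relying on the result. A ratio near 1 suggests dollars; a ratio near 100 suggests cents. Anything else is left unclassified.

```python
import math
from typing import Optional


def guess_revenue_per_kwh_units(revenue_per_kwh, dollar_value, sales_mwh) -> Optional[str]:
    """Return "dollars", "cents", or None if the row is ambiguous or incomplete."""
    if not revenue_per_kwh or not dollar_value or not sales_mwh:
        return None
    # Assumption: dollar_value is the revenue for this row, in USD.
    implied_dollars_per_kwh = dollar_value / (sales_mwh * 1000.0)
    ratio = revenue_per_kwh / implied_dollars_per_kwh
    if ratio <= 0:
        return None
    # Compare on a log10 scale: about 0 means dollars, about 2 means cents.
    log_ratio = math.log10(ratio)
    if abs(log_ratio) < 0.5:
        return "dollars"
    if abs(log_ratio - 2) < 0.5:
        return "cents"
    return None


# Hypothetical row: $1.2M of revenue for 10,000 MWh, i.e. $0.12 per kWh.
print(guess_revenue_per_kwh_units(12.0, 1_200_000.0, 10_000.0))
```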
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
avg_customers_per_month |
number |
Average number of customers per month. |
billing_status |
string |
Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
kwh_per_customer |
number |
kWh per customer. |
rate_schedule_description |
string |
Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type. |
rate_schedule_type |
string |
Categorization of rate schedule type. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue_per_kwh |
number |
The amount of revenue per kWh by rate schedule acquired in the given report year. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
out_ferc1__yearly_small_plants_sched410#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for internal combustion plants, gas-turbine plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity and steam plants with less than 25 MW installed nameplate capacity.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)
- Primary key:
record_id
The best approximation for a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.
Usage Warnings#
FERC does not require respondents to report unique, non-duplicative plant records. Respondents sporadically report portions of a plant and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
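Because a plant may appear both as unit-level rows and as a total row, it can help to flag suspicious rows before aggregating. A minimal pandas sketch, assuming the table has already been loaded into a DataFrame as shown under Access methods. The function name and the "total" substring heuristic are hypothetical, and unlabeled total rows will not be caught by the name check.

```python
import pandas as pd

# Approximate key suggested under "Primary key" for this table.
APPROX_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def flag_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Add boolean columns marking rows that may double count when summed."""
    out = df.copy()
    # True for every row that shares the approximate key with another row.
    out["dup_approx_key"] = out.duplicated(subset=APPROX_KEY, keep=False)
    # Heuristic only: plant names containing the word "total".
    names = out["plant_name_ferc1"].fillna("").str.lower()
    out["name_looks_like_total"] = names.str.contains(r"\btotal\b", regex=True)
    return out
```

Flagged rows are candidates for manual review rather than automatic removal, since a duplicated key can also be a legitimate, separately reported unit.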
Additional Details#
As reported on FERC Form 1 Schedule 410 (pages 410-411)
and extracted from the FERC Visual FoxPro and XBRL databases. See
pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.
The raw version of this table is more like a digitized PDF than an actual data table. The rows contain a lot of information beyond what the columns suggest: a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the data values in the column. We apply a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_total |
number |
Total cost of plant (USD). |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_type |
string |
Type of fuel. |
license_id_ferc1 |
integer |
FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_maintenance |
number |
Production expenses: maintenance (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
record_id |
string |
Identifier indicating the original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
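The record_id format given in the column description can be split back into its parts. A minimal sketch, assuming the layout {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number} with integer values in the last five fields, and that only table_name may itself contain underscores, so the ID is split from the right. IDs that do not follow this layout (possibly some derived from XBRL filings) return None. The function and class names are hypothetical.

```python
from typing import NamedTuple, Optional


class RecordIdParts(NamedTuple):
    table_name: str
    report_year: int
    report_prd: int
    respondent_id: int
    spplmnt_num: int
    row_number: int


def parse_record_id(record_id: str) -> Optional[RecordIdParts]:
    """Split a FERC Form 1 record_id into its documented components."""
    # table_name may contain underscores, so split off only the last five fields.
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        return None
    table_name, *numbers = parts
    if not all(n.isdigit() for n in numbers):
        return None  # does not match the assumed all-integer layout
    return RecordIdParts(table_name, *(int(n) for n in numbers))
```

Splitting from the right keeps multi-word table names intact, which a plain split on "_" would break apart.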
out_ferc1__yearly_steam_plants_fuel_by_plant_sched402#
Noticed any irregularities with this table? File a report.
Annual time series of FERC fuel data by plant.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
- Primary key:
report_year, utility_id_ferc1, plant_name_ferc1
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
coal_fraction_cost |
number |
Coal cost as a percentage of overall fuel cost. |
coal_fraction_mmbtu |
number |
Coal heat content as a percentage of overall fuel heat content (MMBTU). |
fuel_cost |
number |
Total fuel cost for plant (USD). |
fuel_mmbtu |
number |
Total heat content for plant (in MMBTU). |
gas_fraction_cost |
number |
Natural gas cost as a percentage of overall fuel cost. |
gas_fraction_mmbtu |
number |
Natural gas heat content as a percentage of overall fuel heat content (MMBTU). |
nuclear_fraction_cost |
number |
Nuclear cost as a percentage of overall fuel cost. |
nuclear_fraction_mmbtu |
number |
Nuclear heat content as a percentage of overall fuel heat content (MMBTU). |
oil_fraction_cost |
number |
Oil cost as a percentage of overall fuel cost. |
oil_fraction_mmbtu |
number |
Oil heat content as a percentage of overall fuel heat content (MMBTU). |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
primary_fuel_by_cost |
string |
Primary fuel for the plant, based on each fuel's share of total fuel cost. |
primary_fuel_by_mmbtu |
string |
Primary fuel for the plant, based on each fuel's share of total heat content. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
waste_fraction_cost |
number |
Waste-heat cost as a percentage of overall fuel cost. |
waste_fraction_mmbtu |
number |
Waste-heat heat content as a percentage of overall fuel heat content (MMBTU). |
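The *_fraction_cost and *_fraction_mmbtu columns above describe each fuel's share of a plant's total. A minimal pandas sketch of how such shares could be derived from the per-fuel records in out_ferc1__yearly_steam_plants_fuel_sched402, assuming shares are expressed as fractions between 0 and 1 and plants are identified by this table's primary key. The primary-fuel rule (largest share wins) is also an assumption; PUDL's own calculation may apply thresholds or extra cleaning. Function names are hypothetical.

```python
import pandas as pd

# Plant identifier, matching this table's primary key.
PLANT_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def fuel_fractions(fuel: pd.DataFrame) -> pd.DataFrame:
    """Per-plant share of each fuel by cost and by heat content (0 to 1)."""
    totals = (
        fuel.groupby(PLANT_KEY + ["fuel_type_code_pudl"])[
            ["fuel_consumed_total_cost", "fuel_consumed_mmbtu"]
        ]
        .sum()
        .rename(columns={"fuel_consumed_total_cost": "cost", "fuel_consumed_mmbtu": "mmbtu"})
    )
    # Each fuel's total divided by the plant's total across all fuels.
    plant_totals = totals.groupby(level=PLANT_KEY).transform("sum")
    shares = (totals / plant_totals).unstack("fuel_type_code_pudl")
    # Flatten ("cost", "coal") into "coal_fraction_cost" to mirror the columns above.
    shares.columns = [f"{fuel_type}_fraction_{measure}" for measure, fuel_type in shares.columns]
    return shares.reset_index()


def primary_fuel(shares: pd.DataFrame, measure: str = "mmbtu") -> pd.Series:
    """Fuel with the largest share by the given measure (assumed rule)."""
    suffix = f"_fraction_{measure}"
    cols = [c for c in shares.columns if c.endswith(suffix)]
    # Fuels a plant did not burn appear as NaN; treat them as a zero share.
    return shares[cols].fillna(0).idxmax(axis=1).str.replace(suffix, "", regex=False)
```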
out_ferc1__yearly_steam_plants_fuel_sched402#
Noticed any irregularities with this table? File a report.
Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
- Primary key:
record_id
Usage Warnings#
The fuel_type_code_pudl is inferred from a free-form string field.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table is a subset of the steam plant table reported on page 402 of FERC Form 1.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
fuel_consumed_mmbtu |
number |
Total heat content of fuel consumed, in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_consumed_total_cost |
number |
Total cost of consumed fuel. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_unit_burned |
number |
Average cost of fuel consumed in the report year per reported fuel unit (USD). |
fuel_cost_per_unit_delivered |
number |
Average cost of fuel delivered in the report year per reported fuel unit (USD). |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btu (MMBtu) per physical unit. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL. |
fuel_units |
string |
Reported unit of measure for fuel. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
record_id |
string |
Identifier indicating the original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
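Several columns in this table should be related arithmetically, which makes a row-level consistency check possible. A minimal sketch, assuming fuel_consumed_mmbtu ≈ fuel_consumed_units × fuel_mmbtu_per_unit and fuel_cost_per_mmbtu ≈ fuel_consumed_total_cost / fuel_consumed_mmbtu. These relationships are inferred from the column names and descriptions rather than taken from PUDL's code, and the helper names and 1% tolerance are hypothetical.

```python
import math
from typing import Optional


def _close(a: Optional[float], b: Optional[float], rel_tol: float) -> Optional[bool]:
    """Compare two values, returning None when either is missing."""
    if a is None or b is None:
        return None
    return math.isclose(a, b, rel_tol=rel_tol)


def check_fuel_row(row: dict, rel_tol: float = 0.01) -> dict:
    """Check assumed arithmetic relationships within one fuel record."""
    units = row.get("fuel_consumed_units")
    per_unit = row.get("fuel_mmbtu_per_unit")
    mmbtu = row.get("fuel_consumed_mmbtu")
    cost = row.get("fuel_consumed_total_cost")
    implied_mmbtu = units * per_unit if units is not None and per_unit is not None else None
    implied_cost_per_mmbtu = cost / mmbtu if cost is not None and mmbtu else None
    return {
        "mmbtu_consistent": _close(mmbtu, implied_mmbtu, rel_tol),
        "cost_per_mmbtu_consistent": _close(row.get("fuel_cost_per_mmbtu"), implied_cost_per_mmbtu, rel_tol),
    }
```

The function expects missing values as None; data loaded with dtype_backend="pyarrow" represents them as pd.NA, so they would need converting before rows are passed in.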
out_ferc1__yearly_steam_plants_sched402#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
- Primary key:
record_id
The best approximation for a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.
Usage Warnings#
FERC does not require respondents to report unique, non-duplicative plant records. Respondents sporadically report portions of a plant and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period divided by (nameplate capacity × length of that time period), with units kept consistent (e.g., MWh / (MW × hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_annual_addition |
number |
Annual capital addition into capex_total. |
capex_annual_addition_rolling |
number |
Year-to-date capital addition into capex_total. |
capex_annual_per_kw |
number |
Annual capital addition into capex_total per kW. |
capex_annual_per_mw |
number |
Annual capital addition into capex_total per MW. |
capex_annual_per_mw_rolling |
number |
Year-to-date capital addition into capex_total per MW. |
capex_annual_per_mwh |
number |
Annual capital addition into capex_total per MWh. |
capex_annual_per_mwh_rolling |
number |
Year-to-date capital addition into capex_total per MWh. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
capex_wo_retirement_total |
number |
Total cost of plant (USD) without retirements. |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
not_water_limited_capacity_mw |
number |
Plant capacity in MW when not limited by condenser water. |
opex_allowances |
number |
Allowances. |
opex_boiler |
number |
Maintenance of boiler (or reactor) plant. |
opex_coolants |
number |
Cost of coolants and water (nuclear plants only). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_fuel_per_mwh |
number |
Production expenses: fuel (USD) per megawatt-hour (MWh). |
opex_misc_power |
number |
Miscellaneous steam (or nuclear) expenses. |
opex_misc_steam |
number |
Maintenance of miscellaneous steam (or nuclear) plant. |
opex_nonfuel_per_mwh |
number |
Non-fuel production expenses (USD) per MWh. |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plants |
number |
Maintenance of electrical plant. |
opex_production_total |
number |
Total operating expenses. |
opex_rents |
number |
Production expenses: rents (USD). |
opex_steam |
number |
Steam expenses. |
opex_steam_other |
number |
Steam from other sources. |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
opex_transfer |
number |
Steam transferred (Credit). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_id_ferc1 |
integer |
Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
record_id |
string |
Identifier indicating the original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
water_limited_capacity_mw |
number |
Plant capacity in MW when limited by condenser water. |
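The capacity_factor description reduces, for an annual period, to net generation divided by nameplate capacity times the hours in the year. A minimal sketch, assuming 8,760 hours in a normal year and 8,784 in a leap year; PUDL's own calculation may treat the period differently. The second function sketches one plausible reading of capex_annual_addition as the year-over-year change in capex_total for the same plant_id_ferc1; that interpretation is an assumption, not taken from PUDL's code. All function names are hypothetical.

```python
import calendar
from typing import Dict, Optional


def hours_in_year(year: int) -> int:
    """Number of hours in a calendar year (8,784 in leap years)."""
    return 8784 if calendar.isleap(year) else 8760


def annual_capacity_factor(net_generation_mwh: float, capacity_mw: float, year: int) -> Optional[float]:
    """Net generation (MWh) / (nameplate capacity (MW) x hours in the year)."""
    if not capacity_mw:
        return None
    return net_generation_mwh / (capacity_mw * hours_in_year(year))


def capex_additions(capex_by_year: Dict[int, float]) -> Dict[int, float]:
    """Assumed reading of capex_annual_addition for a single plant.

    Returns capex_total minus the previous available year's capex_total.
    The first year has no prior value and is omitted; if a year is missing,
    the change is computed against the most recent earlier year.
    """
    years = sorted(capex_by_year)
    return {
        later: capex_by_year[later] - capex_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    }
```

Because plant_id_ferc1 is not stable between PUDL DB initializations, any per-plant series like this should be rebuilt from a single snapshot of the data.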
out_ferc1__yearly_transmission_lines_sched422#
Noticed any irregularities with this table? File a report.
Annual time series of statistics about transmission lines.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)
- Primary key:
This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines with a nominal voltage of 132 kilovolts or greater. Transmission lines below this voltage are required to be reported only as group totals for each voltage.
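Because this table has no primary key and may contain subtotal rows, aggregations should be built deliberately. A minimal pandas sketch that sums reported line length per utility and year for lines at or above 132 kV operating voltage, adding the two length columns together. Whether those two columns can be summed without double counting, and whether subtotal rows must be removed first, are assumptions to verify against the raw filings. The function name is hypothetical.

```python
import pandas as pd

LENGTH_COLS = ["transmission_line_length_miles", "transmission_line_and_structures_length_miles"]


def miles_by_utility_year(lines: pd.DataFrame, min_kv: float = 132.0) -> pd.DataFrame:
    """Sum line length (miles) per utility and year for lines at or above min_kv."""
    # Rows with a missing operating voltage are excluded rather than guessed.
    mask = lines["operating_voltage_kv"].ge(min_kv).fillna(False).astype(bool)
    hv = lines[mask].copy()
    # Assumption: the two length columns are additive; missing lengths count as zero.
    hv["total_miles"] = hv[LENGTH_COLS].fillna(0).sum(axis=1)
    return hv.groupby(["report_year", "utility_id_ferc1"], as_index=False)["total_miles"].sum()
```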
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_other |
number |
Other costs associated with the plant (USD). |
capex_total |
number |
Total cost of plant (USD). |
conductor_size_and_material |
string |
Size of transmission conductor and material of the transmission line. |
designed_voltage_kv |
number |
Designed voltage, in kilovolts, for three-phase, 60-cycle alternating current transmission lines. |
end_point |
string |
The end point of a transmission line. |
num_transmission_circuits |
integer |
Number of circuits in a transmission line. |
operating_voltage_kv |
number |
The operating voltage, in kilovolts, for three-phase, 60-cycle alternating current transmission lines. |
opex_maintenance |
number |
Production expenses: maintenance (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
record_id |
string |
Identifier indicating the original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
start_point |
string |
The starting point of a transmission line. |
supporting_structure_type |
string |
Supporting structure of the transmission line. |
transmission_line_and_structures_length_miles |
number |
Length (in pole miles, or circuit miles if the transmission lines are underground) for lines that are aggregated with other lines or structures (whose costs are aggregated and combined with other structures). |
transmission_line_length_miles |
number |
Length (in pole miles, or circuit miles if the transmission lines are underground) for lines that are stand-alone structures (whose costs are reported on a stand-alone basis). |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
out_ferc1__yearly_utility_plant_summary_sched200#
Noticed any irregularities with this table? File a report.
Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)
- Primary key:
utility_id_ferc1, report_year, utility_type, utility_plant_asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ending_balance |
number |
Account balance at end of year. |
record_id |
string |
Identifier indicating the original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
utility_plant_asset_type |
string |
Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
utility_type_other |
string |
Freeform description of the type of utility reported in one of the three Other Utility sections of the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020). |
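Since some rows in this table are calculated subtotals of others, it can help to view one utility type in wide form and to know which asset types are calculated before summing anything. A minimal pandas sketch, assuming the primary key (utility_id_ferc1, report_year, utility_type, utility_plant_asset_type) is unique and that calculated rows carry the row_type_xbrl value "calculated_value"; that label is an assumption, so check df["row_type_xbrl"].unique() first. Function names are hypothetical.

```python
import pandas as pd


def balances_wide(df: pd.DataFrame, utility_type: str) -> pd.DataFrame:
    """One row per utility and year, one column per utility_plant_asset_type."""
    subset = df[df["utility_type"] == utility_type]
    # pivot (not pivot_table) raises if the assumed primary key is not unique.
    return subset.pivot(
        index=["utility_id_ferc1", "report_year"],
        columns="utility_plant_asset_type",
        values="ending_balance",
    )


def calculated_asset_types(df: pd.DataFrame, calculated_label: str = "calculated_value") -> list:
    """Asset types reported as calculated values (assumed label), i.e. subtotals."""
    is_calc = df["row_type_xbrl"] == calculated_label
    return sorted(df.loc[is_calc, "utility_plant_asset_type"].dropna().unique())
```

Columns returned by calculated_asset_types are subtotals and should not be added to the components they summarize.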
out_ferc714__georeferenced_respondents#
Noticed any irregularities with this table? File a report.
Annual summary of demand and other information about FERC-714 respondents.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
- Primary key:
respondent_id_ferc714, report_date
Usage Warnings#
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table differs from
out_ferc714__summarized_demand in that it also
includes a geometry column describing the respondent’s service territory in each year.
These service territories are based on the counties that the corresponding EIA-861
respondent reported serving in that year. There is sometimes ambiguity as to
whether a FERC-714 respondent should be interpreted as an individual utility or a
balancing authority. The respondent_type column indicates which type of entity has
been assumed in determining the service territory from EIA-861 data.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
area_km2 |
number |
County area in km2. |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_id_eia |
integer |
EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
demand_annual_mwh |
number |
Annual electricity demand in a given report year. |
eia_code |
integer |
EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different. |
geometry |
geometry |
Geospatial representation of the feature. |
population |
number |
County population, sourced from Census DP1 data. |
report_date |
date |
Date reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_name_ferc714 |
string |
Name of the utility, balancing area authority, or planning authority responding to FERC Form 714. |
respondent_type |
string |
Whether a respondent to FERC Form 714 is a utility or a balancing authority. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
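Combining demand_annual_mwh with the population and area_km2 columns gives simple demand-intensity measures for comparing respondents. A minimal pandas sketch, assuming population and area_km2 describe the respondent's whole service territory in that year (the column descriptions are phrased at the county level, so verify this) and that demand_annual_mwh is in MWh, as its name suggests. The function and output column names are hypothetical.

```python
import pandas as pd


def demand_intensity(df: pd.DataFrame) -> pd.DataFrame:
    """Add annual demand per person (MWh) and per square kilometre (MWh/km2)."""
    out = df.copy()
    # Zero or negative denominators become NaN instead of producing infinities.
    pop = out["population"].where(out["population"] > 0)
    area = out["area_km2"].where(out["area_km2"] > 0)
    out["demand_mwh_per_capita"] = out["demand_annual_mwh"] / pop
    out["demand_mwh_per_km2"] = out["demand_annual_mwh"] / area
    return out
```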
out_ferc714__hourly_estimated_state_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of estimated electricity demand for each state, scaled such that it matches the total electricity sales by state reported in EIA 861.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
- Primary key:
state_id_fips, datetime_utc
Usage Warnings#
Contains estimated values.
Contains information from multiple raw inputs.
Large table; do not attempt to open with Excel.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table uses hourly electricity demand and information about service territories to estimate the total hourly electricity demand for each US state. Hourly demand is drawn from the FERC 714, which reports at the balancing authority and utility level. Service territories for utilities and balancing authorities are inferred from information in the EIA 861, including the counties served by each utility and the utilities that make up each balancing authority.
This table uses the total electricity sales by state reported in the EIA 861 as a scaling factor to ensure that the magnitude of electricity sales is roughly correct, and obtains the shape of the demand curve from the hourly planning area demand reported in the FERC 714.
This table includes the state FIPS codes - see out_censusdp1tract__states for additional information about FIPS state codes.
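The scaling step described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not PUDL's implementation: the `annual_sales` frame and its `year` and `sales_mwh` columns are hypothetical stand-ins for the EIA 861 state sales totals, and the year is taken from `datetime_utc` rather than local time.

```python
import pandas as pd


def scale_to_annual_sales(hourly: pd.DataFrame, annual_sales: pd.DataFrame) -> pd.DataFrame:
    """Rescale each state's hourly demand curve so it sums to that state's annual sales.

    hourly: columns state_id_fips, datetime_utc, demand_mwh.
    annual_sales: hypothetical columns state_id_fips, year, sales_mwh.
    """
    out = hourly.assign(year=hourly["datetime_utc"].dt.year)
    # A left merge keeps the original row order of the hourly data.
    out = out.merge(annual_sales, on=["state_id_fips", "year"], how="left")
    # Sum of the unscaled hourly curve within each state-year.
    totals = out.groupby(["state_id_fips", "year"])["demand_mwh"].transform("sum")
    # Keep the shape of the curve, but match its magnitude to the reported sales.
    out["scaled_demand_mwh"] = out["demand_mwh"] * out["sales_mwh"] / totals
    return out.drop(columns=["year", "sales_mwh"])


hourly = pd.DataFrame(
    {
        "state_id_fips": ["06"] * 4,
        "datetime_utc": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00", "2020-01-01 03:00"]
        ),
        "demand_mwh": [1.0, 2.0, 3.0, 4.0],
    }
)
annual_sales = pd.DataFrame({"state_id_fips": ["06"], "year": [2020], "sales_mwh": [100.0]})
result = scale_to_annual_sales(hourly, annual_sales)
```

In this toy case the curve keeps its 1:2:3:4 shape while its total becomes the 100 MWh of (made-up) annual sales.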
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
datetime_utc |
datetime |
Date and time converted to Coordinated Universal Time (UTC). |
demand_mwh |
number |
Electricity demand (energy) within a given timeframe. |
scaled_demand_mwh |
number |
Estimated electricity demand scaled by the total sales within a state. |
state_id_fips |
string |
Two digit state FIPS code. |
out_ferc714__hourly_planning_area_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of electricity demand by planning area.
- Most-recent data:
2024
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)
- Primary key:
respondent_id_ferc714, datetime_utc
Usage Warnings#
Contains rows where missing values were imputed.
The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See core_ferc714__hourly_planning_area_demand for additional details.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table is based on core_ferc714__hourly_planning_area_demand, but adds imputed demand values where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.
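To isolate the rows that PUDL imputed, a sketch like the following filters on the imputation code column. It assumes, without confirmation from this dictionary, that the code is null wherever the reported value was kept as-is. The code in the toy data is a placeholder; look up real values in core_pudl__codes_imputation_reasons.

```python
import pandas as pd

COLUMNS = [
    "respondent_id_ferc714",
    "datetime_utc",
    "demand_reported_mwh",
    "demand_imputed_pudl_mwh",
    "demand_imputed_pudl_mwh_imputation_code",
]


def imputed_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose demand value was flagged for imputation."""
    # Assumption: a null imputation code means the reported value was not replaced.
    mask = df["demand_imputed_pudl_mwh_imputation_code"].notna()
    return df.loc[mask, COLUMNS]


df = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 1, 1],
        "datetime_utc": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"]
        ),
        "demand_reported_mwh": [500.0, None, 510.0],
        "demand_imputed_pudl_mwh": [500.0, 505.0, 510.0],
        "demand_imputed_pudl_mwh_imputation_code": [None, "placeholder_code", None],
    }
)
flagged = imputed_rows(df)
# Count how often each (placeholder) imputation reason occurs.
counts = flagged["demand_imputed_pudl_mwh_imputation_code"].value_counts()
```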
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
datetime_utc |
datetime |
Date and time converted to Coordinated Universal Time (UTC). |
demand_imputed_pudl_mwh |
number |
Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL. |
demand_imputed_pudl_mwh_imputation_code |
string |
Code describing why a demand value was flagged for imputation. |
demand_reported_mwh |
number |
Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs. |
report_date |
date |
Date reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_id_ferc714_csv |
integer |
FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
respondent_id_ferc714_xbrl |
string |
FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
timezone |
string |
IANA timezone name. |
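Because `datetime_utc` is in UTC and each row carries an IANA `timezone`, local clock time can be recovered one time zone at a time. The sketch below assumes `datetime_utc` is loaded with pandas' default NumPy-backed datetime dtype (not `dtype_backend="pyarrow"`) and may be either timezone-naive or UTC-aware. The `datetime_local` column name is hypothetical.

```python
import pandas as pd


def add_local_time(df: pd.DataFrame) -> pd.DataFrame:
    """Add a datetime_local column converted from datetime_utc using each row's timezone."""
    out = df.copy()
    utc = pd.to_datetime(out["datetime_utc"])
    # Treat naive timestamps as UTC; normalize aware ones to UTC.
    utc = utc.dt.tz_localize("UTC") if utc.dt.tz is None else utc.dt.tz_convert("UTC")
    # A datetime Series holds a single time zone, so collect results in an object Series.
    local = pd.Series(pd.NaT, index=out.index, dtype="object")
    for tz, idx in out.groupby("timezone").groups.items():
        local.loc[idx] = utc.loc[idx].dt.tz_convert(tz)
    # Rows with a null timezone are skipped by groupby and stay NaT.
    out["datetime_local"] = local
    return out


df = pd.DataFrame(
    {
        "datetime_utc": pd.to_datetime(["2020-01-01 12:00", "2020-01-01 12:00"]),
        "timezone": ["America/New_York", "America/Los_Angeles"],
    }
)
out = add_local_time(df)
```

Noon UTC on January 1 becomes 07:00 in New York (UTC-5) and 04:00 in Los Angeles (UTC-8).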
out_ferc714__respondents_with_fips#
Noticed any irregularities with this table? File a report.
Annual respondents with the county FIPS IDs for their service territories.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
- Primary key:
This table has no primary key. Note that the state and county FIPS columns can be null. Otherwise the natural primary key would be: [‘respondent_id_ferc714’, ‘report_date’, ‘county_id_fips’]. The state FIPS code is not part of the primary key because the first two digits of the county FIPS code indicate the state.
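Since the first two digits of a county FIPS code identify the state, a state code can be derived from, or checked against, `county_id_fips`. A minimal pandas sketch, assuming both columns are loaded as strings so that leading zeros survive; the `derived_state_fips` name is illustrative only.

```python
import pandas as pd


def state_fips_from_county(county_id_fips: pd.Series) -> pd.Series:
    """Return the two-digit state FIPS prefix of each five-digit county FIPS code."""
    # Slicing strings (not integers) keeps leading zeros such as "01" for Alabama.
    # Null county codes stay null.
    return county_id_fips.str.slice(0, 2)


df = pd.DataFrame(
    {
        "county_id_fips": ["01001", "06037", None],
        "state_id_fips": ["01", "06", None],
    }
)
df["derived_state_fips"] = state_fips_from_county(df["county_id_fips"])
# Rows where both codes are present but disagree would point to a data problem.
mismatches = df[
    df["county_id_fips"].notna()
    & df["state_id_fips"].notna()
    & (df["derived_state_fips"] != df["state_id_fips"])
]
```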
Usage Warnings#
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_id_eia |
integer |
EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
county |
string |
County name. |
county_id_fips |
string |
County ID from the Federal Information Processing Standard Publication 6-4. |
eia_code |
integer |
EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different. |
report_date |
date |
Date reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_name_ferc714 |
string |
Name of the utility, balancing area authority, or planning authority responding to FERC Form 714. |
respondent_type |
string |
Whether a respondent to FERC Form 714 is a utility or a balancing authority. |
state |
string |
Two letter US state abbreviation. |
state_id_fips |
string |
Two digit state FIPS code. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
out_ferc714__summarized_demand#
Noticed any irregularities with this table? File a report.
Summarized demand statistics and FERC-714 respondent attributes by respondent-year.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report
- Primary key:
respondent_id_ferc714, report_date
Usage Warnings#
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
area_km2 |
number |
County area in km2. |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_id_eia |
integer |
EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity. |
balancing_authority_name_eia |
string |
Name of the balancing authority. |
demand_annual_mwh |
number |
Annual electricity demand in a given report year. |
demand_annual_per_capita_mwh |
number |
Per-capita annual demand, averaged using Census county-level population estimates. |
demand_density_mwh_km2 |
number |
Annual demand per km2 of a given service territory. |
eia_code |
integer |
EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different. |
population |
number |
County population, sourced from Census DP1 data. |
population_density_km2 |
number |
Average population per km2 of a service territory. |
report_date |
date |
Date reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_name_ferc714 |
string |
Name of the utility, balancing area authority, or planning authority responding to FERC Form 714. |
respondent_type |
string |
Whether a respondent to FERC Form 714 is a utility or a balancing authority. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
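The per-capita and density columns appear to be ratios of `demand_annual_mwh`, `population`, and `area_km2`. The sketch below recomputes them under that assumption, which this dictionary does not spell out. If the results differ from the published values, question the assumption before treating the difference as a data error.

```python
import pandas as pd


def recompute_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute per-capita and per-area metrics, assuming they are simple ratios."""
    # Division by a zero population or area yields inf/NaN rather than raising.
    return pd.DataFrame(
        {
            "demand_annual_per_capita_mwh": df["demand_annual_mwh"] / df["population"],
            "demand_density_mwh_km2": df["demand_annual_mwh"] / df["area_km2"],
            "population_density_km2": df["population"] / df["area_km2"],
        },
        index=df.index,
    )


example = pd.DataFrame({"demand_annual_mwh": [1000.0], "population": [500.0], "area_km2": [10.0]})
ratios = recompute_ratios(example)
```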
out_gridpathratoolkit__hourly_available_capacity_factor#
Noticed any irregularities with this table? File a report.
Hourly time series of capacity factors defining the capacity available from an aggregated group of generators.
- Most-recent data:
2021
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
GridPath Resource Adequacy Toolkit Data
- Primary key:
datetime_utc, aggregation_group
Usage Warnings#
Contains estimated values.
The original data is no longer being collected or reported in this way.
Additional Details#
This table contains a mix of profiles representing whole regions and individual plants, where the individual plants are hybrid wind or solar plus electricity storage facilities. Capacity factors are stated as a fraction of the aggregate nameplate capacity of the group.
For the hybrid facilities, the capacity factor represents the available output of only the renewable generators. Estimates of the generation profiles have been extended across a longer range of dates than the original data. For wind, the years 2015-2020 are synthesized data and for solar 2020 is synthesized. See the Appendix of the GridPath Resource Adequacy Toolkit report for details.
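Because each capacity factor is a fraction of the group's aggregate nameplate capacity, multiplying by that capacity gives available megawatts. This table does not include nameplate capacities, so the mapping below uses hypothetical group labels and values purely for illustration.

```python
import pandas as pd

# Hypothetical aggregate nameplate capacities (MW) per aggregation_group.
NAMEPLATE_MW = {"example_wind_group": 200.0, "example_solar_group": 50.0}


def add_available_mw(df: pd.DataFrame, nameplate_mw: dict) -> pd.DataFrame:
    """Convert capacity_factor (a fraction of aggregate nameplate) into available MW."""
    out = df.copy()
    # Groups missing from the mapping get NaN rather than a misleading number.
    out["available_mw"] = out["capacity_factor"] * out["aggregation_group"].map(nameplate_mw)
    return out


hourly = pd.DataFrame(
    {
        "aggregation_group": ["example_wind_group", "example_solar_group"],
        "capacity_factor": [0.5, 0.25],
    }
)
with_mw = add_available_mw(hourly, NAMEPLATE_MW)
```

For hybrid facilities, the nameplate used here should cover only the renewable generators, since the capacity factor excludes the storage component.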
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
aggregation_group |
string |
A label identifying a group of aggregated generator capacity factors. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part, calculated as energy generated over a time period / (nameplate capacity * length of that time period), with the period in consistent units (hours, years, etc.). |
datetime_utc |
datetime |
Date and time converted to Coordinated Universal Time (UTC). |
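The capacity factor definition divides energy by the product of capacity and elapsed time. A small sketch with hypothetical inputs, which also accounts for leap years when the period is a calendar year:

```python
import calendar


def hours_in_year(year: int) -> int:
    """Number of hours in a calendar year (8784 in leap years, otherwise 8760)."""
    return (366 if calendar.isleap(year) else 365) * 24


def capacity_factor(energy_mwh: float, capacity_mw: float, hours: float) -> float:
    """Energy generated / (nameplate capacity * hours in the period)."""
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return energy_mwh / (capacity_mw * hours)


# A hypothetical 100 MW unit generating 438,000 MWh over a non-leap year.
cf = capacity_factor(438_000, 100, hours_in_year(2021))
```

Writing the denominator as `capacity_mw * hours` inside parentheses avoids the precedence slip of dividing by capacity and then multiplying by time.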
out_pudl__yearly_assn_eia_ferc1_plant_parts#
Noticed any irregularities with this table? File a report.
Annual association table linking power plant data reported in FERC Form 1 to any available EIA data about the plant parts covered by each FERC Form 1 record.
- Most-recent data:
2024
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
record_id_ferc1
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table answers the question “What EIA data reported about plants or generators should be associated with a given plant record found in the FERC Form 1?”
Each record in this table corresponds to a single FERC Form 1 record reported in one of several tables describing power plants (large steam, hydro, small, etc.). These FERC records can correspond to an entire plant, individual generators within a plant, all generators in a plant with the same prime mover type, or just the respondent’s ownership share of any of those categories (or other categories). Furthermore, the same utility may report the same plant in different ways in different years.
The EIA data associated with each FERC plant record comes from our Plant Parts EIA table, out_eia__yearly_plant_parts. The EIA data in each record represents an aggregation of several slices of an EIA plant, across both physical characteristics and utility ownership.
This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
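The 70% rule for choosing a canonical value can be sketched as below. This illustrates the rule as described here and is not PUDL's harvesting code. It assumes missing reports (`None`) are excluded before computing the share, which the description does not specify.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def canonical_value(
    values: Iterable[Optional[Hashable]], threshold: float = 0.7
) -> Optional[Hashable]:
    """Return the most common reported value if it makes up at least `threshold`
    of the non-missing reports; otherwise return None (a null in the output)."""
    # Assumption: None marks a missing report and is not counted.
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None
```

For example, three reports of "CA" and one of "NV" (75%) yield "CA", while an even split between two values yields `None`.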
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
appro_part_label |
string |
Plant part of the associated true granularity record. |
appro_record_id_eia |
string |
EIA record ID of the associated true granularity record. |
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_factor_eia |
number |
Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / nameplate capacity * time period (hours/years/etc.). |
capacity_factor_ferc1 |
number |
Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / nameplate capacity * time period (hours/years/etc.). |
capacity_mw_eia |
number |
Total installed (nameplate) capacity, in megawatts. |
capacity_mw_ferc1 |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_annual_addition |
number |
Annual capital addition into capex_total. |
capex_annual_addition_rolling |
number |
Year-to-date capital addition into capex_total. |
capex_annual_per_kw |
number |
Annual capital addition into capex_total per kW. |
capex_annual_per_mw |
number |
Annual capital addition into capex_total per MW. |
capex_annual_per_mw_rolling |
number |
Year-to-date capital addition into capex_total per MW. |
capex_annual_per_mwh |
number |
Annual capital addition into capex_total per MWh. |
capex_annual_per_mwh_rolling |
number |
Year-to-date capital addition into capex_total per MWh. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_equipment_electric |
number |
Cost of plant: accessory electric equipment (USD). |
capex_equipment_misc |
number |
Cost of plant: miscellaneous power plant equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
capex_wheels_turbines_generators |
number |
Cost of plant: water wheels, turbines, and generators (USD). |
capex_wo_retirement_total |
number |
Total cost of plant (USD) without retirements. |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year_eia |
integer |
Year the plant’s oldest still operational unit was built. |
construction_year_ferc1 |
integer |
Year the plant’s oldest still operational unit was built. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
energy_used_for_pumping_mwh |
number |
Energy used for pumping, in megawatt-hours. |
ferc_acct_name |
string |
Name of FERC account, derived from technology description and prime mover code. |
ferc_license_id |
string |
The FERC license ID of a project. |
fraction_owned |
number |
Proportion of generator ownership attributable to this utility. |
fuel_cost_per_mmbtu_eia |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mmbtu_ferc1 |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_mwh_eia |
number |
Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD. |
fuel_cost_per_mwh_ferc1 |
number |
Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD. |
fuel_type |
string |
Type of fuel. |
fuel_type_code_pudl_eia |
string |
Simplified fuel type code used in PUDL. |
fuel_type_code_pudl_ferc1 |
string |
Simplified fuel type code used in PUDL. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_operating_year |
integer |
Year a generator went into service. |
installation_year_eia |
integer |
Year the plant’s most recently built unit was installed. |
installation_year_ferc1 |
integer |
Year the plant’s most recently built unit was installed. |
license_id_ferc1 |
integer |
FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible. |
match_type |
string |
Indicates the source and validation of the match between EIA and FERC, e.g. whether the match was generated by the model, verified by the training data, or overridden by the training data. |
net_capacity_adverse_conditions_mw |
number |
Net plant capability under the least favorable operating conditions, in megawatts. |
net_capacity_favorable_conditions_mw |
number |
Net plant capability under the most favorable operating conditions, in megawatts. |
net_generation_mwh_eia |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_generation_mwh_ferc1 |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_load_mwh |
number |
Net output for load (net generation - energy used for pumping) in megawatt-hours. |
not_water_limited_capacity_mw |
number |
Plant capacity in MW when not limited by condenser water. |
operational_status |
string |
The operating status of the asset. For generators, this is based on the EIA-860 tab in which the generator was listed. |
operational_status_pudl |
string |
The operating status of the asset using PUDL categories. |
opex_allowances |
number |
Allowances. |
opex_boiler |
number |
Maintenance of boiler (or reactor) plant. |
opex_coolants |
number |
Cost of coolants and water (nuclear plants only). |
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_hydraulic |
number |
Production expenses: hydraulic expenses (USD). |
opex_maintenance |
number |
Production expenses: Maintenance (USD). |
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_misc_power |
number |
Miscellaneous steam (or nuclear) expenses. |
opex_misc_steam |
number |
Maintenance of miscellaneous steam (or nuclear) plant. |
opex_nonfuel_per_mwh |
number |
Non-fuel production expenses per MWh. |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_production_before_pumping |
number |
Total production expenses before pumping (USD). |
opex_production_total |
number |
Total operating expenses. |
opex_pumped_storage |
number |
Production expenses: pumped storage (USD). |
opex_pumping |
number |
Production expenses: pumping (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_steam |
number |
Steam expenses. |
opex_steam_other |
number |
Steam from other sources. |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_total_nonfuel |
number |
Total production expenses, excluding fuel (USD). |
opex_transfer |
number |
Steam transferred (Credit). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
ownership_dupe |
boolean |
Whether a plant part record has a duplicate record with different ownership status. |
ownership_record_type |
string |
Whether each generator record is for one owner or represents a total of all ownerships. |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_ferc1 |
integer |
Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_name_ppe |
string |
Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part. |
plant_part |
string |
The part of the plant a record corresponds to. |
plant_part_id_eia |
string |
Contains the EIA plant ID, plant part, ownership, and EIA utility ID. |
plant_type |
string |
Type of plant. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
record_count |
integer |
Number of distinct generator IDs that participated in the aggregation for a plant part list record. |
record_id_eia |
string |
Identifier for EIA plant parts analysis records. |
record_id_ferc1 |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_date |
date |
Date reported. |
report_year |
integer |
Four-digit year in which the data was reported. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
total_fuel_cost_eia |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_fuel_cost_ferc1 |
number |
Total annual reported fuel costs for the plant part. Includes costs from all fuels. |
total_mmbtu_eia |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
total_mmbtu_ferc1 |
number |
Total annual heat content of fuel consumed by a plant part record in the plant parts list. |
true_gran |
boolean |
Indicates whether a plant part list record is associated with the highest priority plant part for all identical records. |
unit_heat_rate_mmbtu_per_mwh_eia |
number |
Fuel content per unit of electricity generated. Coming from MCOE calculation. |
unit_heat_rate_mmbtu_per_mwh_ferc1 |
number |
Fuel content per unit of electricity generated. Calculated from FERC reported fuel consumption and net generation. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
water_limited_capacity_mw |
number |
Plant capacity in MW when limited by condenser water. |
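The documented `record_id_ferc1` format can be split back into its parts. Since table names may themselves contain underscores, the sketch splits from the right. It assumes every trailing field is an integer, which fits the documented format but has not been checked against every record; the example ID is made up.

```python
from typing import NamedTuple


class RecordIdFerc1(NamedTuple):
    table_name: str
    report_year: int
    report_prd: int
    respondent_id: int
    spplmnt_num: int
    row_number: int


def parse_record_id_ferc1(record_id: str) -> RecordIdFerc1:
    """Split '{table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}'."""
    # Split at most 5 times from the right so underscores in table_name are preserved.
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id_ferc1 format: {record_id!r}")
    table_name, *numbers = parts
    # int() raises ValueError if a trailing field is not numeric.
    return RecordIdFerc1(table_name, *(int(n) for n in numbers))


parsed = parse_record_id_ferc1("f1_steam_2018_12_1_0_3")  # hypothetical ID
```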
out_rus12__monthly_demand_and_energy_at_delivery_points#
Noticed any irregularities with this table? File a report.
Monthly time series of demand and energy delivered to distribution utilities.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)
- Primary key:
report_date, borrower_id_rus, delivery_recipient
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
delivered_demand_mw |
number |
The amount of demand delivered in MW. |
delivered_energy_mwh |
number |
The amount of energy delivered in MWh. |
delivery_recipient |
string |
The recipient of the delivered energy or demand. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
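The borrower ID structure described above supports a quick validity check. This sketch assumes the state prefix is two uppercase ASCII letters and the suffix is exactly four ASCII digits. The sample IDs are invented, and the prefix is not guaranteed to match the `state` column.

```python
import re

# Two uppercase letters (state abbreviation) followed by exactly four digits.
BORROWER_ID_RUS = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id: str) -> bool:
    """True if borrower_id matches the documented two-letter + four-digit pattern."""
    return BORROWER_ID_RUS.fullmatch(borrower_id) is not None


def borrower_state_prefix(borrower_id: str) -> str:
    """Return the two-character state prefix of a valid borrower ID."""
    if not is_valid_borrower_id(borrower_id):
        raise ValueError(f"unexpected borrower_id_rus: {borrower_id!r}")
    return borrower_id[:2]
```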
out_rus12__monthly_demand_and_energy_at_power_sources#
Noticed any irregularities with this table? File a report.
Monthly time series of demand and energy at point of generation.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Some peak demand dates include an hour component and others do not, because peak_demand_hour is reported inconsistently in the raw data. When no hour is reported, the hour is recorded as 0, so only the date appears in the data.
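A missing peak hour is stored as hour 0, so records that may lack a reported hour can be found by looking for peak_demand_date values that fall exactly at midnight. The minimal pandas sketch below runs on toy data. The function name and the output column peak_hour_possibly_missing are hypothetical. This check cannot tell a genuine midnight peak apart from a missing hour.

```python
import pandas as pd


def flag_possible_missing_peak_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a boolean column marking peak_demand_date values at exactly 00:00.

    A midnight timestamp *may* mean no hour was reported. A true midnight peak looks
    identical, so the flag marks candidates rather than confirmed gaps.
    """
    out = df.copy()
    ts = pd.to_datetime(out["peak_demand_date"])
    # True when the time-of-day component is 00:00:00.
    out["peak_hour_possibly_missing"] = ts.eq(ts.dt.normalize())
    return out


# Hypothetical toy timestamps; not real PUDL data.
example = pd.DataFrame({
    "peak_demand_date": [
        pd.Timestamp("2021-07-15 17:00"),
        pd.Timestamp("2021-08-02"),
        pd.Timestamp("2021-01-10 08:00"),
    ],
})
flagged = flag_possible_missing_peak_hour(example)
print(flagged["peak_hour_possibly_missing"].tolist())  # [False, True, False]
```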
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
energy_output_mwh |
number |
The amount of energy output in MWh. |
is_peak_coincident |
boolean |
Whether the peak_demand_mw value is a coincident (True) or non-coincident (False) peak. |
peak_demand_date |
datetime |
The date of the peak demand. |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_balance_sheet_assets#
Annual time series of assets and other debits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
- Primary key:
report_date, borrower_id_rus, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
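One way to avoid double counting when aggregating is to drop rows flagged by is_total before summing. The minimal pandas sketch below assumes that is_total marks every subtotal row. If other asset_type rows also roll up their siblings, more filtering would be needed. The function name, borrower ID, and asset_type labels are hypothetical.

```python
import pandas as pd


def sum_detail_rows(df: pd.DataFrame, value_col: str = "ending_balance") -> pd.DataFrame:
    """Sum value_col per report date and borrower, skipping rows flagged as totals."""
    # A missing is_total is treated as False, so unflagged rows are kept.
    is_total = df["is_total"].astype("boolean").fillna(False)
    detail = df.loc[~is_total]
    return detail.groupby(["report_date", "borrower_id_rus"], as_index=False)[value_col].sum()


# Hypothetical toy rows; the last row is a total of the first two.
example = pd.DataFrame({
    "report_date": ["2021-12-31"] * 3,
    "borrower_id_rus": ["ZZ0001"] * 3,
    "asset_type": ["example_asset_a", "example_asset_b", "example_total"],
    "ending_balance": [100.0, 50.0, 150.0],
    "is_total": [False, False, True],
})
result = sum_detail_rows(example)
print(result)
```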
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported on the balance sheet. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_balance_sheet_liabilities#
Annual time series of liabilities and other credits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
- Primary key:
report_date, borrower_id_rus, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
liability_type |
string |
Type of liability being reported on the balance sheet. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_depreciation_changes#
Annual time series of changes in accumulated depreciation.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)
- Primary key:
report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
accruals |
number |
Value of additions into an asset class (a.k.a. accruals). |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
composite_depreciation_rate |
number |
The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100. |
depreciation_and_amortization_group |
string |
High-level category of depreciation and amortization items. |
depreciation_and_amortization_item |
string |
Category of depreciation and amortization items. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements_less_net_salvage |
number |
Cost of retirements minus any net salvage value. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_depreciation_misc#
Annual time series of accumulated depreciation.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)
- Primary key:
report_date, borrower_id_rus, depreciation_and_amortization_item
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
depreciation_and_amortization_item |
string |
Category of depreciation and amortization items. |
ending_balance |
number |
Account balance at end of year. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_external_financial_risk_ratio#
Annual time series of the ratio of investments and loan guarantee balances to total utility plant assets.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
external_financial_risk_ratio |
number |
(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how financially exposed a utility is to outside entities relative to its own assets. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
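The external_financial_risk_ratio description reduces to the arithmetic below. This minimal Python sketch uses hypothetical dollar inputs. It only shows the order of operations: the whole sum is divided, not just the loan guarantee balances. This section does not say whether PUDL stores the result as a fraction or a percentage.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant_assets: float,
) -> float:
    """(total investments + loan guarantee balances) / total utility plant assets."""
    if total_utility_plant_assets == 0:
        raise ValueError("total_utility_plant_assets must be non-zero")
    return (total_investments + loan_guarantee_balances) / total_utility_plant_assets


# Hypothetical inputs: $2.0M of investments plus $0.5M of loan guarantees
# against $50M of total utility plant assets.
ratio = external_financial_risk_ratio(2_000_000, 500_000, 50_000_000)
print(ratio)  # 0.05
```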
out_rus12__yearly_investments#
Annual time series of investments, loan guarantees and loans.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection I)
- Primary key:
This table has no primary key. It lists every investment or loan in each year, and borrowers can have multiple records with the same investment_description.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part A Section B in table core_rus12__yearly_balance_sheet_assets.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
excluded_investments |
number |
Excluded investment. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
included_investments |
number |
Included investment. |
income_or_loss |
number |
Income or loss from investment. |
investment_description |
string |
Description of investment. This is a free-form text field and thus contains a wide variety of values. |
investment_type_code |
integer |
Investment type code. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_lines_stations_labor_materials_cost#
Annual time series of labor and material cost for lines and stations operated by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)
- Primary key:
report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
cost |
number |
Cost value. |
labor_or_material |
string |
Indicates whether the cost reported is for labor or material. |
lines_or_stations |
string |
Indicates whether the cost reported pertains to lines or stations. |
operation_or_maintenance |
string |
Indicates whether the expenditure is for operation or maintenance. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_loans#
Annual time series of loans provided by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)
- Primary key:
This table has no primary key because some borrowers report multiple loan values from the same entity in a given year.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
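This table mixes direct loans with loan guarantees, so totals are easier to interpret when the two are kept apart. The minimal pandas sketch below sums loan_balance per report date and borrower, separately for guarantees and direct loans. Rows with a missing is_loan_guarantee form their own group instead of being assigned to either kind. The table has no primary key, so repeated rows for the same recipient are summed rather than deduplicated. The function name, borrower ID, and recipient names are hypothetical.

```python
import pandas as pd


def balances_by_loan_kind(df: pd.DataFrame) -> pd.DataFrame:
    """Sum loan_balance per report date and borrower, separately for guarantees and direct loans."""
    return df.groupby(
        ["report_date", "borrower_id_rus", "is_loan_guarantee"],
        as_index=False,
        dropna=False,  # keep rows with a missing is_loan_guarantee as their own group
    )["loan_balance"].sum()


# Hypothetical toy rows; two direct loans to the same recipient plus one guarantee.
example = pd.DataFrame({
    "report_date": ["2021-12-31"] * 3,
    "borrower_id_rus": ["ZZ0001"] * 3,
    "loan_recipient": ["Example Recipient", "Example Recipient", "Other Recipient"],
    "is_loan_guarantee": [False, False, True],
    "loan_balance": [100.0, 200.0, 50.0],
})
summary = balances_by_loan_kind(example)
print(summary)
```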
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
is_loan_guarantee |
boolean |
Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults. |
loan_balance |
number |
The amount of money still owed on a loan at the end of the reporting year. |
loan_maturity_date |
date |
The date on which a loan is scheduled to be fully paid. |
loan_original_amount |
number |
The original amount of a loan. |
loan_recipient |
string |
The organization that received a loan. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_long_term_debt#
Annual time series of long-term debt and debt service requirements for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)
- Primary key:
This table has no primary key because some borrowers report multiple debt values from the same entity in a given year.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
debt_description |
string |
Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions. |
debt_ending_balance |
number |
The amount of principal still owed on the debt at the end of the report year. |
debt_interest |
number |
The interest expense on the debt for the report year. |
debt_principal |
number |
The principal paid on the debt during the report year. |
debt_total |
number |
The total amount of debt. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_meeting_and_board#
Annual time series of annual meeting and board information for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
board_members_num |
integer |
The total number of board members. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
does_manager_have_written_contract |
boolean |
Whether or not the RUS borrower’s manager has a written contract. |
fees_and_expenses_for_board_members |
integer |
The total amount of fees and expenses for board members. |
last_annual_meeting_date |
datetime |
The date of the last annual meeting. |
members_num |
integer |
The total number of members. |
members_present_at_meeting_num |
integer |
The number of members present at the last annual meeting. |
members_voting_by_proxy_or_mail_num |
integer |
The number of members voting by mail or by proxy. |
report_date |
date |
Date reported. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
was_quorum_present |
boolean |
Whether or not quorum was met. |
out_rus12__yearly_non_utility_plant_changes#
Annual time series of changes in non-utility plant.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)
- Primary key:
report_date, borrower_id_rus, non_utility_plant_item
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
non_utility_plant_item |
string |
Category describing non-utility plant asset items. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
out_rus12__yearly_plant_costs#
Annual time series of costs of net energy generated by plant.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)
- Primary key:
This table has no primary key because one plant (Walter Scott) has duplicate records every year. Based on other RUS-12 tables, these duplicates are likely the borrowers’ ownership portion of the plant and the total plant, but this table gives no clear indication of which is which.
The primary key of this table would otherwise be: report_date, borrower_id_rus, plant_name_rus, cost_group, cost_type.
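To inspect the duplicate records described above, select every row whose values on the would-be primary key occur more than once. This minimal pandas sketch runs on toy rows. The table does not indicate which duplicate is the ownership share and which is the whole plant, so the sketch only surfaces the duplicates and drops nothing. The function name, plant names, and cost labels are hypothetical.

```python
import pandas as pd

# The would-be primary key named in the table description.
WOULD_BE_KEY = ("report_date", "borrower_id_rus", "plant_name_rus", "cost_group", "cost_type")


def rows_with_duplicate_key(df: pd.DataFrame, key: tuple[str, ...] = WOULD_BE_KEY) -> pd.DataFrame:
    """Return every row whose key values occur more than once (all copies, not just the extras)."""
    cols = list(key)
    return df[df.duplicated(subset=cols, keep=False)].sort_values(cols)


# Hypothetical toy rows; "Example Plant A" is reported twice under the same key.
example = pd.DataFrame({
    "report_date": ["2021-12-31"] * 3,
    "borrower_id_rus": ["ZZ0001"] * 3,
    "plant_name_rus": ["Example Plant A", "Example Plant A", "Example Plant B"],
    "cost_group": ["example_group"] * 3,
    "cost_type": ["example_type"] * 3,
    "cost": [10.0, 4.0, 7.0],
})
dupes = rows_with_duplicate_key(example)
print(dupes)
```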
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns apply only to some cost_type values, and in some cases only to some plant_type values, so they are expected to contain many nulls.
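To see how sparse the cost columns actually are, compute the share of non-null values per cost_type. This is a minimal pandas sketch. The function name and the cost_type labels and values in the toy frame are hypothetical.

```python
import pandas as pd

COST_COLUMNS = ["cost", "cost_per_mwh", "cost_per_mmbtu"]


def non_null_share_by_cost_type(df: pd.DataFrame) -> pd.DataFrame:
    """Fraction of non-null values in each cost column, per cost_type (1.0 = fully populated)."""
    return df[COST_COLUMNS].notna().groupby(df["cost_type"]).mean()


# Hypothetical toy rows; unit costs are only populated for one of the cost types.
example = pd.DataFrame({
    "cost_type": ["example_fuel", "example_fuel", "example_other"],
    "cost": [1.0, 2.0, 3.0],
    "cost_per_mwh": [10.0, None, None],
    "cost_per_mmbtu": [2.5, None, None],
})
share = non_null_share_by_cost_type(example)
print(share)
```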
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
cost |
number |
Cost value. |
cost_group |
string |
High-level category of cost type. |
cost_per_mmbtu |
number |
Unit cost of energy production in cost per MMBTU. |
cost_per_mwh |
number |
Unit cost of energy production in cost per MWh. |
cost_type |
string |
Detailed category of cost type. |
is_total |
boolean |
Whether or not this record represents a total. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
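A quick format check for borrower_id_rus values, based on the description in the table above (a two-character state abbreviation followed by four digits). It assumes the letters are uppercase ASCII and does not verify that they form a real state abbreviation; the candidate IDs are made up.

```python
import re

# Assumption: the state abbreviation is two uppercase ASCII letters.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id):
    """True if borrower_id is two uppercase letters followed by exactly four digits."""
    return isinstance(borrower_id, str) and BORROWER_ID_PATTERN.fullmatch(borrower_id) is not None


for candidate in ["AB1234", "ab1234", "AB123", "AB12345", "A12345"]:
    print(candidate, is_valid_borrower_id(candidate))
```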
out_rus12__yearly_plant_factors_and_maximum_demand#
Noticed any irregularities with this table? File a report.
Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)
- Primary key:
This table has no primary key. The primary key would be report_date, borrower_id_rus, plant_name_rus, and plant_type, but the plant_type column was not reported before 2009, which causes many duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill plant_type where different plant_type values are reported for the same plant in later years.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
The plant_type field was only included in the data starting in 2009. We backfill plant_type for records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with a null plant_type because there is no reliable way to determine which plant_type applies to which record in earlier years. Plants whose names are reported differently before and after 2009 are also not backfilled.
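The backfill described above can be sketched as follows. This is one simplified reading of the rule, not PUDL's implementation: it groups records by borrower_id_rus and plant_name_rus across all years, fills a null plant_type only when that group reports exactly one distinct plant_type, and otherwise leaves it null. Because grouping uses the exact plant name, a plant renamed after 2009 is not backfilled, matching the caveat above. All rows are hypothetical.

```python
from collections import defaultdict


def backfill_plant_type(records):
    """Return copies of records with null plant_type filled where it is unambiguous."""
    # Collect the distinct non-null plant types reported for each borrower/plant name.
    types_seen = defaultdict(set)
    for rec in records:
        if rec["plant_type"] is not None:
            types_seen[(rec["borrower_id_rus"], rec["plant_name_rus"])].add(rec["plant_type"])

    filled = []
    for rec in records:
        new = dict(rec)  # do not mutate the caller's records
        candidates = types_seen.get((rec["borrower_id_rus"], rec["plant_name_rus"]), set())
        if new["plant_type"] is None and len(candidates) == 1:
            (new["plant_type"],) = candidates  # exactly one type: safe to fill
        filled.append(new)
    return filled


# Hypothetical rows: Plant A is unambiguous, Plant B is not, Plant C was renamed.
rows = [
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant A", "report_date": "2008-12-31", "plant_type": None},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant A", "report_date": "2010-12-31", "plant_type": "steam"},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant B", "report_date": "2008-12-31", "plant_type": None},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant B", "report_date": "2010-12-31", "plant_type": "steam"},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant B", "report_date": "2010-12-31", "plant_type": "internal_combustion"},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant C", "report_date": "2008-12-31", "plant_type": None},
    {"borrower_id_rus": "AA0001", "plant_name_rus": "Plant C Station", "report_date": "2010-12-31", "plant_type": "steam"},
]

for rec in backfill_plant_type(rows):
    print(rec["plant_name_rus"], rec["report_date"], rec["plant_type"])
```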
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Energy generated over the time period / (nameplate capacity * length of the time period), in consistent units (hours/years/etc.). |
capacity_factor_running |
number |
Fraction of potential generation over the time period a plant was in operation. Energy generated over the time period / (nameplate capacity * length of the time period), in consistent units (hours/years/etc.). |
load_factor |
number |
Fraction of consumption vs. demand reported for a plant over a given timeframe. Energy consumed over the time period / (peak demand * length of the time period), in consistent units (hours/years/etc.). |
peak_gross_demand_mw |
number |
The highest average power output recorded over any single 15-minute interval during the reporting period. |
peak_gross_demand_nameplate_mw |
number |
The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
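The capacity and load factor definitions above are simple ratios. This sketch assumes energy in MWh, capacity or peak demand in MW, and a time period measured in hours (8,760 for a full non-leap year). The plant figures are hypothetical, and the running variant simply substitutes a hypothetical number of operating hours for the full year.

```python
HOURS_PER_YEAR = 8760  # assumption: a full, non-leap reporting year


def capacity_factor(energy_mwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Energy generated / (nameplate capacity * hours in the period)."""
    return energy_mwh / (capacity_mw * hours)


def load_factor(energy_mwh, peak_demand_mw, hours=HOURS_PER_YEAR):
    """Energy consumed / (peak demand * hours in the period)."""
    return energy_mwh / (peak_demand_mw * hours)


# Hypothetical plant: 100 MW nameplate, 438,000 MWh generated in a year.
print(capacity_factor(438_000, 100))               # over the full year
print(capacity_factor(438_000, 100, hours=5_000))  # over 5,000 hypothetical operating hours
print(load_factor(262_800, 60))                    # hypothetical 60 MW peak demand
```

Note that the denominator is the product of capacity and hours; dividing by capacity and then multiplying by hours would give a very different number.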
out_rus12__yearly_plant_labor#
Noticed any irregularities with this table? File a report.
Annual time series of labor and payroll information for plants owned by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)
- Primary key:
This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Because plant_type is missing before 2009, this table has no reliable primary key.
Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to this table, but the join must use the borrower_id_rus column in addition to report_date and plant_name_rus. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:
you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
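The join described above can be sketched with plain Python records. The data are hypothetical: two borrowers each report a plant named "Shared Plant" in the same year, with slightly different plant totals, as the RUS note warns. Joining on report_date, borrower_id_rus, and plant_name_rus pairs each borrower's rows correctly, while dropping borrower_id_rus produces spurious cross-matches. In pandas the equivalent would be `operations.merge(labor, on=["report_date", "borrower_id_rus", "plant_name_rus"], how="inner")`.

```python
from collections import defaultdict

# Join keys described above.
JOIN_KEYS = ("report_date", "borrower_id_rus", "plant_name_rus")


def inner_join(left, right, keys=JOIN_KEYS):
    """Inner-join two lists of dict records on the given key columns."""
    index = defaultdict(list)
    for rec in right:
        index[tuple(rec[k] for k in keys)].append(rec)
    joined = []
    for left_rec in left:
        for right_rec in index.get(tuple(left_rec[k] for k in keys), []):
            joined.append({**left_rec, **right_rec})
    return joined


# Hypothetical rows: two borrowers each report a plant named "Shared Plant".
operations_by_plant = [
    {"report_date": "2020-12-31", "borrower_id_rus": "AA0001", "plant_name_rus": "Shared Plant", "gross_generation_mwh": 500.0},
    {"report_date": "2020-12-31", "borrower_id_rus": "BB0002", "plant_name_rus": "Shared Plant", "gross_generation_mwh": 510.0},
]
labor = [
    {"report_date": "2020-12-31", "borrower_id_rus": "AA0001", "plant_name_rus": "Shared Plant", "employees_full_time_num": 40},
    {"report_date": "2020-12-31", "borrower_id_rus": "BB0002", "plant_name_rus": "Shared Plant", "employees_full_time_num": 41},
]

with_borrower = inner_join(operations_by_plant, labor)
without_borrower = inner_join(operations_by_plant, labor, keys=("report_date", "plant_name_rus"))
print(len(with_borrower), len(without_borrower))
```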
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
employee_hours_worked_total |
number |
Total number of hours worked by employees. |
employees_full_time_num |
integer |
Number of employees hired full-time for normal operations of the system. |
employees_part_time_num |
integer |
Number of employees regularly employed on a part-time basis. Excludes employees hired for short periods of time to complete special jobs. |
payroll_maintenance |
number |
The amount of payroll spent on plant maintenance. |
payroll_operations |
number |
The amount of payroll spent on plant operations. |
payroll_other_accounts |
number |
The amount of plant payroll spent on accounts other than maintenance and operations. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus12__yearly_plant_operations_by_borrower#
Noticed any irregularities with this table? File a report.
Annual time series of borrower portion of plant operational data including fuel consumption and operational hours.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
- Primary key:
This table has no primary key because a handful of plants have duplicate records. The primary key of this table would otherwise be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form reports plant operations data for each plant type in two ways: records for the portion of each plant that a borrower owns, and records for the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants wholly owned by one borrower show up in both tables.
Two boolean columns indicate whether a record covers the borrowers’ share or the whole plant; they are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records reporting TRUE for is_full_ownership_portion belong in the by-plant table, while all records belong in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both PUDL tables.
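The pre-2009 routing rule above can be sketched as follows, assuming records are dicts with an is_full_ownership_portion key. Treating a null flag as not TRUE is an assumption, and the records are hypothetical.

```python
def split_pre_2009(records):
    """Route pre-2009 records to the by-borrower and by-plant tables."""
    # Every record goes to the by-borrower table.
    by_borrower = list(records)
    # Only records explicitly flagged TRUE go to the by-plant table; a null flag is not TRUE.
    by_plant = [rec for rec in records if rec.get("is_full_ownership_portion") is True]
    return by_borrower, by_plant


records = [
    {"plant_name_rus": "Plant A", "is_full_ownership_portion": True},
    {"plant_name_rus": "Plant A", "is_full_ownership_portion": False},
    {"plant_name_rus": "Plant B", "is_full_ownership_portion": None},
]

by_borrower, by_plant = split_pre_2009(records)
print(len(by_borrower), len(by_plant))
```

The first record lands in both outputs, which is why the same original record can appear in both PUDL tables.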
Two cleaning steps also alter the original data slightly:
- Dropped one record with an unexpected ownership label and duplicate data: in one year, a Wisdom steam plant record is labeled as both fully owned by the borrower and partly owned, which is an unexpected combination according to the _OR_PowerSupply Plant File Documentation.rtf file in the RUS 12 archive. Its values exactly match those of the other Wisdom steam plant record for that year, which has the expected ownership labels.
- Replaced two string values for unit_id_rus: two unit_id_rus values from 2018 contain strings, WSL GT 12 and WSL ST 10. Based on pre-cleaned data, we can clearly use just the numeric part of these strings (12 and 10, respectively), which lets the unit_id_rus column have an integer type.
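PUDL describes fixing only the two specific values above. A more general, hypothetical helper that keeps the trailing digits of such strings might look like this; it is a sketch, not PUDL's cleaning code, and it raises an error rather than guessing when no trailing number is present.

```python
import re


def clean_unit_id(raw):
    """Return an integer unit ID, keeping the trailing digits of strings like 'WSL GT 12'."""
    if raw is None:
        return None  # leave missing IDs missing
    if isinstance(raw, int):
        return raw
    match = re.search(r"(\d+)\s*$", str(raw))
    if match is None:
        raise ValueError(f"no trailing digits in unit ID {raw!r}")
    return int(match.group(1))


for value in ["WSL GT 12", "WSL ST 10", "3", 4]:
    print(repr(value), "->", clean_unit_id(value))
```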
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
fuel_consumption_coal_lbs |
number |
Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam. |
fuel_consumption_gas_cubic_feet |
number |
Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle and steam. |
fuel_consumption_oil_gallons |
number |
Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle and steam. |
fuel_consumption_other |
number |
Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle and steam. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
is_full_ownership_portion |
boolean |
Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower. |
is_partly_owned_by_borrower |
boolean |
Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009. |
operating_hours_in_service |
number |
Number of operating hours in service. |
operating_hours_on_standby |
number |
Number of operating hours on standby. |
operating_hours_out_of_service_scheduled |
number |
Number of operating hours out of service which were scheduled. |
operating_hours_out_of_service_unscheduled |
number |
Number of operating hours out of service which were unscheduled. |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
times_started |
number |
Number of times the plant was started. This field is only reported for plant_type values steam and nuclear. |
unit_id_rus |
integer |
RUS-assigned unit identification code. |
out_rus12__yearly_plant_operations_by_plant#
Noticed any irregularities with this table? File a report.
Annual time series of whole plant operational data including fuel consumption and operational hours.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
- Primary key:
This table has no primary key because a handful of plants have duplicate records. The primary key of this table would otherwise be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form reports plant operations data for each plant type in two ways: records for the portion of each plant that a borrower owns, and records for the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants wholly owned by one borrower show up in both tables.
Two boolean columns indicate whether a record covers the borrowers’ share or the whole plant; they are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records reporting TRUE for is_full_ownership_portion belong in the by-plant table, while all records belong in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both PUDL tables.
Two cleaning steps also alter the original data slightly:
- Dropped one record with an unexpected ownership label and duplicate data: in one year, a Wisdom steam plant record is labeled as both fully owned by the borrower and partly owned, which is an unexpected combination according to the _OR_PowerSupply Plant File Documentation.rtf file in the RUS 12 archive. Its values exactly match those of the other Wisdom steam plant record for that year, which has the expected ownership labels.
- Replaced two string values for unit_id_rus: two unit_id_rus values from 2018 contain strings, WSL GT 12 and WSL ST 10. Based on pre-cleaned data, we can clearly use just the numeric part of these strings (12 and 10, respectively), which lets the unit_id_rus column have an integer type.
Plant-level data can be matched to core_rus12__yearly_plant_labor and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand, which report Section B and C data for all plants. The RUS instructions note that:
you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
fuel_consumption_coal_lbs |
number |
Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam. |
fuel_consumption_gas_cubic_feet |
number |
Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle and steam. |
fuel_consumption_oil_gallons |
number |
Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle and steam. |
fuel_consumption_other |
number |
Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle and steam. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
is_partly_owned_by_borrower |
boolean |
Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009. |
operating_hours_in_service |
number |
Number of operating hours in service. |
operating_hours_on_standby |
number |
Number of operating hours on standby. |
operating_hours_out_of_service_scheduled |
number |
Number of operating hours out of service which were scheduled. |
operating_hours_out_of_service_unscheduled |
number |
Number of operating hours out of service which were unscheduled. |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
times_started |
number |
Number of times the plant was started. This field is only reported for plant_type values steam and nuclear. |
unit_id_rus |
integer |
RUS-assigned unit identification code. |
out_rus12__yearly_renewable_plants#
Noticed any irregularities with this table? File a report.
Annual time series of renewable energy plant generation information for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)
- Primary key:
report_date, borrower_id_rus, plant_name_rus, prime_mover_id
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Energy generated over the time period / (nameplate capacity * length of the time period), in consistent units (hours/years/etc.). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
comments |
string |
General comments field. |
employees_num |
integer |
Number of employees. |
invested |
number |
The amount of money invested. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
power_cost_per_mwh |
number |
The cost of power per MWh. |
primary_renewable_fuel_type |
string |
Primary renewable fuel type used by the plant. |
primary_renewable_fuel_type_id |
integer |
Unique numeric identifier for each renewable fuel type. |
prime_mover_id |
integer |
Unique numeric identifier for each prime mover type used by RUS borrowers. |
prime_mover_type |
string |
Type of prime mover (e.g. Hydro, Internal Combustion). |
renewable_fuel_pct |
number |
Percentage of renewable fuel used. |
report_date |
date |
Date reported. |
rus_funding |
number |
Amount of funding received from the Rural Utilities Service (RUS). |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus12__yearly_sources_and_distribution#
Noticed any irregularities with this table? File a report.
Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
- Primary key:
report_date, borrower_id_rus, source_of_energy
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
See the out_rus12__yearly_sources_and_distribution_by_plant_type table for a breakdown of cost, capacity_mw, plant_num, and net_energy_received_mwh by plant type. Also note that several source_of_energy values do not have a corresponding cost value.
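To find which source_of_energy values have no cost, you can collect the sources that appear with a null cost. The source names below are placeholders, not real source_of_energy categories. Note that this sketch flags a source if any record for it lacks a cost, even when other borrowers do report one.

```python
def sources_missing_cost(records):
    """Sorted distinct source_of_energy values that appear with a null cost."""
    return sorted({rec["source_of_energy"] for rec in records if rec.get("cost") is None})


# Hypothetical rows with placeholder source names.
rows = [
    {"source_of_energy": "hypothetical_source_a", "cost": 1200.0, "net_energy_received_mwh": 10.0},
    {"source_of_energy": "hypothetical_source_b", "cost": None, "net_energy_received_mwh": 5.0},
    {"source_of_energy": "hypothetical_source_c", "cost": None, "net_energy_received_mwh": 3.0},
]

print(sources_missing_cost(rows))
```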
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
cost |
number |
Cost value. |
net_energy_received_mwh |
number |
The net amount of energy received into the system. |
report_date |
date |
Date reported. |
source_of_energy |
string |
The source of energy (not plant type). |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus12__yearly_sources_and_distribution_by_plant_type#
Noticed any irregularities with this table? File a report.
Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
- Primary key:
report_date, borrower_id_rus, plant_type
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
cost |
number |
Cost value. |
net_energy_received_mwh |
number |
The net amount of energy received into the system. |
plant_num |
integer |
Number of plants. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus12__yearly_statement_of_operations#
Noticed any irregularities with this table? File a report.
Annual time series of opex and cost of electric service for RUS borrowers by time period.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)
- Primary key:
report_date, borrower_id_rus, opex_group, opex_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
opex_group |
string |
High level section from the statement of operations table. Most of these types have subcomponents broken out in the |
opex_report_month |
number |
Amount of operational expense, cost or income during the report month. |
opex_type |
string |
Type of item from the statement of operations. |
opex_ytd |
number |
The year-to-date amount of operational expense, cost or income. |
opex_ytd_budget |
number |
The year-to-date budget for amount of operational expense, cost or income. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus12__yearly_utility_plant_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in utility plant.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)
- Primary key:
report_date, borrower_id_rus, utility_plant_group, utility_plant_item
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:
- intangible_plant: has only one utility_plant_item, which is the total.
- production_plant: has several sub-component utility_plant_item values and a standard total.
- transmission_plant: has several sub-component utility_plant_item values and a standard total.
- distribution_plant: has several sub-component utility_plant_item values and a standard total.
- electric_plant_in_service: has two sub-components and a total that includes those sub-components plus the totals of the categories above.
- utility_plant_in_service: has several sub-components; its total is the sum of those sub-components plus the electric_plant_in_service total.
- total_utility_plant: has one sub-component; its total is the sum of that sub-component and the utility_plant_in_service total.
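Because higher groups roll up the totals of lower ones, summing every row would count the same dollars several times. For the three groups described as having sub-components and a standard total, one way to sanity-check the data is to compare each reported total with the sum of its components. This is a minimal sketch that assumes, as the nesting description suggests, that those totals should equal their component sums; the function name and output columns are hypothetical.

```python
import pandas as pd

# Groups described as having several sub-components and a standard total.
STANDARD_GROUPS = ["production_plant", "transmission_plant", "distribution_plant"]
KEYS = ["report_date", "borrower_id_rus", "utility_plant_group"]


def check_standard_totals(df: pd.DataFrame, value: str = "ending_balance") -> pd.DataFrame:
    """Compare each reported group total with the sum of its component rows."""
    sub = df[df["utility_plant_group"].isin(STANDARD_GROUPS)]
    is_total = sub["is_total"].fillna(False).astype(bool)
    # Sum the component rows and the reported total rows separately.
    components = sub[~is_total].groupby(KEYS)[value].sum().rename("component_sum")
    reported = sub[is_total].groupby(KEYS)[value].sum().rename("reported_total")
    out = pd.concat([components, reported], axis=1)
    # Non-zero differences point at rows worth inspecting by hand.
    out["difference"] = out["reported_total"] - out["component_sum"]
    return out
```

The same function works for additions, retirements, or adjustments_and_transfers by passing a different value column.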
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
utility_plant_group |
string |
High-level category of utility plant asset type. |
utility_plant_item |
string |
Sub-category of utility_plant_group describing utility plant asset item. |
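The borrower_id_rus description gives a fixed format, so IDs can be checked before joining tables. This is a minimal sketch, assuming the state abbreviation is always two uppercase ASCII letters and the digits are ASCII; the pattern and function name are hypothetical, not part of PUDL. The example ID ND0051 is the borrower mentioned in the out_rus7__yearly_loans notes.

```python
import re

# Assumed format: two uppercase letters followed by four digits, e.g. "ND0051".
BORROWER_ID_RE = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(value: str) -> bool:
    """Return True if value matches the assumed borrower_id_rus format."""
    return BORROWER_ID_RE.fullmatch(value) is not None
```

With pandas, the same pattern can be applied to a whole column with df["borrower_id_rus"].str.fullmatch(r"[A-Z]{2}[0-9]{4}").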
out_rus7__yearly_balance_sheet_assets#
Noticed any irregularities with this table? File a report.
Annual time series of assets and other debits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
- Primary key:
report_date, borrower_id_rus, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
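The table stores one row per asset_type, so comparing asset types side by side for a borrower means reshaping it. This is a minimal sketch, assuming df holds this table; the function name and the include_totals switch are hypothetical. Keeping total rows is useful for inspection, but it is also why the resulting columns should not be summed blindly.

```python
import pandas as pd


def assets_wide(df: pd.DataFrame, include_totals: bool = True) -> pd.DataFrame:
    """Pivot to one row per borrower-year and one column per asset_type."""
    if not include_totals:
        # Drop subtotal/total rows; missing flags count as non-totals.
        df = df[~df["is_total"].fillna(False).astype(bool)]
    # The primary key (report_date, borrower_id_rus, asset_type) allows at
    # most one value per cell, so pivot() needs no aggregation function.
    return df.pivot(
        index=["report_date", "borrower_id_rus"],
        columns="asset_type",
        values="ending_balance",
    )
```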
out_rus7__yearly_balance_sheet_liabilities#
Noticed any irregularities with this table? File a report.
Annual time series of liabilities and other credits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
- Primary key:
report_date, borrower_id_rus, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
liability_type |
string |
Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_customer_energy_efficiency_and_conservation_loans#
Noticed any irregularities with this table? File a report.
Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
- Primary key:
report_date, borrower_id_rus, loan_status
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats, and attic ventilation fans.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
actual_pct |
number |
Observed percentage. |
anticipated_pct |
number |
Expected percentage. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
loan_status |
string |
The repayment status of a loan. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
ytd_dollars |
number |
Balance this current year, in U.S. dollars. |
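The actual_pct and anticipated_pct columns make it possible to see, per loan_status, whether observed repayment shares diverged from expectations. This is a minimal sketch, assuming df holds this table; the function and the pct_gap column are hypothetical. The gap is in whatever units the two percentage columns use, which this table does not state.

```python
import pandas as pd


def pct_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Add the observed-minus-expected percentage gap and sort by it."""
    out = df.copy()
    # Positive values mean the observed share exceeded the expected share.
    out["pct_gap"] = out["actual_pct"] - out["anticipated_pct"]
    return out.sort_values("pct_gap", ascending=False)
```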
out_rus7__yearly_distribution_services#
Noticed any irregularities with this table? File a report.
Annual time series of distribution services.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
- Primary key:
report_date, borrower_id_rus, service_status
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
service_status |
string |
Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals. |
services |
number |
Number of services in system at end of year. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_employee_statistics#
Noticed any irregularities with this table? File a report.
Annual time series of statistics about employment and payroll.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
employee_hours_worked_over_time |
integer |
The number of overtime hours worked by employees. |
employee_hours_worked_regular_time |
integer |
The number of regular (non-overtime) hours worked by employees. |
employees_fte_num |
integer |
The number of full-time employees. |
payroll_capitalized |
integer |
The amount of payroll spent that was capitalized. |
payroll_expensed |
integer |
The amount of payroll spent that was expensed. |
payroll_other |
integer |
The amount of payroll spent that was funded by other means, i.e. neither capitalized nor expensed. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
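Since payroll_other is described as payroll that was neither capitalized nor expensed, the three payroll columns can be read as non-overlapping parts of total payroll. This is a minimal sketch under that assumption; the function and the payroll_total / overtime_share columns are hypothetical. A borrower-year with any missing payroll part gets a missing total rather than a silently understated one.

```python
import pandas as pd

PAYROLL_PARTS = ["payroll_capitalized", "payroll_expensed", "payroll_other"]


def add_payroll_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Add hypothetical total-payroll and overtime-share columns."""
    out = df.copy()
    # min_count=3 keeps the total missing unless all three parts are present.
    out["payroll_total"] = out[PAYROLL_PARTS].sum(axis=1, min_count=3)
    hours = (
        out["employee_hours_worked_regular_time"]
        + out["employee_hours_worked_over_time"]
    )
    # Share of all hours that were overtime.
    out["overtime_share"] = out["employee_hours_worked_over_time"] / hours
    return out
```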
out_rus7__yearly_energy_efficiency#
Noticed any irregularities with this table? File a report.
Annual time series of investments in and impacts of energy efficiency programs.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)
- Primary key:
report_date, borrower_id_rus, customer_class, observation_period
Usage Warnings#
The savings_mmbtu column likely contains values with incorrect units.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
customer_class |
string |
High level categorization of customer type. |
customers_num |
number |
Number of customers. |
invested |
number |
The amount of money invested. |
observation_period |
string |
The date range that any given record pertains to. For example, ‘december’ means the record covers only the month of December, while ‘avg’ means it pertains to the average over the reporting period. |
report_date |
date |
Date reported. |
savings_mmbtu |
number |
The estimated amount of energy savings from energy efficiency programs. Warning: we found values much larger than expected that we have not yet cleaned; this is likely a reporting unit error. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
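Because observation_period mixes single-month rows (‘december’) with period averages (‘avg’), aggregating without filtering would combine incompatible figures. This is a minimal sketch, assuming df holds this table, that keeps one period and flags unusually large savings_mmbtu values for manual review. The 99th-percentile cutoff is an arbitrary, hypothetical choice; flagging is not the same as the cleaning PUDL has not yet done.

```python
import pandas as pd


def select_period(df: pd.DataFrame, period: str = "december", q: float = 0.99) -> pd.DataFrame:
    """Keep one observation_period and flag very large savings_mmbtu values."""
    out = df[df["observation_period"] == period].copy()
    # Hypothetical outlier rule: anything above the q-th quantile is flagged.
    cutoff = out["savings_mmbtu"].quantile(q)
    out["savings_suspect"] = out["savings_mmbtu"] > cutoff
    return out
```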
out_rus7__yearly_energy_purchased#
Noticed any irregularities with this table? File a report.
Annual time series of energy purchased by RUS borrowers.
- Most-recent data:
2020
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)
- Primary key:
This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an asterisk (*), including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
average_energy_cost_dollars_per_mwh |
number |
The average cost of energy per MWh. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
comments |
string |
General comments field. |
fuel_cost_adjustment |
number |
The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost. |
fuel_type_code_rus |
integer |
Unique numeric identifier for RUS fuel types. |
is_supplier_eia_respondent |
boolean |
Whether the utility supplying energy to a RUS borrower is an EIA respondent. |
purchased_energy_cost_total |
number |
The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges. |
purchased_mwh |
number |
The total electricity purchased. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
supplier_code_rus |
string |
Unique numeric identifier for the utility supplying energy to a RUS borrower. |
utility_name_eia |
string |
The name of the utility. |
wheeling_and_other_charges |
number |
The cost of wheeling and other charges or credits related to fuel. Included in the total cost. |
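Since purchased_energy_cost_total is described as including both fuel_cost_adjustment and wheeling_and_other_charges, subtracting the two leaves the base power cost. This is a minimal sketch, assuming all three cost columns share the same currency units; the function and the base_energy_cost / implied_cost_per_mwh columns are hypothetical. The implied per-MWh figure is only for comparison with average_energy_cost_dollars_per_mwh, whose exact derivation is not stated here.

```python
import pandas as pd


def add_cost_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Add hypothetical base-cost and implied-average-cost columns."""
    out = df.copy()
    # Wheeling and other charges may be credits (negative); subtraction still holds.
    out["base_energy_cost"] = (
        out["purchased_energy_cost_total"]
        - out["fuel_cost_adjustment"]
        - out["wheeling_and_other_charges"]
    )
    # Rows with zero or missing purchased_mwh do not give a meaningful value.
    out["implied_cost_per_mwh"] = out["purchased_energy_cost_total"] / out["purchased_mwh"]
    return out
```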
out_rus7__yearly_external_financial_risk_ratio#
Noticed any irregularities with this table? File a report.
Annual time series of the ratio of investments and loan guarantee balances to total utility plant assets.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
external_financial_risk_ratio |
number |
(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how financially exposed a utility is to outside entities relative to its own assets. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
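The external_financial_risk_ratio formula divides the sum of investments and loan guarantee balances by total utility plant assets, so the addition must happen before the division. This minimal sketch only shows that grouping; the table stores the reported value, the function name is hypothetical, and whether the reported ratio is a fraction or a percentage is not stated here.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """(total investments + loan guarantee balances) / total utility plant."""
    if total_utility_plant == 0:
        raise ValueError("total utility plant assets must be non-zero")
    return (total_investments + loan_guarantee_balances) / total_utility_plant
```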
out_rus7__yearly_investments#
Noticed any irregularities with this table? File a report.
Annual time series of investments, loan guarantees and loans.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P - Section I)
- Primary key:
This table has no primary key. It is a list of all investments or loans in each year, and borrowers can have multiple records with the same investment_description.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
excluded_investments |
number |
Excluded investment. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
included_investments |
number |
Included investment. |
income_or_loss |
number |
Income or loss from investment. |
investment_description |
string |
Description of investment. This is a free-form text field and thus contains a wide variety of values. |
investment_type_code |
integer |
Investment type code. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
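With no primary key, per-borrower figures have to be built by aggregating all of a borrower's rows for a year. This is a minimal sketch, assuming df holds this table and that rows sharing an investment_description are distinct entries rather than duplicates, so they are summed; the function name is hypothetical.

```python
import pandas as pd

AMOUNT_COLUMNS = ["included_investments", "excluded_investments", "income_or_loss"]


def investments_by_borrower(df: pd.DataFrame) -> pd.DataFrame:
    """Sum investment amounts per borrower and year across all records."""
    # Rows are summed, not deduplicated, under the assumption stated above.
    return (
        df.groupby(["report_date", "borrower_id_rus"], as_index=False)[AMOUNT_COLUMNS]
        .sum()
    )
```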
out_rus7__yearly_loans#
Noticed any irregularities with this table? File a report.
Annual time series of loans provided by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)
- Primary key:
This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason we’ve nulled the date.
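Because the table mixes direct loans with loan guarantees, balances are easier to interpret when the two are kept apart. This is a minimal sketch, assuming df holds this table; the function name and the "loan" / "guarantee" labels are hypothetical. Rows with a missing is_loan_guarantee value get no label and are left out of the pivot.

```python
import pandas as pd


def balances_by_loan_kind(df: pd.DataFrame) -> pd.DataFrame:
    """Total loan_balance per borrower-year, split into loans and guarantees."""
    kind = df["is_loan_guarantee"].map({True: "guarantee", False: "loan"})
    return df.assign(loan_kind=kind).pivot_table(
        index=["report_date", "borrower_id_rus"],
        columns="loan_kind",
        values="loan_balance",
        aggfunc="sum",
        fill_value=0,  # a borrower-year with no guarantees shows 0, not NaN
    )
```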
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
is_loan_guarantee |
boolean |
Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults. |
loan_balance |
number |
The amount of money still owed on a loan at the end of the reporting year. |
loan_maturity_date |
date |
The date on which a loan is scheduled to be fully paid. |
loan_original_amount |
number |
The original amount of a loan. |
loan_recipient |
string |
The organization that received a loan. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_long_term_debt#
Noticed any irregularities with this table? File a report.
Annual time series of long-term debt and debt service requirements.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)
- Primary key:
This table has no primary key. It is a list of all debts in each year, and borrowers can have multiple records with the same debt_description.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
debt_description |
string |
Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions. |
debt_ending_balance |
number |
The amount of principal still owed on the debt at the end of the report year. |
debt_interest |
number |
The interest expense on the debt for the report year. |
debt_principal |
number |
The principal paid on the debt during the report year. |
debt_total |
number |
The total amount of debt. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
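The table reports debt_interest and debt_principal alongside debt_total, which is described only as the total amount of debt. One plausible reading is that debt_total is the sum of interest and principal for the year; this minimal sketch tests that assumption row by row rather than relying on it. The function name and the tolerance are hypothetical.

```python
import pandas as pd


def debt_total_matches_service(df: pd.DataFrame, tol: float = 0.5) -> pd.Series:
    """Flag rows where debt_total equals debt_interest + debt_principal."""
    service = df["debt_interest"] + df["debt_principal"]
    # Rows with missing components come out False instead of missing.
    return ((df["debt_total"] - service).abs() <= tol).fillna(False)
```

If most rows return False, the assumption does not hold and debt_total should be treated as a separate quantity.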
out_rus7__yearly_long_term_leases#
Noticed any irregularities with this table? File a report.
Annual time series of long-term leases by property type.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)
- Primary key:
This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
lending_organization |
string |
The organization that provided a lease or loan. |
property_type |
string |
The type of property leased. |
rental_cost_ytd |
number |
Year-to-date rental cost for leased property. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
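Because this table has no primary key and a borrower can hold several leases in one year, per-borrower lease costs have to be aggregated rather than looked up row by row. A minimal pandas sketch (the function name is hypothetical) that sums rental_cost_ytd over all leases for each borrower and report year:

```python
import pandas as pd


def total_rental_cost(leases: pd.DataFrame) -> pd.DataFrame:
    """Sum rental_cost_ytd over every lease held by each borrower in each report year."""
    return leases.groupby(["report_date", "borrower_id_rus"], as_index=False)[
        "rental_cost_ytd"
    ].sum()
```

Usage: pass in the DataFrame returned by the pd.read_parquet call above. Note that pandas' sum skips missing values, so a borrower-year whose leases all lack a cost will show 0 rather than a null.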
out_rus7__yearly_materials_and_supplies#
Noticed any irregularities with this table? File a report.
Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)
- Primary key:
report_date, borrower_id_rus, electric_or_other_materials
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
electric_or_other_materials |
string |
Whether the cost is for electric materials or other materials. |
materials_adjustment |
number |
An adjustment value for the cost of materials and supplies. |
materials_ending_balance |
number |
The balance at the end of the report year for materials and supplies. |
materials_purchased |
number |
The cost of materials and supplies purchased. |
materials_salvaged |
number |
The cost of materials and supplies salvaged. |
materials_sold |
number |
The cost of materials and supplies sold. |
materials_used |
number |
The cost of materials and supplies used. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
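The primary key listed above (report_date, borrower_id_rus, electric_or_other_materials) should identify each row uniquely. A hedged sketch for checking that after loading the data, or after your own filtering or merging, flags rows whose key is duplicated or partly missing. The function name is hypothetical.

```python
import pandas as pd

PRIMARY_KEY = ["report_date", "borrower_id_rus", "electric_or_other_materials"]


def primary_key_violations(df: pd.DataFrame, key=PRIMARY_KEY) -> pd.DataFrame:
    """Return every row whose primary-key values are duplicated or contain nulls."""
    key = list(key)
    # keep=False marks all members of a duplicate group, not just the repeats.
    duplicated = df.duplicated(subset=key, keep=False)
    has_null = df[key].isna().any(axis=1)
    return df[duplicated | has_null]
```

An empty result means the key holds. The same function works for the other tables in this section if you pass their primary-key columns as key.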
out_rus7__yearly_meeting_and_board#
Noticed any irregularities with this table? File a report.
Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
board_members_num |
integer |
The total number of board members. |
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
does_manager_have_written_contract |
boolean |
Whether or not the RUS borrower’s manager has a written contract. |
fees_and_expenses_for_board_members |
integer |
The total amount of fees and expenses for board members. |
last_annual_meeting_date |
datetime |
The date of the last annual meeting. |
members_num |
integer |
The total number of members. |
members_present_at_meeting_num |
integer |
The number of members present at the last annual meeting. |
members_voting_by_proxy_or_mail_num |
integer |
The number of members voting by mail or by proxy. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
was_quorum_present |
boolean |
Whether or not quorum was met. |
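The member counts make it straightforward to derive participation in the last annual meeting. The sketch below counts members voting by proxy or mail together with those present. That is an analytical choice, not a definition from RUS. It returns a missing value where members_num is zero or null. The function name is hypothetical.

```python
import pandas as pd


def meeting_participation_share(df: pd.DataFrame) -> pd.Series:
    """Share of members who attended or voted by proxy/mail at the last annual meeting."""
    participants = (
        df["members_present_at_meeting_num"] + df["members_voting_by_proxy_or_mail_num"]
    )
    # Replace zero membership with NA so the ratio is missing instead of infinite.
    members = df["members_num"].where(df["members_num"] > 0)
    return participants / members
```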
out_rus7__yearly_owed_by_customers#
Noticed any irregularities with this table? File a report.
Annual time series of debt owed by customers.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
amount_due_over_60_days |
number |
Reported amount of dollars due over 60 days from consumers for electric service. Includes both connected and disconnected customers. |
amount_written_off_ytd |
number |
Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts. |
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_patronage_capital#
Noticed any irregularities with this table? File a report.
Annual time series of patronage capital distributed and received.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)
- Primary key:
report_date, borrower_id_rus, patronage_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
patronage_cumulative |
number |
Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null. |
patronage_report_year |
number |
Amount of patronage distributed or received within report year. |
patronage_type |
string |
Type of patronage capital distributed or received. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
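Because some rows are subtotals (is_total is true), summing every row double counts. A minimal sketch that keeps only itemized rows before aggregating patronage_report_year by patronage_type. It assumes that a missing is_total flag means the row is itemized. The function name is hypothetical.

```python
import pandas as pd


def patronage_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Sum patronage_report_year by report year and patronage_type, skipping subtotal rows."""
    # Assumption: rows with a missing is_total flag are treated as itemized.
    is_subtotal = df["is_total"].fillna(False).astype(bool)
    itemized = df[~is_subtotal]
    return itemized.groupby(["report_date", "patronage_type"], as_index=False)[
        "patronage_report_year"
    ].sum()
```

The same filter-then-aggregate pattern applies to the other tables here that carry an is_total column.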
out_rus7__yearly_power_requirements#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - revenue and generation summary.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This table includes totals of electric sales and revenue that also appear in core_rus7__yearly_power_requirements_electric_sales as records with a customer_class of total. Because this table holds all the other power requirements, which are not broken out by customer class, these electric sales totals are included here as well.
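The overlap described above can serve as a consistency check. The sketch below makes several assumptions: both tables are loaded as pandas DataFrames, the per-class data comes from out_rus7__yearly_power_requirements_electric_sales (documented below), and that table carries the same total rows with a customer_class value of exactly "total". The function name and tolerance are hypothetical.

```python
import pandas as pd

KEY = ["report_date", "borrower_id_rus"]


def compare_sales_totals(
    requirements: pd.DataFrame, sales: pd.DataFrame, tol: float = 1e-6
) -> pd.DataFrame:
    """Return borrower-years where electric_sales_mwh disagrees with the sales table's total row."""
    totals = sales.loc[sales["customer_class"] == "total", KEY + ["sales_mwh"]]
    merged = requirements[KEY + ["electric_sales_mwh"]].merge(totals, on=KEY, how="inner")
    diff = (merged["electric_sales_mwh"] - merged["sales_mwh"]).abs()
    # Rows where either value is missing produce NaN and are not flagged.
    return merged[diff > tol]
```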
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
electric_sales_mwh |
number |
Total electricity sold, in MWh. Sum of lines 1b through 9b on the original form. |
electric_sales_revenue |
integer |
Total revenue received from sales of electric energy. Sum of lines 1c through 9c on the original form. |
generated_mwh |
number |
The total electricity generated. |
interchange_mwh |
number |
The net interchange of electricity, in MWh: the net amount of electricity exchanged through purchases and sales. |
is_peak_coincident |
boolean |
Whether peak_mw is a coincident peak (true) or a non-coincident peak (false). |
other_electric_revenue |
integer |
Electric revenue other than electric_sales_revenue. |
own_use_mwh |
number |
The electricity in MWh used for the borrower’s own internal use. |
peak_mw |
number |
The peak system demand, in MW. |
purchased_mwh |
number |
The total electricity purchased. |
purchases_and_generation_cost |
integer |
The cost of purchases and generation of electricity. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
transmission_revenue |
integer |
Transmission revenue. |
out_rus7__yearly_power_requirements_electric_customers#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - number of customers served by customer type.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus, customer_class, observation_period
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
customer_class |
string |
High level categorization of customer type. |
customers_num |
number |
Number of customers. |
observation_period |
string |
The date range that a record pertains to. For example, ‘december’ indicates that the record covers only the month of December, while ‘avg’ indicates that it is an average over the reporting period. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
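The primary key includes observation_period, so each borrower, year, and customer class can appear once per period. Summing customers_num across all rows would therefore mix December counts with annual averages. A sketch that selects a single period before aggregating. 'avg' and 'december' are the example values named in the description above, and other values may exist. The function name is hypothetical.

```python
import pandas as pd


def customers_by_class(df: pd.DataFrame, period: str = "avg") -> pd.DataFrame:
    """Total customers_num per report year and customer class for one observation_period."""
    subset = df[df["observation_period"] == period]
    return subset.groupby(["report_date", "customer_class"], as_index=False)[
        "customers_num"
    ].sum()
```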
out_rus7__yearly_power_requirements_electric_sales#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - revenue and energy sold by customer type.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus, customer_class
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
customer_class |
string |
High level categorization of customer type. |
report_date |
date |
Date reported. |
revenue |
number |
Amount of revenue. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_service_interruptions#
Noticed any irregularities with this table? File a report.
Annual time series of service interruptions by cause.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)
- Primary key:
report_date, borrower_id_rus, service_interruption_cause, observation_period
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
observation_period |
string |
The date range that a record pertains to. For example, ‘december’ indicates that the record covers only the month of December, while ‘avg’ indicates that it is an average over the reporting period. |
report_date |
date |
Date reported. |
saidi_minutes |
number |
Cumulative duration (minutes) of interruption for the average customer during the report year. |
service_interruption_cause |
string |
Source of service interruption. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
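To compare interruption causes side by side, the long table can be pivoted into one column per cause. The hedged sketch below drops subtotal rows and keeps a single observation_period. It then reshapes saidi_minutes so that each borrower-year is one row. The observation_period value to pass depends on what the data contains, and the function name is hypothetical.

```python
import pandas as pd


def saidi_by_cause(df: pd.DataFrame, period: str) -> pd.DataFrame:
    """Wide table of saidi_minutes: one row per borrower-year, one column per cause."""
    # Assumption: rows with a missing is_total flag are treated as itemized.
    is_subtotal = df["is_total"].fillna(False).astype(bool)
    subset = df[~is_subtotal & (df["observation_period"] == period)]
    # After filtering, the primary key makes each (year, borrower, cause) unique,
    # so the aggregation function only matters if that assumption is violated.
    return subset.pivot_table(
        index=["report_date", "borrower_id_rus"],
        columns="service_interruption_cause",
        values="saidi_minutes",
        aggfunc="sum",
    )
```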
out_rus7__yearly_statement_of_operations#
Noticed any irregularities with this table? File a report.
Annual time series of opex and cost of electric service for RUS borrowers by time period.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)
- Primary key:
report_date, borrower_id_rus, opex_group, opex_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
opex_group |
string |
High level section from the statement of operations table. Most of these types have subcomponents broken out in the |
opex_report_month |
number |
Amount of operational expense, cost or income during the report month. |
opex_type |
string |
Type of item from the statement of operations. |
opex_ytd |
number |
The year-to-date amount of operational expense, cost or income. |
opex_ytd_budget |
number |
The year-to-date budget for amount of operational expense, cost or income. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
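Because each row carries both opex_ytd and opex_ytd_budget, the year-to-date deviation from budget can be computed per line item. A minimal sketch with a hypothetical function name. It reports actual minus budget, both as an amount and as a ratio to the budget, and leaves the ratio missing when the budget is zero.

```python
import pandas as pd

ID_COLUMNS = ["report_date", "borrower_id_rus", "opex_group", "opex_type"]


def ytd_budget_variance(df: pd.DataFrame) -> pd.DataFrame:
    """Year-to-date actual minus budget, as an amount and as a ratio to the budget."""
    out = df[ID_COLUMNS + ["opex_ytd", "opex_ytd_budget"]].copy()
    out["ytd_variance"] = out["opex_ytd"] - out["opex_ytd_budget"]
    # A zero budget would give an infinite ratio, so treat it as missing instead.
    budget = out["opex_ytd_budget"].where(out["opex_ytd_budget"] != 0)
    out["ytd_variance_ratio"] = out["ytd_variance"] / budget
    return out
```

Whether a positive variance is favorable depends on whether the row is a cost or an income item. Check opex_group and opex_type before interpreting the sign, and remember that subtotal rows (is_total) are included unless you filter them out.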
out_rus7__yearly_transmission_and_distribution_mileage#
Noticed any irregularities with this table? File a report.
Annual time series of miles of transmission and distribution infrastructure.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
- Primary key:
report_date, borrower_id_rus, line_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
is_total |
boolean |
Whether or not this record represents a total. |
line_type |
string |
The type of line mileage reported (e.g., transmission, overhead distribution). |
miles |
number |
Line length at the end of the reported period, in miles. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
out_rus7__yearly_utility_plant_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in utility plant.
- Most-recent data:
2021
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)
- Primary key:
report_date, borrower_id_rus, utility_plant_group, utility_plant_item
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of the RUS (Rural Utilities Service) borrower. Each ID consists of a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
utility_plant_group |
string |
High-level category of utility plant asset type. |
utility_plant_item |
string |
Sub-category of utility_plant_group describing utility plant asset item. |
out_sec10k__changelog_company_name#
Noticed any irregularities with this table? File a report.
Changelog table tracking changes in the names each SEC 10-K filer has used.
- Most-recent data:
2023
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
This table has no primary key.
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
We use the company name reported in association with each name change block in the
company information table to fill in the most recent value of company_name_new.
Roughly 1,000 reported “name changes” in which the old and new names were identical
have been dropped.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
company_name_new |
string |
Name of company after name change. |
company_name_old |
string |
Name of company prior to name change. |
name_change_date |
date |
Date of last name change of the company. |
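To trace one filer's names over time, the changelog can be filtered to a single central_index_key and sorted by name_change_date. The sketch below also drops rows where the old and new names are identical. That mirrors the cleanup described under Additional Details and does nothing on data that has already been through it. The function name is hypothetical.

```python
import pandas as pd


def name_history(changelog: pd.DataFrame, cik: str) -> pd.DataFrame:
    """Chronological name changes for one SEC filer, identified by central_index_key."""
    rows = changelog[changelog["central_index_key"] == cik]
    # A "change" to the same name is not a real change.
    rows = rows[rows["company_name_old"] != rows["company_name_new"]]
    return rows.sort_values("name_change_date").reset_index(drop=True)
```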
out_sec10k__parents_and_subsidiaries#
Noticed any irregularities with this table? File a report.
A denormalized table containing information about parent companies that file SEC Form 10-K and their subsidiaries, which may or may not file Form 10-K.
- Most-recent data:
2023
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k, subsidiary_company_id_sec10k
Usage Warnings#
This data has been extracted probabilistically using a machine learning model and contains some gaps and errors. It should not be treated as ground truth.
PUDL does not currently update its copy of this data.
Additional Details#
Company ownership fractions are extracted from SEC 10-K Exhibit 21. Information about the companies is extracted primarily from the headers of the SEC 10-K filing. Subsidiaries that file Form 10-K will have much more information available than those that only appear as subsidiaries in Exhibit 21.
SEC 10-K filers and EIA utilities are matched using probabilistic record linkage. Exhibit 21 subsidiaries that don’t file a Form 10-K are matched to EIA utilities using the company name.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
filing_date |
date |
Date on which the filing was submitted. |
fraction_owned |
number |
Fraction of the subsidiary company owned by the parent company, as extracted from SEC 10-K Exhibit 21. |
parent_company_business_city |
string |
City where the parent company’s place of business is located. |
parent_company_business_state |
string |
State where the parent company’s place of business is located. |
parent_company_business_street_address |
string |
Street address of the parent company’s place of business. |
parent_company_business_street_address_2 |
string |
Second line of the street address of the parent company’s place of business. |
parent_company_business_zip_code |
string |
Zip code of the parent company’s place of business. |
parent_company_business_zip_code_4 |
string |
Zip code suffix of the parent company’s place of business. |
parent_company_central_index_key |
string |
Central index key (CIK) of the parent company. |
parent_company_incorporation_state |
string |
Two letter state code where parent company is incorporated. |
parent_company_industry_id_sic |
string |
Four-digit Standard Industrial Classification (SIC) code identifying the parent company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions. |
parent_company_industry_name_sic |
string |
Text description of the parent company’s Standard Industrial Classification (SIC) code. |
parent_company_mail_city |
string |
City of the parent company’s mailing address. |
parent_company_mail_state |
string |
State of the parent company’s mailing address. |
parent_company_mail_street_address |
string |
Street portion of the parent company’s mailing address. |
parent_company_mail_street_address_2 |
string |
Second line of the street portion of the parent company’s mailing address. |
parent_company_mail_zip_code |
string |
Zip code of the parent company’s mailing address. |
parent_company_mail_zip_code_4 |
string |
Zip code suffix of the parent company’s mailing address. |
parent_company_name |
string |
Name of the parent company. |
parent_company_phone_number |
string |
Phone number of the parent company. |
parent_company_taxpayer_id_irs |
string |
Taxpayer ID of the parent company with the IRS. |
parent_company_utility_id_eia |
integer |
The EIA utility ID of the parent company. |
parent_company_utility_name_eia |
string |
The EIA reported utility name of the parent company. |
report_date |
date |
Date reported. |
subsidiary_company_business_city |
string |
City where the subsidiary company’s place of business is located. |
subsidiary_company_business_state |
string |
State where the subsidiary company’s place of business is located. |
subsidiary_company_business_street_address |
string |
Street address of the subsidiary company’s place of business. |
subsidiary_company_business_street_address_2 |
string |
Second line of the street address of the subsidiary company’s place of business. |
subsidiary_company_business_zip_code |
string |
Zip code of the subsidiary company’s place of business. |
subsidiary_company_business_zip_code_4 |
string |
Zip code suffix of the subsidiary company’s place of business. |
subsidiary_company_central_index_key |
string |
Central index key (CIK) of the subsidiary company. |
subsidiary_company_id_sec10k |
string |
PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses. |
subsidiary_company_incorporation_state |
string |
Two letter state code where subsidiary company is incorporated. |
subsidiary_company_industry_id_sic |
string |
Four-digit Standard Industrial Classification (SIC) code identifying the subsidiary company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions. |
subsidiary_company_industry_name_sic |
string |
Text description of the subsidiary company’s Standard Industrial Classification (SIC) code. |
subsidiary_company_location |
string |
Location of subsidiary company. This is the full US state name or country name and occasionally a two digit code that was not mapped to a full name during cleaning. |
subsidiary_company_mail_city |
string |
City of the subsidiary company’s mailing address. |
subsidiary_company_mail_state |
string |
State of the subsidiary company’s mailing address. |
subsidiary_company_mail_street_address |
string |
Street portion of the subsidiary company’s mailing address. |
subsidiary_company_mail_street_address_2 |
string |
Second line of the street portion of the subsidiary company’s mailing address. |
subsidiary_company_mail_zip_code |
string |
Zip code of the subsidiary company’s mailing address. |
subsidiary_company_mail_zip_code_4 |
string |
Zip code suffix of the subsidiary company’s mailing address. |
subsidiary_company_name |
string |
Name of subsidiary company. |
subsidiary_company_phone_number |
string |
Phone number of the subsidiary company. |
subsidiary_company_taxpayer_id_irs |
string |
Taxpayer ID of the subsidiary company with the IRS. |
subsidiary_company_utility_id_eia |
integer |
The EIA utility ID of the subsidiary company. |
subsidiary_company_utility_name_eia |
string |
The EIA reported utility name of the subsidiary company. |
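To see why subsidiary_company_id_sec10k should never be hard-coded, consider how a concatenated ID reacts to small changes in its inputs. The following is a hypothetical sketch: the "_" separator, the lowercase names, and the example CIK are assumptions for illustration, and PUDL's real construction (including any name cleaning) may differ.

```python
def illustrative_subsidiary_id(filer_cik: str, subsidiary_name: str, location: str) -> str:
    """Concatenate the three parts named in the subsidiary_company_id_sec10k description.

    The "_" separator is a hypothetical choice made for illustration only.
    """
    return "_".join([filer_cik, subsidiary_name, location])


# Hypothetical inputs: a small change in how one name is cleaned changes the ID.
id_a = illustrative_subsidiary_id("0000012345", "acme power llc", "delaware")
id_b = illustrative_subsidiary_id("0000012345", "acme power, llc", "delaware")
print(id_a == id_b)  # False
```

Any change to name cleaning between releases produces a new ID for the same real-world subsidiary, so IDs are best looked up within a single release and joined on, never stored in analysis code.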
out_sec10k__quarterly_company_information#
Noticed any irregularities with this table? File a report.
Quarterly time series of company and filing data extracted from SEC 10-K filings.
- Most-recent data:
2023
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k, central_index_key
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
In addition to the information provided by the core_sec10k__quarterly_company_information table, this output table merges in the associated utility_id_eia and utility name (where available) and the report and filing dates of the filing each record was extracted from, and it provides a link to the filing’s source URL.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
business_city |
string |
City where the company’s place of business is located. |
business_postal_code |
string |
Non-US postal code of the company’s place of business. |
business_state |
string |
State where the company’s place of business is located. |
business_street_address |
string |
Street address of the company’s place of business. |
business_street_address_2 |
string |
Second line of the street address of the company’s place of business. |
business_zip_code |
string |
Zip code of the company’s place of business. |
business_zip_code_4 |
string |
Zip code suffix of the company’s place of business. |
central_index_key |
string |
Identifier of the company in SEC database. |
company_name |
string |
Name of the reporting company. |
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
filer_count |
integer |
A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to. |
filing_date |
date |
Date on which the filing was submitted. |
filing_number_sec |
string |
Filing number used internally by the SEC to track the filing. |
film_number |
string |
Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room. |
fiscal_year_end |
string |
The end date of an SEC filing company’s fiscal year, in MMDD format. |
incorporation_state |
string |
Two letter state code where company is incorporated. |
industry_group_sic |
string |
A higher level industry category defined within the Standard Industrial Classification (SIC) system. |
industry_id_sic |
string |
Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions. |
industry_name_sic |
string |
Text description of the company’s Standard Industrial Classification (SIC) code. |
mail_city |
string |
City of the company’s mailing address. |
mail_postal_code |
string |
Non-US postal code of the company’s mailing address. |
mail_state |
string |
State of the company’s mailing address. |
mail_street_address |
string |
Street portion of the company’s mailing address. |
mail_street_address_2 |
string |
Second line of the street portion of the company’s mailing address. |
mail_zip_code |
string |
Zip code of the company’s mailing address. |
mail_zip_code_4 |
string |
Zip code suffix of the company’s mailing address. |
phone_number |
string |
Phone number of the company. |
report_date |
date |
Date reported. |
sec10k_type |
string |
Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb. |
sec_act |
string |
Securities act under which the form was filed, e.g. the 1934 Act. |
source_url |
string |
URL pointing to the original source of the data in the record. |
taxpayer_id_irs |
string |
Taxpayer ID of the company with the IRS. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
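The fiscal_year_end column stores only a month and day in MMDD form. Below is a minimal sketch for turning it into a calendar date, assuming the caller chooses which year to attach (for example, the year of report_date). The helper name and the choice to clamp days past the end of the month (such as February 29 in a non-leap year) are our own assumptions.

```python
import calendar
from datetime import date
from typing import Optional


def fiscal_year_end_date(fiscal_year_end: Optional[str], year: int) -> Optional[date]:
    """Convert an MMDD fiscal_year_end string into a date within the given year."""
    if fiscal_year_end is None:
        return None
    value = fiscal_year_end.strip()
    if len(value) != 4 or not value.isdigit():
        return None  # not in MMDD form
    month, day = int(value[:2]), int(value[2:])
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    # Assumption: clamp days past month end (e.g. "0229" in a non-leap year)
    # to the last day of that month.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))
```

For example, fiscal_year_end_date("0630", 2022) returns date(2022, 6, 30).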
out_sec10k__quarterly_filings#
Noticed any irregularities with this table? File a report.
Quarterly time series of metadata describing SEC 10-K filings.
- Most-recent data:
2023
- Processing:
Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename, but ~1% of all filings are one company submitting the same form multiple times on the same day, so the filename is the only available natural primary key. This output table adds a link to the source URL for the filing, which is constructed from the filename.
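Following the filename_sec10k description (prepend https://www.sec.gov/Archives/edgar/data/ and append the .txt extension), the source URL can be rebuilt from the filename alone, which is handy for spot-checking the source_url column. The helper name and the example filename below are hypothetical.

```python
EDGAR_ARCHIVE_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def sec10k_source_url(filename_sec10k: str) -> str:
    """Rebuild the EDGAR URL for a filing: archive prefix + filename + ".txt"."""
    return f"{EDGAR_ARCHIVE_PREFIX}{filename_sec10k}.txt"


# Hypothetical filename, for illustration only.
print(sec10k_source_url("1234/0000001234-23-000001"))
# https://www.sec.gov/Archives/edgar/data/1234/0000001234-23-000001.txt
```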
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
company_name |
string |
Name of the reporting company. |
exhibit_21_version |
string |
Version of exhibit 21 submitted (if applicable). |
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
filing_date |
date |
Date on which the filing was submitted. |
report_date |
date |
Date reported. |
sec10k_type |
string |
Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb. |
source_url |
string |
URL pointing to the original source of the data in the record. |
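Every amended form listed in the sec10k_type description ends in "/a", so a simple string split separates the base form from its amendment status. This sketch relies only on those documented value patterns; the function names are hypothetical, and rows are assumed to be dictionaries.

```python
from typing import Any, Iterable, Mapping


def split_sec10k_type(sec10k_type: str) -> tuple[str, bool]:
    """Split e.g. "10-k405/a" into ("10-k405", True)."""
    base, _, suffix = sec10k_type.strip().lower().partition("/")
    return base, suffix == "a"


def original_filings_only(rows: Iterable[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Drop amended filings, i.e. any sec10k_type ending in "/a"."""
    return [row for row in rows if not split_sec10k_type(row["sec10k_type"])[1]]
```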
out_vcerare__hourly_available_capacity_factor#
Noticed any irregularities with this table? File a report.
Hourly time series of estimated county-averaged capacity factors for wind and solar generating facilities across the contiguous United States (US), to be used as a tool and input for resource adequacy modeling and planning.
- Most-recent data:
2023
- Processing:
Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.
- Source:
Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset
- Primary key:
state, place_name, datetime_utc
Usage Warnings#
The hourly capacity factors are normalized to unity for maximal power output. To convert to units of power, the user must multiply by the installed capacity within the county.
Hourly capacity factors are spatially averaged across each county in the contiguous USA. A handful of counties are too small to be represented on the HRRR operational forecast grid, so these counties have no wind or solar power production curves.
Because power production performance is correlated with panel temperature, some solar capacity factor values exceed 1 (but remain below 1.1) during cold, sunny periods.
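As the first warning notes, capacity factors must be multiplied by installed capacity to obtain power. Below is a minimal sketch, assuming the user supplies the county's installed capacity in MW (this table does not contain it). The optional clipping at nameplate is our own addition for users who want to suppress solar values above 1. Each result is the average MW over the hour, which is numerically equal to the MWh generated in that hour.

```python
from typing import Iterable


def capacity_factor_to_power_mw(
    capacity_factors: Iterable[float],
    installed_capacity_mw: float,
    cap_at_nameplate: bool = False,
) -> list[float]:
    """Scale normalized hourly capacity factors by installed capacity in MW."""
    if installed_capacity_mw < 0:
        raise ValueError("installed_capacity_mw must be non-negative")
    power = []
    for cf in capacity_factors:
        # Solar values can slightly exceed 1 in cold, sunny hours; optionally clip them.
        if cap_at_nameplate:
            cf = min(cf, 1.0)
        power.append(cf * installed_capacity_mw)
    return power
```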
Additional Details#
The data in this table were produced by Vibrant Clean Energy, and are licensed to the public under the Creative Commons Attribution 4.0 International license (CC-BY-4.0).
The technologies provided are:
- Onshore wind assuming a 100m hub height and 120m rotor diameter;
- Offshore wind assuming a 140m hub height and 120m rotor diameter;
- Utility solar assuming a fixed axis panel tilted at latitude.
The foundation of the capacity factors provided here is the NOAA HRRR operational numerical weather prediction model. The HRRR covers the entire contiguous US at a horizontal resolution of 3 km. Forecasts are initialized each hour of the year. Forecast hour two (2) is used as the input data for the power algorithms. This forecast hour is chosen to trade off the impact of the HRRR’s measurement and data assimilation procedure against the physics of the model, in order to derive the most complete picture of the atmosphere at the forecast time horizon.
For wind capacity factors: vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.
For solar capacity factors: pertinent surface weather variables are pulled, such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR from 2016 onward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors.
Access methods#
Warning
This table will likely not fit in your computer’s memory.
Make sure to filter or aggregate to reduce the amount of data read in.
This is easier in polars and duckdb than in pandas,
so we only provide examples for these.
import polars as pl
df = (
# Use scan_parquet (lazy evaluation) to plan the query before execution
pl.scan_parquet(
"s3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
# Select only a subset of columns to reduce network and memory usage
.select(
[
"report_year",
"datetime_utc",
"hour_of_year",
"county_id_fips",
"state",
"place_name",
"capacity_factor_solar_pv",
]
)
.filter(
# Select data for only a single year and state to reduce network and memory usage
(pl.col("report_year") == 2022)
& (pl.col("state") == "CO")
)
.collect()
)
SELECT
-- Select only a subset of columns to reduce network and memory usage
report_year,
datetime_utc,
hour_of_year,
county_id_fips,
state,
place_name,
capacity_factor_solar_pv
FROM 's3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet'
WHERE
-- Select data for only a single year and state to reduce network and memory usage
-- (string literals take single quotes; double quotes denote identifiers)
report_year = 2022
AND state = 'CO';
Columns#
Field Name |
Type |
Description |
|---|---|---|
capacity_factor_offshore_wind |
number |
Estimated capacity factor (0-1) calculated for offshore wind assuming a 140m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset. |
capacity_factor_onshore_wind |
number |
Estimated capacity factor (0-1) calculated for onshore wind assuming a 100m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset. |
capacity_factor_solar_pv |
number |
Estimated capacity factor (0-1) calculated for solar PV assuming a fixed axis panel tilted at latitude and DC power outputs. Because power production performance is correlated with panel temperature, some solar capacity factor values exceed 1 (but remain below 1.1) during cold, sunny periods. All values are based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Pertinent surface weather variables are pulled, such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR from 2016 onward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors. |
county_id_fips |
string |
County ID from the Federal Information Processing Standard Publication 6-4. |
datetime_utc |
datetime |
Date and time converted to Coordinated Universal Time (UTC). |
hour_of_year |
integer |
Integer between 1 and 8760 representing the hour in a given year. |
latitude |
number |
Latitude of the place centroid (e.g., county centroid). |
longitude |
number |
Longitude of the place centroid (e.g., county centroid). |
place_name |
string |
County or lake name, sourced from the latest Census PEP vintage based on county FIPS ID. Lake names originate from VCE RARE directly and may appear several times, once for each state the lake touches. FIPS ID values for lakes have been set to null. |
report_year |
integer |
Four-digit year in which the data was reported. |
state |
string |
Two letter US state abbreviation. |
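For sanity checks it can be useful to map report_year and hour_of_year back to a UTC timestamp. This sketch assumes that hour 1 begins at 00:00 UTC on January 1 and that a year has at most 8,760 hourly values (365 × 24). Both are assumptions to verify against the datetime_utc column, and leap-year handling in the source data is not addressed here. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def datetime_from_hour_of_year(report_year: int, hour_of_year: int) -> datetime:
    """Map (report_year, hour_of_year) to a UTC timestamp.

    Assumes hour 1 starts at 00:00 UTC on January 1 of report_year.
    """
    if not 1 <= hour_of_year <= 8760:
        raise ValueError(f"hour_of_year out of range: {hour_of_year}")
    year_start = datetime(report_year, 1, 1, tzinfo=timezone.utc)
    return year_start + timedelta(hours=hour_of_year - 1)
```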
core_eia176__yearly_gas_disposition#
Noticed any irregularities with this table? File a report.
Annual time series of a company’s natural and supplemental gas disposition for the report state.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 9, 12-20)
- Primary key:
operator_id_eia, report_year
Additional Details#
The disposition_out_of_state_mcf column (Line 14.0) is reported as one aggregated volume, calculated by summing the original granular data. Similarly, other_disposition_all_other_mcf (Line 18.4) is summed from the original granular data and reported as one aggregate field.
The EIA expects delivered_gas_heat_content_mmbtu_per_mcf to fall between 0.8 and 1.2. We find that less than 0.5 percent of the data falls outside these bounds.
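The EIA's expected range for delivered heat content translates directly into a filter. Below is a minimal sketch, assuming rows are dictionaries with nulls represented as None. The helper name is hypothetical, and the default bounds are the 0.8 to 1.2 MMBtu/Mcf range quoted above.

```python
from typing import Any, Iterable, Mapping


def heat_content_outliers(
    rows: Iterable[Mapping[str, Any]],
    low: float = 0.8,
    high: float = 1.2,
) -> list[Mapping[str, Any]]:
    """Return rows whose delivered gas heat content lies outside [low, high] MMBtu/Mcf."""
    outliers = []
    for row in rows:
        value = row.get("delivered_gas_heat_content_mmbtu_per_mcf")
        if value is None:
            continue  # missing values are neither inside nor outside the bounds
        if not low <= value <= high:
            outliers.append(row)
    return outliers
```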
The total_disposition_mcf field includes disposition to consumers, which is reported in core_eia176__yearly_gas_disposition_by_consumer.
Note that the reported total disposition often does not match the sum of the values in this table and core_eia176__yearly_gas_disposition_by_consumer, even though the two would be expected to agree.
Thus, we have preserved both the total field and its sub-components in these two tables.
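Because the reported total and its parts often disagree, users may want to measure the gap themselves. The sketch below assumes that the caller picks which columns of this table count as components of total_disposition_mcf and supplies the volume_mcf values from core_eia176__yearly_gas_disposition_by_consumer for the same operator_id_eia and report_year. That column selection is a judgment call, not a PUDL definition, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def disposition_gap_mcf(
    total_disposition_mcf: Optional[float],
    component_volumes_mcf: Iterable[Optional[float]],
    consumer_volumes_mcf: Iterable[Optional[float]],
) -> Optional[float]:
    """Reported total minus the sum of its reported parts (missing parts count as 0)."""
    if total_disposition_mcf is None:
        return None
    parts = sum(v for v in component_volumes_mcf if v is not None)
    parts += sum(v for v in consumer_volumes_mcf if v is not None)
    return total_disposition_mcf - parts
```

A positive result means the reported total exceeds the parts you chose to add up; treating missing parts as zero is itself an assumption worth revisiting for your use case.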
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
delivered_gas_heat_content_mmbtu_per_mcf |
number |
The average annual heat content of gas delivered directly to consumers. |
disposition_distribution_companies_mcf |
number |
Disposition to distribution companies within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
disposition_other_pipelines_mcf |
number |
Disposition to other pipelines within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
disposition_out_of_state_mcf |
number |
Total volume of the operator’s deliveries across or to state lines or U.S. borders. This has been summed from the detailed data reported by each company on Line 14.0 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
disposition_storage_operators_mcf |
number |
Disposition to storage operators within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
losses_mcf |
number |
Known loss volumes as a result of leaks, damage, accidents, migration and blow down within the report state where the events took place, as well as estimated losses from leaks occurring during distribution activities. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operational_consumption_compressors_mcf |
number |
Volume of natural gas consumed as fuel for the operator’s own pipeline distribution or storage compressor use within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_consumption_facility_space_heat_mcf |
number |
Volume of natural gas consumed as fuel for space heat of the operator’s own facilities within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_consumption_lng_vaporization_liquefaction_mcf |
number |
Volume of natural gas consumed by the operator for vaporization, liquefaction, and LNG fuel within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_consumption_new_pipeline_fill_mcf |
number |
Volume of natural gas consumed as fuel for the operator’s new pipeline fill within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_consumption_other_detail |
string |
Free-text detail describing the operator’s specified ‘other purposes’ for operational natural-gas consumption, corresponding to the volume reported in operational_consumption_other_mcf. |
operational_consumption_other_mcf |
number |
Volume of natural gas consumed as fuel by the operator for other purposes within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_consumption_vehicle_fuel_mcf |
number |
Volume of natural gas used in the operator’s company-owned fleet within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_lng_storage_injections_mcf |
number |
The total volume added to LNG storage operated by the company within the report state, regardless of ownership of the gas. For LNG import and export marine terminals, this should exclude injections of LNG during the course of routine operations for handling imports, as well as LNG held in storage for future use. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operational_storage_underground_mcf |
number |
The total volume added to underground storage operated by the company within the report state, regardless of ownership of the gas. This includes new fields. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
operator_id_eia |
string |
The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state. |
other_disposition_all_other_mcf |
number |
Other disposition within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. This has been summed from the detailed data reported by each company on Line 18.4 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
producer_lease_use_mcf |
number |
The total volume of gas used in the company’s well, field and lease operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
producer_returned_for_repressuring_reinjection_mcf |
number |
The volume of gas delivered directly from the operator’s system to oil or gas fields located within the report state for repressuring, pressure maintenance, and cycling operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
report_year |
integer |
Four-digit year in which the data was reported. |
total_disposition_mcf |
number |
Total disposition within the report state, as reported by the operator. This includes disposition to consumers, which is reported in core_eia176__yearly_gas_disposition_by_consumer. Note that the reported total disposition often does not match the sum of the values in this table and core_eia176__yearly_gas_disposition_by_consumer, even though the two would be expected to agree. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
unaccounted_for_mcf |
number |
The difference between gas supply and disposition. A positive entry indicates supply in excess of accounted-for disposition. A negative entry indicates accounted-for disposition exceeds reported supply. This is calculated as the difference between Part 4 Line 7.0 and Part 6 Line 19.0, and is reported as Line 20.0 in the original form. |
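The sign convention of unaccounted_for_mcf can be expressed as a one-line calculation. Total supply (Part 4, Line 7.0) is not a column in this table, so the inputs here are assumed to come from elsewhere. The function names are hypothetical.

```python
def unaccounted_for_mcf(total_supply_mcf: float, total_disposition_mcf: float) -> float:
    """Part 4 Line 7.0 (total supply) minus Part 6 Line 19.0 (total disposition)."""
    return total_supply_mcf - total_disposition_mcf


def describe_unaccounted(value_mcf: float) -> str:
    """Interpret the sign of an unaccounted-for volume."""
    if value_mcf > 0:
        return "supply exceeds accounted-for disposition"
    if value_mcf < 0:
        return "accounted-for disposition exceeds reported supply"
    return "supply and disposition balance"
```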
core_eia176__yearly_gas_disposition_by_consumer#
Noticed any irregularities with this table? File a report.
Annual time series of a company’s natural gas deliveries to end-use consumers within the report state.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 10.0-11.6)
- Primary key:
report_year, operator_id_eia, customer_class, revenue_class
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
consumers |
integer |
Number of end-use consumers within the report state. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
operating_state |
string |
State that the operator is reporting for. |
operator_id_eia |
string |
The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state. |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue |
number |
Revenue including taxes, rounded to the nearest whole dollar. |
revenue_class |
string |
Source of revenue: whether revenue originates from gas owned directly by the operator (sales) or gas transported by the operator (transport). |
volume_mcf |
number |
Total volume of natural gas deliveries in the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit. |
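The `operator_id_eia` description above states that the last two letters of the ID indicate the state. The minimal sketch below uses that suffix to cross-check the `operating_state` column. It assumes the suffix is always exactly two characters. The operator IDs are made up and do not follow any verified EIA format.

```python
# Hypothetical records: the operator IDs below are invented for illustration.
records = [
    {"operator_id_eia": "12345AL", "operating_state": "AL"},
    {"operator_id_eia": "67890TX", "operating_state": "TX"},
    {"operator_id_eia": "24680OK", "operating_state": "KS"},  # deliberately inconsistent
]


def state_suffix(operator_id_eia: str) -> str:
    """Return the last two characters of an operator ID, assumed to encode the state."""
    return operator_id_eia[-2:]


# Flag rows where the ID suffix disagrees with the operating_state column.
mismatches = [
    r["operator_id_eia"]
    for r in records
    if state_suffix(r["operator_id_eia"]) != r["operating_state"]
]
print(mismatches)
```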
core_eia191__monthly_gas_storage#
Monthly time series of underground natural gas storage activity reported by operators.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 191 – Monthly Underground Natural Gas Storage Report
- Primary key:
storage_field_id_eia, report_date
Usage Warnings#
total_field_capacity_mcf is not reliably equal to the sum of working_gas_capacity_mcf and base_gas_mcf: approximately 23% of records differ. This reflects loose EIA definitions and operator self-reporting practices, not data errors. Do not assume additivity among these three fields.
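Because of this warning, check additivity explicitly instead of assuming it. The minimal sketch below flags which records satisfy total = working + base within a relative tolerance and treats rows with missing values as not comparable. The `rel_tol` value is an arbitrary choice, and the records are invented rather than drawn from EIA-191.

```python
import math
from typing import Optional


def is_additive(
    total: Optional[float],
    working: Optional[float],
    base: Optional[float],
    rel_tol: float = 1e-6,
) -> Optional[bool]:
    """Return whether total == working + base (within rel_tol), or None if any value is missing."""
    if total is None or working is None or base is None:
        return None
    return math.isclose(total, working + base, rel_tol=rel_tol)


# Hypothetical storage-field records (values in Mcf), not real EIA-191 data.
records = [
    {"total_field_capacity_mcf": 1_000.0, "working_gas_capacity_mcf": 600.0, "base_gas_mcf": 400.0},
    {"total_field_capacity_mcf": 1_000.0, "working_gas_capacity_mcf": 700.0, "base_gas_mcf": 400.0},
    {"total_field_capacity_mcf": None, "working_gas_capacity_mcf": 500.0, "base_gas_mcf": 200.0},
]

flags = [
    is_additive(r["total_field_capacity_mcf"], r["working_gas_capacity_mcf"], r["base_gas_mcf"])
    for r in records
]
# Share of comparable records whose parts do not sum to the total.
comparable = [f for f in flags if f is not None]
share_differing = comparable.count(False) / len(comparable)
print(flags, share_differing)
```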
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
base_gas_mcf |
number |
Volume of base gas (cushion gas) in the underground storage reservoir. Base gas is the volume of gas intended as permanent inventory in a reservoir to maintain adequate pressure and deliverability rates. |
company_name |
string |
Name of the reporting company. |
county |
string |
County name. |
field_name |
string |
Name of the natural gas storage field. |
field_type |
string |
Type of underground natural gas storage facility. One of: depleted field (a reservoir previously used for natural gas production), aquifer (a porous rock structure used to store gas), or salt dome (a cavern created within a salt formation). |
gas_field_id_eia |
integer |
EIA-assigned numeric identifier for a natural gas storage field. |
maximum_daily_delivery_mcf |
number |
Maximum daily quantity of gas that can be withdrawn from the underground storage reservoir under normal operating conditions. |
operational_status |
string |
Operational status of the underground natural gas storage reservoir. |
region |
string |
EIA storage region in which the underground natural gas storage field is located. |
report_date |
date |
Date reported. |
reservoir_id_eia |
integer |
EIA-assigned numeric identifier for the individual storage reservoir within a natural gas storage field. |
reservoir_name |
string |
Name of the individual underground natural gas storage reservoir. |
state |
string |
Two letter US state abbreviation. |
storage_field_id_eia |
string |
EIA-assigned identifier for an underground natural gas storage reservoir. IDs are assigned per company and state, so the same physical reservoir may have different IDs if ownership changes. |
total_field_capacity_mcf |
number |
Total design capacity of the underground natural gas storage field, nominally base gas plus working gas. Reported values do not always equal that sum. |
working_gas_capacity_mcf |
number |
Volume of working gas (the portion of stored gas available for withdrawal) in the underground storage reservoir. |
core_eia860__assn_boiler_cooling#
Association table providing connections between EIA boiler IDs and EIA cooling system IDs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
report_date, plant_id_eia, boiler_id, cooling_id_eia
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
boiler_id |
string |
Alphanumeric boiler ID. |
cooling_id_eia |
string |
EIA identification code for the cooling system. If multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
core_eia860__assn_boiler_generator#
Association table providing connections between boilers and generators.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 6, Part A)
- Primary key:
plant_id_eia, report_date, generator_id, boiler_id
Usage Warnings#
Contains information from multiple raw inputs.
Additional Details#
Augmented with various heuristics within PUDL.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
bga_source |
string |
The source from which the unit_id_pudl is compiled: directly from EIA-860; from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators); or from a unit connection (where the unit_id_eia is used to find additional boiler-generator connections). |
boiler_generator_assn_type_code |
string |
Indicates whether the boiler-generator associations during the year were actual or theoretical. Only available before 2013. |
boiler_id |
string |
Alphanumeric boiler ID. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
steam_plant_type_code |
integer |
Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details. |
unit_id_eia |
string |
EIA-assigned unit identification code. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
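The `generator_id` description above warns that generator IDs must be treated as strings. The sketch below shows what numeric coercion would break. The IDs are hypothetical. The Parquet files already store the column as a string, so this pitfall mostly arises when the column is cast or passed through a format such as CSV that infers types.

```python
# Hypothetical generator IDs illustrating why generator_id must stay a string.
generator_ids = ["1", "01", "GT1", "ST2A"]

# Keeping IDs as strings preserves every distinct value.
as_strings = set(generator_ids)

# Coercing to integers fails outright on IDs that contain letters...
try:
    int("GT1")
    coercion_failed = False
except ValueError:
    coercion_failed = True

# ...and silently merges purely numeric IDs that differ only by leading zeros.
as_ints = {int(g) for g in generator_ids if g.isdigit()}

print(len(as_strings), len(as_ints), coercion_failed)
```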
core_eia860__assn_boiler_stack_flue#
Association table providing connections between EIA boiler IDs and EIA stack and/or flue system IDs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
report_date, plant_id_eia, boiler_id, stack_flue_id_pudl
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
boiler_id |
string |
Alphanumeric boiler ID. |
flue_id_eia |
string |
The flue identification value reported to EIA. The flue is a duct, pipe, or opening that transports exhaust gases through the stack. This field was reported in conjunction with stack_id_eia until 2013, when stack_flue_id_eia replaced both fields. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
stack_flue_id_eia |
string |
The stack or flue identification value reported to EIA. This denotes the place where emissions from the combustion process are released into the atmosphere. Prior to 2013, this was reported as stack_id_eia and flue_id_eia. |
stack_flue_id_pudl |
string |
A stack and/or flue identification value created by PUDL for use as part of the primary key for the stack flue equipment and boiler association tables. For 2013 and onward, this value is equal to the value for stack_flue_id_eia. Prior to 2013, this value is equal to the value for stack_id_eia and the value for flue_id_eia separated by an underscore, or just the value for stack_id_eia in cases where flue_id_eia is NA. |
stack_id_eia |
string |
The stack identification value reported to EIA. Stacks or chimneys are the place where emissions from the combustion process are released into the atmosphere. This field was reported in conjunction with flue_id_eia until 2013, when stack_flue_id_eia replaced both fields. |
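The `stack_flue_id_pudl` rule in the table above can be sketched as a small function. This is a restatement of the described logic, not PUDL's transform code. It assumes that when `flue_id_eia` is missing before 2013, the fallback is the `stack_id_eia` value. All IDs in the usage lines are hypothetical.

```python
from typing import Optional


def make_stack_flue_id_pudl(
    report_year: int,
    stack_flue_id_eia: Optional[str],
    stack_id_eia: Optional[str],
    flue_id_eia: Optional[str],
) -> Optional[str]:
    """Sketch of the stack_flue_id_pudl rule described in the column table."""
    if report_year >= 2013:
        # From 2013 on, EIA reports a single combined stack/flue ID.
        return stack_flue_id_eia
    if flue_id_eia is None:
        # Pre-2013 with no flue ID: fall back to the stack ID alone (assumed).
        return stack_id_eia
    # Pre-2013 with both IDs: join them with an underscore.
    return f"{stack_id_eia}_{flue_id_eia}"


print(make_stack_flue_id_pudl(2015, "CS001", None, None))
print(make_stack_flue_id_pudl(2010, None, "1", "A"))
print(make_stack_flue_id_pudl(2010, None, "1", None))
```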
core_eia860__assn_yearly_boiler_emissions_control_equipment#
Association table providing connections between EIA boiler IDs and emissions control IDs for NOx, SO2, mercury, and particulate monitoring.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
report_date, plant_id_eia, boiler_id, emission_control_id_type, emission_control_id_eia
Usage Warnings#
The relationship between the IDs is sometimes many-to-many.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
boiler_id |
string |
Alphanumeric boiler ID. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
emission_control_id_eia |
string |
The emission control ID used to collect SO2, NOx, particulate, and mercury emissions data. This column should be used in conjunction with emission_control_id_type because it is not guaranteed to be unique. |
emission_control_id_type |
string |
The type of emissions control ID: SO2, NOx, particulate, or mercury. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
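Since the boiler-to-control relationship can be many-to-many and `emission_control_id_eia` is only unique together with its type, a join key needs both fields. The sketch below pairs each control ID with its type and plant, then finds controls shared by several boilers and boilers with several controls. The rows cover a single hypothetical plant and report year, and the lowercase type labels are placeholders rather than the table's actual values.

```python
from collections import defaultdict

# Hypothetical association rows for one plant in one report year:
# (plant_id_eia, boiler_id, emission_control_id_type, emission_control_id_eia)
rows = [
    (100, "B1", "so2", "S1"),
    (100, "B2", "so2", "S1"),  # one SO2 control shared by two boilers
    (100, "B1", "nox", "S1"),  # same ID string, but a different control type
    (100, "B1", "nox", "N2"),  # boiler B1 also has a second NOx control
]

boilers_per_control = defaultdict(set)
controls_per_boiler = defaultdict(set)
for plant_id, boiler_id, control_type, control_id in rows:
    # The control ID alone is not unique, so pair it with its type (and plant).
    control_key = (plant_id, control_type, control_id)
    boiler_key = (plant_id, boiler_id)
    boilers_per_control[control_key].add(boiler_key)
    controls_per_boiler[boiler_key].add(control_key)

shared_controls = sorted(k for k, v in boilers_per_control.items() if len(v) > 1)
multi_control_boilers = sorted(k for k, v in controls_per_boiler.items() if len(v) > 1)
print(shared_controls, multi_control_boilers)
```

If a real analysis spans several years, `report_date` would also belong in both keys.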
core_eia860__scd_boilers#
Slowly changing dimension (SCD) table describing attributes of boilers which may vary from year to year. Compiled from across all EIA-860 data.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
plant_id_eia, boiler_id, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for boilers are set. It contains values which are expected to vary slowly, while core_eia__entity_boilers contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
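The default selection rule described above can be sketched as a majority-with-threshold vote. This is a simplified illustration, not PUDL's harvesting code. It assumes that null reports are excluded from the denominator and that the threshold is the 70% default.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def choose_canonical(
    values: Sequence[Optional[Hashable]], threshold: float = 0.7
) -> Optional[Hashable]:
    """Return the most common non-null value if it makes up >= threshold of reports, else None."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # Inconsistent reporting (below the threshold) yields a null canonical value.
    return value if count / len(reported) >= threshold else None


print(choose_canonical(["ST", "ST", "ST", "GT"]))  # 3 of 4 reports agree (75%)
print(choose_canonical(["ST", "GT"]))              # only 50% agreement
```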
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
air_flow_100pct_load_cubic_feet_per_minute |
number |
Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e., 68 °F and one atmosphere). |
boiler_fuel_code_1 |
string |
The code representing the most predominant type of energy that fuels the boiler. |
boiler_fuel_code_2 |
string |
The code representing the second most predominant type of energy that fuels the boiler. |
boiler_fuel_code_3 |
string |
The code representing the third most predominant type of energy that fuels the boiler. |
boiler_fuel_code_4 |
string |
The code representing the fourth most predominant type of energy that fuels the boiler. |
boiler_id |
string |
Alphanumeric boiler ID. |
boiler_operating_date |
date |
Date the boiler began or is planned to begin commercial operation. |
boiler_retirement_date |
date |
Date of the scheduled or effected retirement of the boiler. |
boiler_status |
string |
EIA short code identifying boiler operational status. |
boiler_type |
string |
EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR. |
compliance_year_mercury |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions. |
compliance_year_nox |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions. |
compliance_year_particulate |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions. |
compliance_year_so2 |
integer |
Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
efficiency_100pct_load |
number |
Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent. |
efficiency_50pct_load |
number |
Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent. |
firing_rate_using_coal_tons_per_hour |
number |
Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour. |
firing_rate_using_gas_mcf_per_hour |
number |
Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet (Mcf) per hour. |
firing_rate_using_oil_bbls_per_hour |
number |
Design firing rate at maximum continuous steam flow for petroleum (oil) to the nearest 0.1 barrels per hour. |
firing_rate_using_other_fuels |
number |
Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas. |
firing_type_1 |
string |
EIA short code indicating the type of firing used by this boiler. |
firing_type_2 |
string |
EIA short code indicating the type of firing used by this boiler. |
firing_type_3 |
string |
EIA short code indicating the type of firing used by this boiler. |
fly_ash_reinjection |
boolean |
Indicates whether the boiler is capable of re-injecting fly ash. |
hrsg |
boolean |
Indicates whether the boiler is a heat recovery steam generator (HRSG). |
max_steam_flow_lbs_per_hour |
number |
Maximum continuous steam flow at 100 percent load. |
mercury_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_4 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_5 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_existing_strategy_6 |
string |
Existing strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
mercury_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent mercury regulation. |
new_source_review |
boolean |
Indicates whether the boiler is subject to New Source Review requirements. |
new_source_review_date |
date |
Month in which the New Source Review permit was issued. |
new_source_review_permit |
string |
New Source Review permit number. |
nox_control_existing_caaa_compliance_strategy_1 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_caaa_compliance_strategy_2 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_caaa_compliance_strategy_3 |
string |
Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_manufacturer |
string |
Name of nitrogen oxide control manufacturer. |
nox_control_manufacturer_code |
string |
Code indicating the nitrogen oxide control burner manufacturer. |
nox_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance. |
nox_control_planned_caaa_compliance_strategy_1 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_planned_caaa_compliance_strategy_2 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_planned_caaa_compliance_strategy_3 |
string |
Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
nox_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent nitrogen oxide regulation. |
nox_control_status_code |
string |
Nitrogen oxide control status code. |
particulate_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
particulate_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
particulate_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with particulate matter regulations, strategy for compliance. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
regulation_mercury |
string |
Most stringent type of statute or regulation code under which the boiler is operating for mercury control standards. |
regulation_nox |
string |
EIA short code for most stringent type of statute or regulation code under which the boiler is operating for nitrogen oxide control standards. |
regulation_particulate |
string |
EIA short code for most stringent type of statute or regulation code under which the boiler is operating for particulate matter control standards. |
regulation_so2 |
string |
EIA short code for most stringent type of statute or regulation code under which the boiler is operating for sulfur dioxide control standards. |
report_date |
date |
Date reported. |
so2_control_existing_caaa_compliance_strategy_1 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_caaa_compliance_strategy_2 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_caaa_compliance_strategy_3 |
string |
Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_existing_strategy_1 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_existing_strategy_2 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_existing_strategy_3 |
string |
Existing strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_out_of_compliance_strategy_1 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_out_of_compliance_strategy_2 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_out_of_compliance_strategy_3 |
string |
If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance. |
so2_control_planned_caaa_compliance_strategy_1 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_planned_caaa_compliance_strategy_2 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_planned_caaa_compliance_strategy_3 |
string |
Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendments of 1990. |
so2_control_proposed_strategy_1 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_proposed_strategy_2 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
so2_control_proposed_strategy_3 |
string |
Proposed strategy to comply with the most stringent sulfur dioxide regulation. |
standard_nox_rate |
number |
Numeric value for the unit of measurement specified for nitrogen oxide. |
standard_particulate_rate |
number |
Numeric value for the unit of measurement specified for particulate matter. |
standard_so2_percent_scrubbed |
number |
The percent of sulfur dioxide to be scrubbed specified by the most stringent sulfur dioxide regulation. |
standard_so2_rate |
number |
Numeric value for the unit of measurement specified for sulfur dioxide. |
turndown_ratio |
number |
The turndown ratio for the boiler. |
unit_nox |
string |
Unit of measurement for the nitrogen oxide emission standard given in standard_nox_rate. |
unit_particulate |
string |
Unit of measurement for the particulate matter emission standard given in standard_particulate_rate. |
unit_so2 |
string |
Unit of measurement for the sulfur dioxide emission standard given in standard_so2_rate. |
waste_heat_input_mmbtu_per_hour |
number |
Design waste-heat input rate (MMBtu per hour) at maximum continuous steam flow. A waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process. |
wet_dry_bottom |
string |
Wet or dry bottom. A wet-bottom boiler has slag tanks installed at the furnace throat to contain and remove molten ash from the furnace. A dry-bottom boiler has no slag tanks at the furnace throat: the throat area is clear, and bottom ash drops through the throat to bottom ash water hoppers. |
core_eia860__scd_emissions_control_equipment#
Slowly changing dimension (SCD) table describing attributes of emissions control equipment reported to EIA, including cost, type, operating status, retirement date, and install year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
report_date, plant_id_eia, emission_control_id_pudl
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Additional Details#
Includes control IDs for sulfur dioxide (SO2), particulate matter, mercury, nitrogen oxide (NOx), and acid gas (HCl) monitoring.
This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
acid_gas_control |
boolean |
Indicates whether the emissions control equipment controls acid (HCl) gas. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
emission_control_equipment_cost |
number |
The total cost to install a piece of emission control equipment. |
emission_control_equipment_type_code |
string |
Short code indicating the type of emission control equipment installed. |
emission_control_id_pudl |
number |
A PUDL-generated ID used to distinguish emission control units in the same report year and plant id. This ID should not be used to track units over time or between plants. |
emission_control_operating_date |
date |
The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data. |
emission_control_retirement_date |
date |
The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data. |
mercury_control_id_eia |
string |
Mercury control identification number. This ID is not a unique identifier. |
nox_control_id_eia |
string |
Nitrogen oxide control identification number. This ID is not a unique identifier. |
operational_status_code |
string |
The operating status of the asset. |
particulate_control_id_eia |
string |
Particulate matter control identification number. This ID is not a unique identifier. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
so2_control_id_eia |
string |
Sulfur dioxide control identification number. This ID is not a unique identifier. |
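The emission control operating and retirement dates above are described as derived from separate month and year columns in the raw data. The sketch below shows one way such a derivation could work in pandas; the raw column names `operating_year` and `operating_month`, the helper `month_year_to_date`, and the choice of the first day of the month are all assumptions for illustration and may not match PUDL's actual transform.

```python
import pandas as pd


def month_year_to_date(year: pd.Series, month: pd.Series) -> pd.Series:
    """Combine separate year and month columns into a date on the first day
    of the month. Rows missing either part become NaT. (Hypothetical helper.)"""
    valid = year.notna() & month.notna()
    out = pd.Series(pd.NaT, index=year.index, dtype="datetime64[ns]")
    if valid.any():
        parts = pd.DataFrame(
            {"year": year[valid].astype(int), "month": month[valid].astype(int)}
        )
        parts["day"] = 1  # assumption: pin each date to the first of the month
        out.loc[valid] = pd.to_datetime(parts)
    return out


# Hypothetical raw columns; the real EIA-860 column names may differ.
raw = pd.DataFrame(
    {"operating_year": [2005, 2012, None], "operating_month": [6, None, 3]}
)
raw["emission_control_operating_date"] = month_year_to_date(
    raw["operating_year"], raw["operating_month"]
)
print(raw)
```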
core_eia860__scd_generators
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of generators which may vary from year to year. Compiled from across EIA-860 and EIA-923 data.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
plant_id_eia, generator_id, report_date
Usage Warnings
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details
This is one of two tables where canonical values for generators are set. It contains values which are expected to vary slowly, while core_eia__entity_generators contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
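Because this is a slowly changing dimension keyed on `plant_id_eia`, `generator_id`, and `report_date`, a common task is looking up the attributes that were in effect on some other date. A minimal sketch using `pandas.merge_asof`, with made-up toy data standing in for the real table, assuming you want the most recent report on or before each date of interest; the `events` frame and the helper `attributes_as_of` are hypothetical:

```python
import pandas as pd

# Toy stand-in for core_eia860__scd_generators; values are made up.
scd = pd.DataFrame({
    "plant_id_eia": [3, 3, 3],
    "generator_id": ["1", "1", "1"],
    "report_date": pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01"]),
    "capacity_mw": [100.0, 100.0, 120.0],
})

# Hypothetical dated events to annotate with the generator attributes.
events = pd.DataFrame({
    "plant_id_eia": [3, 3],
    "generator_id": ["1", "1"],
    "event_date": pd.to_datetime(["2021-07-15", "2022-03-01"]),
})


def attributes_as_of(events: pd.DataFrame, scd: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest SCD record on or before each event_date, matching
    on plant and generator. merge_asof requires both inputs sorted by date."""
    return pd.merge_asof(
        events.sort_values("event_date"),
        scd.sort_values("report_date"),
        left_on="event_date",
        right_on="report_date",
        by=["plant_id_eia", "generator_id"],
        direction="backward",
    )


print(attributes_as_of(events, scd))
```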
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet"
)
Columns
Field Name |
Type |
Description |
|---|---|---|
can_burn_multiple_fuels |
boolean |
Whether the generator can burn multiple fuels. |
can_cofire_fuels |
boolean |
Whether the generator can co-fire fuels. |
can_switch_oil_gas |
boolean |
Whether the generator can switch between oil and natural gas. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
carbon_capture |
boolean |
Indicates whether the generator uses carbon capture technology. |
current_planned_generator_operating_date |
date |
The most recently updated effective date on which the generator is scheduled to start operation. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
deliver_power_transgrid |
boolean |
Indicates whether the generator can deliver power to the transmission grid. |
distributed_generation |
boolean |
Whether the generator is considered distributed generation. |
energy_source_1_transport_1 |
string |
Primary mode of transport for energy source 1. |
energy_source_1_transport_2 |
string |
Secondary mode of transport for energy source 1. |
energy_source_1_transport_3 |
string |
Tertiary mode of transport for energy source 1. |
energy_source_2_transport_1 |
string |
Primary mode of transport for energy source 2. |
energy_source_2_transport_2 |
string |
Secondary mode of transport for energy source 2. |
energy_source_2_transport_3 |
string |
Tertiary mode of transport for energy source 2. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
energy_source_code_2 |
string |
The code representing the second most predominant type of energy that fuels the generator. |
energy_source_code_3 |
string |
The code representing the third most predominant type of energy that fuels the generator. |
energy_source_code_4 |
string |
The code representing the fourth most predominant type of energy that fuels the generator. |
energy_source_code_5 |
string |
The code representing the fifth most predominant type of energy that fuels the generator. |
energy_source_code_6 |
string |
The code representing the sixth most predominant type of energy that fuels the generator. |
energy_storage_capacity_mwh |
number |
Energy storage capacity in MWh (e.g. for batteries). |
ferc_qualifying_facility |
boolean |
Indicates whether or not a generator is a qualifying FERC cogeneration facility. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_retirement_date |
date |
Date of the scheduled or effected retirement of the generator. |
minimum_load_mw |
number |
The minimum load at which the generator can operate continuously. |
nameplate_power_factor |
number |
The nameplate power factor of the generator. |
net_capacity_mwdc |
number |
Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV. |
operational_status |
string |
The operating status of the asset. For generators, this is based on which EIA-860 tab the generator was listed in. |
operational_status_code |
string |
The operating status of the asset. |
other_modifications_date |
date |
Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete. |
other_planned_modifications |
boolean |
Indicates whether there are other modifications planned for the generator. |
owned_by_non_utility |
boolean |
Whether any part of the generator is owned by a non-utility. |
ownership_code |
string |
Identifies the ownership for each generator. |
planned_derate_date |
date |
Planned effective month that the generator is scheduled to enter operation after the derate modification. |
planned_energy_source_code_1 |
string |
New energy source code for the planned repowered generator. |
planned_generator_retirement_date |
date |
Planned effective date of the scheduled retirement of the generator. |
planned_modifications |
boolean |
Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years. |
planned_net_summer_capacity_derate_mw |
number |
Decrease in summer capacity expected to be realized from the derate modification to the equipment. |
planned_net_summer_capacity_uprate_mw |
number |
Increase in summer capacity expected to be realized from the uprate modification to the equipment. |
planned_net_winter_capacity_derate_mw |
number |
Decrease in winter capacity expected to be realized from the derate modification to the equipment. |
planned_net_winter_capacity_uprate_mw |
number |
Increase in winter capacity expected to be realized from the uprate modification to the equipment. |
planned_new_capacity_mw |
number |
The expected new nameplate capacity for the generator. |
planned_new_prime_mover_code |
string |
New prime mover for the planned repowered generator. |
planned_repower_date |
date |
Planned effective date that the generator is scheduled to enter operation after the repowering is complete. |
planned_uprate_date |
date |
Planned effective date that the generator is scheduled to enter operation after the uprate modification. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code |
string |
Code for the type of prime mover (e.g., CT, CG). |
reactive_power_output_mvar |
number |
Reactive power output, in MVAr. |
report_date |
date |
Date reported. |
startup_source_code_1 |
string |
The code representing the first start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_2 |
string |
The code representing the second start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_3 |
string |
The code representing the third start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
startup_source_code_4 |
string |
The code representing the fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator. |
summer_capacity_estimate |
boolean |
Whether the summer capacity value was an estimate. |
summer_capacity_mw |
number |
The net summer capacity. |
summer_estimated_capability_mw |
number |
EIA-estimated summer capacity, in MW. |
synchronized_transmission_grid |
boolean |
Indicates whether standby generators (SB status) can be synchronized to the grid. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
time_cold_shutdown_full_load_code |
string |
The minimum amount of time required to bring the unit to full load from shutdown. |
turbines_inverters_hydrokinetics |
integer |
Number of wind turbines, inverters, or hydrokinetic buoys. |
turbines_num |
integer |
Number of wind turbines, or hydrokinetic buoys. |
uprate_derate_completed_date |
date |
The date when the uprate or derate was completed. |
uprate_derate_during_year |
boolean |
Was an uprate or derate completed on this generator during the reporting year? |
utility_id_eia |
integer |
The EIA Utility Identification number. |
winter_capacity_estimate |
boolean |
Whether the winter capacity value was an estimate. |
winter_capacity_mw |
number |
The net winter capacity. |
winter_estimated_capability_mw |
number |
EIA-estimated winter capacity, in MW. |
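The `generator_id` description above warns that IDs must be treated as strings. The Parquet files already store the column as a string, so this mostly matters when you export to CSV or read files created by other tools. The sketch below uses a hypothetical two-row CSV extract to show what goes wrong when a reader infers the column as an integer: distinct IDs such as `01` and `1` collapse into the same value.

```python
import io

import pandas as pd

# Hypothetical CSV extract with two distinct generator IDs at one plant.
csv_text = "plant_id_eia,generator_id\n3,01\n3,1\n"

# Type inference reads both IDs as the integer 1, silently merging them.
as_inferred = pd.read_csv(io.StringIO(csv_text))
print(as_inferred["generator_id"].tolist())

# Declaring the column as a string keeps the IDs distinct.
as_string = pd.read_csv(io.StringIO(csv_text), dtype={"generator_id": "string"})
print(as_string["generator_id"].tolist())
```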
core_eia860__scd_generators_energy_storage
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of energy storage which may vary from year to year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 3)
- Primary key:
plant_id_eia, generator_id, report_date
Additional Details
This table includes only those values that are unique to energy storage. The rest of the columns that are reported in the EIA-860 Energy Storage tabs are included in core_eia860__scd_generators and core_eia__entity_generators.
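Since this table holds only storage-specific values, you will usually join it back to core_eia860__scd_generators on the shared primary key. A minimal pandas sketch with made-up toy data, assuming both tables are already loaded as DataFrames; the `duration_hours` column is a rough, hypothetical derived metric (energy storage capacity divided by maximum discharge rate), not a PUDL field:

```python
import pandas as pd

KEY = ["plant_id_eia", "generator_id", "report_date"]

# Toy stand-ins for the two tables; values are made up.
generators = pd.DataFrame({
    "plant_id_eia": [7, 7],
    "generator_id": ["BAT1", "PV1"],
    "report_date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "energy_storage_capacity_mwh": [200.0, None],
})
storage = pd.DataFrame({
    "plant_id_eia": [7],
    "generator_id": ["BAT1"],
    "report_date": pd.to_datetime(["2024-01-01"]),
    "max_discharge_rate_mw": [50.0],
})

# validate="one_to_one" raises if the key is duplicated in either table.
storage_full = storage.merge(generators, on=KEY, how="left", validate="one_to_one")
# Hypothetical derived metric: hours of discharge at the maximum rate.
storage_full["duration_hours"] = (
    storage_full["energy_storage_capacity_mwh"] / storage_full["max_discharge_rate_mw"]
)
print(storage_full[KEY + ["duration_hours"]])
```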
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet"
)
Columns
Field Name |
Type |
Description |
|---|---|---|
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_id_direct_support_1 |
string |
The EIA Generator ID of the primary unit whose generation this energy storage device is intended to firm or store. |
generator_id_direct_support_2 |
string |
The EIA Generator ID of the secondary unit whose generation this energy storage device is intended to firm or store. |
generator_id_direct_support_3 |
string |
The EIA Generator ID of the tertiary unit whose generation this energy storage device is intended to firm or store. |
is_ac_coupled |
boolean |
Indicates whether this energy storage device is AC-coupled, meaning the energy storage device and the PV system are not installed on the same side of an inverter. |
is_dc_coupled |
boolean |
Indicates whether this energy storage device is DC-coupled, meaning the energy storage device and the PV system are on the same side of an inverter and the battery can still charge from the grid. |
is_dc_coupled_tightly |
boolean |
Indicates whether this energy storage device is DC tightly coupled, meaning the energy storage device and the PV system are on the same side of an inverter and the battery cannot charge from the grid. |
is_direct_support |
boolean |
Indicates if this energy storage device is intended for dedicated generator firming or storing excess generation of other units. |
is_independent |
boolean |
Indicates whether this energy storage device is independent (not coupled with another generator). |
is_transmission_and_distribution_asset_support |
boolean |
Indicates whether the energy storage system is intended to support a specific substation, transmission or distribution asset. |
max_charge_rate_mw |
number |
Maximum charge rate in MW. |
max_discharge_rate_mw |
number |
Maximum discharge rate in MW. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_eia_direct_support_1 |
integer |
The EIA Plant ID of the primary unit whose generation this energy storage device is intended to firm or store. |
plant_id_eia_direct_support_2 |
integer |
The EIA Plant ID of the secondary unit whose generation this energy storage device is intended to firm or store. |
plant_id_eia_direct_support_3 |
integer |
The EIA Plant ID of the tertiary unit whose generation this energy storage device is intended to firm or store. |
report_date |
date |
Date reported. |
served_arbitrage |
boolean |
Whether the energy storage device served arbitrage applications during the reporting year. |
served_backup_power |
boolean |
Whether the energy storage device served backup power applications during the reporting year. |
served_co_located_renewable_firming |
boolean |
Whether the energy storage device served renewable firming applications during the reporting year. |
served_frequency_regulation |
boolean |
Whether the energy storage device served frequency regulation applications during the reporting year. |
served_load_following |
boolean |
Whether the energy storage device served load following applications during the reporting year. |
served_load_management |
boolean |
Whether the energy storage device served load management applications during the reporting year. |
served_ramping_spinning_reserve |
boolean |
Whether the energy storage device served ramping / spinning reserve applications during the reporting year. |
served_system_peak_shaving |
boolean |
Whether the energy storage device served system peak shaving applications during the reporting year. |
served_transmission_and_distribution_deferral |
boolean |
Whether the energy storage device served transmission and distribution deferral applications during the reporting year. |
served_voltage_or_reactive_power_support |
boolean |
Whether the energy storage device served voltage or reactive power support applications during the reporting year. |
storage_enclosure_code |
string |
A code representing the enclosure type that best describes where the generator is located. |
storage_technology_code_1 |
string |
The electrochemical storage technology used by this battery. |
storage_technology_code_2 |
string |
The electrochemical storage technology used by this battery. |
storage_technology_code_3 |
string |
The electrochemical storage technology used by this battery. |
storage_technology_code_4 |
string |
The electrochemical storage technology used by this battery. |
stored_excess_wind_and_solar_generation |
boolean |
Whether the energy storage device was used to store excess wind/solar generation during the reporting year. |
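The coupling flags (`is_ac_coupled`, `is_dc_coupled`, `is_dc_coupled_tightly`, `is_independent`) can be collapsed into a single label for easier grouping. This sketch assumes the flags are mutually exclusive in practice and resolves any overlap with a precedence order chosen arbitrarily here; rows with no true flag, including all-null rows, are labeled `unknown`. The helper `coupling_category` and its labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Label -> flag column. Precedence (an assumption): the first true flag wins.
COUPLING_FLAGS = {
    "dc_tightly_coupled": "is_dc_coupled_tightly",
    "dc_coupled": "is_dc_coupled",
    "ac_coupled": "is_ac_coupled",
    "independent": "is_independent",
}


def coupling_category(df: pd.DataFrame) -> pd.Series:
    """Collapse the coupling flags into one label per row; nulls count as False."""
    conditions = [
        df[col].astype("boolean").fillna(False).to_numpy(dtype=bool)
        for col in COUPLING_FLAGS.values()
    ]
    labels = np.select(conditions, list(COUPLING_FLAGS), default="unknown")
    return pd.Series(labels, index=df.index, name="coupling")


# Toy data; the last row is entirely null.
storage = pd.DataFrame({
    "is_ac_coupled": [True, False, False, False, None],
    "is_dc_coupled": [False, True, False, False, None],
    "is_dc_coupled_tightly": [False, False, True, False, None],
    "is_independent": [False, False, False, True, None],
})
print(coupling_category(storage).tolist())
```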
core_eia860__scd_generators_multifuel
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of generators that may vary from year to year, pertaining to fuel-switching and the use of multiple fuels.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 3)
- Primary key:
report_date, utility_id_eia, generator_id, plant_id_eia
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet"
)
Columns
Field Name |
Type |
Description |
|---|---|---|
can_burn_multiple_fuels |
boolean |
Whether the generator can burn multiple fuels. |
can_cofire_100_oil |
boolean |
Whether the generator can co-fire 100 percent oil. |
can_cofire_fuels |
boolean |
Whether the generator can co-fire fuels. |
can_cofire_oil_and_gas |
boolean |
Whether the generator can co-fire oil and gas. |
can_fuel_switch |
boolean |
Whether a unit is able to switch fuels. |
can_switch_oil_gas |
boolean |
Whether the generator can switch between oil and natural gas. |
can_switch_when_operating |
boolean |
Indicates whether a fuel switching generator can switch fuels while operating. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
cofire_energy_source_1 |
string |
The code representing a type of fuel that will be co-fired. |
cofire_energy_source_2 |
string |
The code representing a type of fuel that will be co-fired. |
cofire_energy_source_3 |
string |
The code representing a type of fuel that will be co-fired. |
cofire_energy_source_4 |
string |
The code representing a type of fuel that will be co-fired. |
cofire_energy_source_5 |
string |
The code representing a type of fuel that will be co-fired. |
cofire_energy_source_6 |
string |
The code representing a type of fuel that will be co-fired. |
county |
string |
County name. |
current_planned_generator_operating_date |
date |
The most recently updated effective date on which the generator is scheduled to start operation. |
energy_source_code_1 |
string |
The code representing the most predominant type of energy that fuels the generator. |
energy_source_code_2 |
string |
The code representing the second most predominant type of energy that fuels the generator. |
fuel_switch_energy_source_1 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
fuel_switch_energy_source_2 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
fuel_switch_energy_source_3 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
fuel_switch_energy_source_4 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
fuel_switch_energy_source_5 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
fuel_switch_energy_source_6 |
string |
The code representing a type of fuel that could be used as the sole source of fuel for this unit. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
has_air_permit_limits |
boolean |
Whether air permit limits are a factor that limits the generator’s ability to switch between oil and natural gas. |
has_factors_that_limit_switching |
boolean |
Whether there are factors that limit the generator’s ability to switch between oil and natural gas. |
has_other_factors_that_limit_switching |
boolean |
Whether there are factors other than air permit limits and storage that limit the generator’s ability to switch between oil and natural gas. |
has_regulatory_limits |
boolean |
Whether there are factors that limit the operation of the generator when running on 100 percent oil. |
has_storage_limits |
boolean |
Whether limited on-site fuel storage is a factor that limits the generator’s ability to switch between oil and natural gas. |
max_oil_heat_input |
number |
The maximum oil heat input (percent of MMBtu) expected for the proposed unit when co-firing with natural gas. |
max_oil_output_mw |
number |
The maximum output (net MW) expected for the proposed unit when making the maximum use of oil and co-firing natural gas. |
net_summer_capacity_natural_gas_mw |
number |
The maximum net summer output achievable when running on natural gas. |
net_summer_capacity_oil_mw |
number |
The maximum net summer output achievable when running on oil. |
net_winter_capacity_natural_gas_mw |
number |
The maximum net winter output achievable when running on natural gas. |
net_winter_capacity_oil_mw |
number |
The maximum net winter output achievable when running on oil. |
operational_status_code |
string |
The operating status of the asset. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_name_eia |
string |
Plant name. |
prime_mover_code |
string |
Code for the type of prime mover (e.g., CT, CG). |
report_date |
date |
Date reported. |
sector_id_eia |
integer |
EIA-assigned sector ID, corresponding to high-level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status. |
sector_name_eia |
string |
EIA-assigned sector name, corresponding to high-level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status. |
state |
string |
Two letter US state abbreviation. |
summer_capacity_mw |
number |
The net summer capacity. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
time_to_switch_gas_to_oil |
string |
The time required to switch the generator from running 100 percent natural gas to running 100 percent oil. |
time_to_switch_oil_to_gas |
string |
The time required to switch the generator from running 100 percent oil to running 100 percent natural gas. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
winter_capacity_mw |
number |
The net winter capacity. |
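For oil/gas switching generators, comparing the natural gas and oil capacity columns gives a rough sense of how much capacity is lost when running on oil. A minimal sketch with made-up values; the function `summer_oil_derate` and the output column `summer_oil_derate_mw` are hypothetical, and null flags are treated as `False`:

```python
import pandas as pd


def summer_oil_derate(df: pd.DataFrame) -> pd.DataFrame:
    """For generators that can switch between oil and gas, compute how much
    lower the net summer capacity is on oil than on natural gas."""
    can_switch = df["can_switch_oil_gas"].astype("boolean").fillna(False)
    switchers = df[can_switch].copy()
    switchers["summer_oil_derate_mw"] = (
        switchers["net_summer_capacity_natural_gas_mw"]
        - switchers["net_summer_capacity_oil_mw"]
    )
    return switchers[["plant_id_eia", "generator_id", "summer_oil_derate_mw"]]


# Toy data; values are made up.
multifuel = pd.DataFrame({
    "plant_id_eia": [10, 10, 11],
    "generator_id": ["1", "2", "CT1"],
    "can_switch_oil_gas": [True, False, True],
    "net_summer_capacity_natural_gas_mw": [200.0, 150.0, 80.0],
    "net_summer_capacity_oil_mw": [180.0, None, 80.0],
})
print(summer_oil_derate(multifuel))
```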
core_eia860__scd_generators_solar
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of solar generators that may vary from year to year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 3.3)
- Primary key:
plant_id_eia, generator_id, report_date
Additional Details
This table includes only those values that are unique to solar generators. The rest of the columns that are reported in the EIA-860 Solar tabs are included in core_eia860__scd_generators and core_eia__entity_generators.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet"
)
Columns
Field Name |
Type |
Description |
|---|---|---|
azimuth_angle_deg |
number |
Indicates the azimuth angle of the unit for fixed tilt or single-axis technologies. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
net_metering_capacity_mwdc |
number |
The DC megawatt capacity that is part of a net metering agreement. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
standard_testing_conditions_capacity_mwdc |
number |
The net capacity of this photovoltaic generator in direct current under standard test conditions (STC) of 1000 W/m^2 solar irradiance and 25 degrees Celsius PV module temperature. This was only reported in 2013 and 2014. |
tilt_angle_deg |
number |
Indicates the tilt angle of the unit for fixed tilt or single-axis technologies. |
uses_bifacial_panels |
boolean |
Indicates whether bifacial solar panels are used at this solar generating unit. |
uses_material_crystalline_silicon |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of crystalline silicon. |
uses_material_other |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of other materials. |
uses_material_thin_film_a_si |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of thin-film amorphous silicon (A-Si). |
uses_material_thin_film_cdte |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of thin-film cadmium telluride (CdTe). |
uses_material_thin_film_cigs |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of thin-film copper indium gallium diselenide (CIGS). |
uses_material_thin_film_other |
boolean |
Indicates whether any solar photovoltaic panels at this generator are made of other thin-film material. |
uses_net_metering_agreement |
boolean |
Indicates if the output from this generator is part of a net metering agreement. |
uses_technology_dish_engine |
boolean |
Indicates whether dish engines are used at this solar generating unit. |
uses_technology_dual_axis_tracking |
boolean |
Indicates whether dual-axis tracking technologies are used at this solar generating unit. |
uses_technology_east_west_fixed_tilt |
boolean |
Indicates whether east-west fixed-tilt technologies are used at this solar generating unit. |
uses_technology_fixed_tilt |
boolean |
Indicates whether fixed tilt technologies are used at this solar generating unit. |
uses_technology_lenses_mirrors |
boolean |
Indicates whether lenses or mirrors are used at this solar generating unit. |
uses_technology_linear_fresnel |
boolean |
Indicates whether linear Fresnel technologies are used at this solar generating unit. |
uses_technology_other |
boolean |
Indicates whether other solar technologies are used at this solar generating unit. |
uses_technology_parabolic_trough |
boolean |
Indicates whether parabolic trough technologies are used at this solar generating unit. |
uses_technology_power_tower |
boolean |
Indicates whether power towers are used at this solar generating unit. |
uses_technology_single_axis_tracking |
boolean |
Indicates whether single-axis tracking technologies are used at this solar generating unit. |
uses_virtual_net_metering_agreement |
boolean |
Indicates if the output from this generator is part of a virtual net metering agreement. |
virtual_net_metering_capacity_mwdc |
number |
The DC capacity in MW that is part of a virtual net metering agreement. |
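The `uses_technology_*` columns hold one boolean per technology. If you prefer a single column listing the technologies in use at each generator, a sketch like the following works on any DataFrame with those columns. The helper `solar_technologies` is hypothetical, and treating null flags as `False` is an assumption you may want to revisit.

```python
import pandas as pd

PREFIX = "uses_technology_"


def solar_technologies(df: pd.DataFrame) -> pd.Series:
    """Return, for each row, the list of technologies whose flag is True.
    Null flags are treated as False (an assumption)."""
    tech_cols = [c for c in df.columns if c.startswith(PREFIX)]
    flags = df[tech_cols].astype("boolean").fillna(False)
    return flags.apply(
        lambda row: [c.removeprefix(PREFIX) for c in tech_cols if row[c]],
        axis=1,
    )


# Toy data; values are made up.
solar = pd.DataFrame({
    "generator_id": ["PV1", "PV2"],
    "uses_technology_fixed_tilt": [True, False],
    "uses_technology_single_axis_tracking": [False, True],
    "uses_technology_dual_axis_tracking": [False, None],
})
print(solar_technologies(solar).tolist())
```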
core_eia860__scd_generators_wind
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of wind generators that may vary from year to year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 3.2)
- Primary key:
plant_id_eia, generator_id, report_date
Additional Details
This table includes only those values that are unique to wind generators. The rest of the columns that are reported in the EIA-860 Wind tabs are included in core_eia860__scd_generators and core_eia__entity_generators.
Access methods
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet"
)
Columns
Field Name |
Type |
Description |
|---|---|---|
design_wind_speed_mph |
number |
Average annual wind speed that turbines at this wind site were designed for. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
obstacle_id_faa |
string |
The Federal Aviation Administration (FAA) obstacle number assigned to this generator. If more than one obstacle number exists, this is the one that best represents the turbines. References the obstacle numbers reported in the FAA’s Digital Obstacle File (https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dof/). This field was only reported from 2013 through 2015. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
predominant_turbine_manufacturer |
string |
Name of predominant manufacturer of turbines at this generator. |
predominant_turbine_model |
string |
Predominant model number of turbines at this generator. |
report_date |
date |
Date reported. |
turbine_hub_height_feet |
number |
The hub height of turbines at this generator, in feet. If more than one value exists, this is the one that best represents the turbines. |
wind_quality_class |
integer |
The wind quality class for turbines at this generator. See table core_eia__codes_wind_quality_class for specifications about each class. |
core_eia860__scd_ownership#
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of generator ownership. Includes only jointly or third-party owned generators.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 4)
- Primary key:
report_date, plant_id_eia, generator_id, owner_utility_id_eia
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| fraction_owned | number | Proportion of generator ownership attributable to this utility. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| operator_utility_id_eia | integer | The EIA utility identification number for the operator utility. |
| owner_city | string | City of owner. |
| owner_country | string | Three-letter ISO-3166 country code. |
| owner_state | string | Two-letter ISO-3166 political subdivision code. |
| owner_street_address | string | Street address of owner. |
| owner_utility_id_eia | integer | The EIA utility identification number for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility. |
| owner_utility_name_eia | string | The name of the EIA owner utility. |
| owner_zip_code | string | ZIP code of owner. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| report_date | date | Date reported. |
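Each row gives one owner's fraction_owned for one generator in one report year. Summing over owners is therefore a quick consistency check. The sketch below assumes rows are dicts with the columns above. It also assumes the fractions for a generator-year are expected to total 1, so any deviation is treated as a possible reporting gap. The function names and the 0.01 tolerance are hypothetical choices.

```python
import math
from collections import defaultdict

# One generator in one report year: the primary key minus owner_utility_id_eia.
GENERATOR_YEAR = ("report_date", "plant_id_eia", "generator_id")


def ownership_totals(rows):
    """Sum fraction_owned over all owners of each generator in each report year."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[k] for k in GENERATOR_YEAR)
        # Missing fractions count as zero, so they surface as shortfalls.
        totals[key] += row["fraction_owned"] or 0.0
    return dict(totals)


def inconsistent_ownership(rows, tol=0.01):
    """Return generator-years whose ownership fractions do not sum to roughly 1."""
    return {
        key: total
        for key, total in ownership_totals(rows).items()
        if not math.isclose(total, 1.0, abs_tol=tol)
    }
```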
core_eia860__scd_plants#
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of plants which may vary from year to year. Compiled from across all EIA-860 and EIA-923 data.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
plant_id_eia, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for plants are set. It contains values that are expected to vary slowly, while core_eia__entity_plants contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
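The 70% rule above can be illustrated with a small function. This is a minimal sketch, not PUDL's actual harvesting code. It assumes null reports are ignored when computing the share, and the function name `choose_canonical_value` is hypothetical.

```python
from collections import Counter


def choose_canonical_value(reported, threshold=0.7):
    """Pick the most consistently reported value, or None if it is not dominant.

    Nulls are dropped first (an assumption). The most common remaining value
    wins only if it makes up at least `threshold` of the reported instances.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None
```

For example, seven reports of "MISO" and three of "PJM" yield "MISO", because 7/10 meets the threshold. A two-to-one split yields None, just as an inconsistently reported attribute shows up as null in this table.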
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| ash_impoundment | boolean | Is there an ash impoundment (e.g. pond, reservoir) at the plant? |
| ash_impoundment_lined | boolean | If there is an ash impoundment at the plant, is the impoundment lined? |
| ash_impoundment_status | string | If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year. |
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| balancing_authority_name_eia | string | Name of the balancing authority. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| datum | string | Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84). |
| energy_storage | boolean | Indicates whether the facility has energy storage capabilities. |
| ferc_cogen_docket_no | string | The docket number relating to the FERC cogenerator status. See FERC Form 556. |
| ferc_cogen_status | boolean | Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556. |
| ferc_exempt_wholesale_generator | boolean | Indicates whether the plant has FERC qualifying facility exempt wholesale generator status. |
| ferc_exempt_wholesale_generator_docket_no | string | The docket number relating to the FERC qualifying facility exempt wholesale generator status. |
| ferc_qualifying_facility_docket_no | string | The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556. |
| ferc_small_power_producer | boolean | Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556. |
| ferc_small_power_producer_docket_no | string | The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556. |
| grid_voltage_1_kv | number | Plant’s first reported grid voltage, in kV, at the point of interconnection to transmission or distribution facilities. |
| grid_voltage_2_kv | number | Plant’s second reported grid voltage, in kV, at the point of interconnection to transmission or distribution facilities. |
| grid_voltage_3_kv | number | Plant’s third reported grid voltage, in kV, at the point of interconnection to transmission or distribution facilities. |
| has_net_metering | boolean | Whether the plant has a net metering agreement in effect during the reporting year. Only displayed for facilities that report the sun or wind as an energy source. This field was only reported up until 2015. |
| iso_rto_code | string | The code of the plant’s ISO or RTO. NA if not reported in that year. |
| liquefied_natural_gas_storage | boolean | Indicates whether the facility has the capability to store natural gas in the form of liquefied natural gas. |
| natural_gas_local_distribution_company | string | Name of the local distribution company (LDC) connected to this natural gas-burning power plant. |
| natural_gas_pipeline_name_1 | string | The name of the owner or operator of a natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
| natural_gas_pipeline_name_2 | string | The name of the owner or operator of a natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
| natural_gas_pipeline_name_3 | string | The name of the owner or operator of a natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility. |
| natural_gas_storage | boolean | Indicates whether the facility has on-site storage of natural gas. |
| nerc_region | string | NERC region in which the plant is located. |
| pipeline_notes | string | Additional owner or operator of natural gas pipeline. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| primary_purpose_id_naics | integer | North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant. |
| regulatory_status_code | string | Indicates whether the plant is regulated or non-regulated. |
| report_date | date | Date reported. |
| reporting_frequency_code | string | Code that specifies the time period for which data must be reported (e.g. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details. |
| sector_id_eia | integer | EIA-assigned sector ID, corresponding to high-level NAICS sector, designated by the primary purpose, regulatory status, and plant-level combined heat and power status. |
| sector_name_eia | string | EIA-assigned sector name, corresponding to high-level NAICS sector, designated by the primary purpose, regulatory status, and plant-level combined heat and power status. |
| service_area | string | Service area in which the plant is located; for unregulated companies, it’s the electric utility with which the plant is interconnected. |
| transmission_distribution_owner_id | integer | EIA-assigned code for the owner of the transmission/distribution system to which the plant is interconnected. |
| transmission_distribution_owner_name | string | Name of the owner of the transmission or distribution system to which the plant is interconnected. |
| transmission_distribution_owner_state | string | State location for the owner of the transmission/distribution system to which the plant is interconnected. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| water_source | string | Name of water source associated with the plant. |
core_eia860__scd_utilities#
Noticed any irregularities with this table? File a report.
Slowly changing dimension (SCD) table describing attributes of utilities which may vary from year to year. Compiled from all EIA data.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860 – Annual Electric Generator Report
- Primary key:
utility_id_eia, report_date
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for utilities are set. It contains values that are expected to vary slowly, while core_eia__entity_utilities contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
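Because this is a slowly changing dimension table keyed on (utility_id_eia, report_date), a common task is to find a utility's attributes as of a given date. The sketch below picks the latest record reported on or before that date. It is a minimal illustration, assuming rows are dicts with report_date as datetime.date and assuming a record stays current until a newer one appears. The function `as_of` and the sample records are hypothetical.

```python
import bisect
from datetime import date


def as_of(records, utility_id, when):
    """Return the latest SCD record for a utility reported on or before `when`."""
    history = sorted(
        (r for r in records if r["utility_id_eia"] == utility_id),
        key=lambda r: r["report_date"],
    )
    dates = [r["report_date"] for r in history]
    # bisect_right gives the index of the first report_date strictly after `when`.
    i = bisect.bisect_right(dates, when)
    return history[i - 1] if i else None


# Hypothetical sample records for illustration only.
sample = [
    {"utility_id_eia": 1, "report_date": date(2019, 1, 1), "city": "Springfield"},
    {"utility_id_eia": 1, "report_date": date(2021, 1, 1), "city": "Shelbyville"},
]
print(as_of(sample, 1, date(2020, 6, 1))["city"])  # Springfield
```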
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| address_2 | string | Second line of the address. |
| attention_line | string | Mail attention name of the operator/owner. |
| city | string | Name of the city. |
| contact_firstname | string | First name of utility contact 1. |
| contact_firstname_2 | string | First name of utility contact 2. |
| contact_lastname | string | Last name of utility contact 1. |
| contact_lastname_2 | string | Last name of utility contact 2. |
| contact_title | string | Title of utility contact 1. |
| contact_title_2 | string | Title of utility contact 2. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| entity_type | string | Entity type of principal owner. |
| phone_extension | string | Phone extension for utility contact 1. |
| phone_extension_2 | string | Phone extension for utility contact 2. |
| phone_number | string | Phone number for utility contact 1. |
| phone_number_2 | string | Phone number for utility contact 2. |
| plants_reported_asset_manager | boolean | Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form? |
| plants_reported_operator | boolean | Is the reporting entity an operator of power plants reported on Schedule 2 of the form? |
| plants_reported_other_relationship | boolean | Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form? |
| plants_reported_owner | boolean | Is the reporting entity an owner of power plants reported on Schedule 2 of the form? |
| report_date | date | Date reported. |
| state | string | Two-letter US state abbreviation. |
| street_address | string | Physical street address. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| zip_code | string | Five-digit US ZIP code. |
| zip_code_4 | string | Four-digit US ZIP code suffix. |
core_eia860m__changelog_generators#
Noticed any irregularities with this table? File a report.
Changelog table tracking changes in generator characteristics reported in EIA-860M.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 860M – Monthly Update to the Annual Electric Generator Report
- Primary key:
plant_id_eia, generator_id, report_date
Additional Details#
EIA-860M includes generator tables with the most up-to-date catalog of EIA generators, their operational status, and other generator characteristics. EIA-860M is reported monthly, although for the vast majority of generators nothing changes from month to month. This table is a changelog of that monthly reported generator data. The first instance of each generator and its associated characteristics gets a record with a report_date and a valid_until_date. Whenever any of the reported EIA-860M data changes for a generator, a new changelog record is added with a new report_date.
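The changelog construction described above can be sketched as follows. Monthly snapshots are grouped per generator, only months where the tracked values changed are kept, and each kept record is valid until the next change. This is a minimal illustration, not PUDL's implementation, and it rests on several assumptions. The tracked columns in `attrs` are a hypothetical subset of the EIA-860M columns. report_date values are comparable (datetime.date or ISO strings). The latest record gets valid_until_date=None, which is also an assumption.

```python
def build_changelog(
    snapshots,
    key=("plant_id_eia", "generator_id"),
    attrs=("operational_status_code", "capacity_mw"),
):
    """Collapse monthly generator snapshots into changelog records."""
    # Group each generator's snapshots in chronological order.
    by_generator = {}
    for snap in sorted(snapshots, key=lambda s: s["report_date"]):
        by_generator.setdefault(tuple(snap[k] for k in key), []).append(snap)

    changelog = []
    for history in by_generator.values():
        kept = []
        for snap in history:
            values = tuple(snap[a] for a in attrs)
            # Keep the first record and any month whose tracked values changed.
            if not kept or tuple(kept[-1][a] for a in attrs) != values:
                kept.append(dict(snap))
        # Each record stays valid until the next change; the latest is open-ended.
        for current, following in zip(kept, kept[1:] + [None]):
            current["valid_until_date"] = (
                following["report_date"] if following is not None else None
            )
        changelog.extend(kept)
    return changelog
```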
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| capacity_mw | number | Total installed (nameplate) capacity, in megawatts. |
| county | string | County name. |
| current_planned_generator_operating_date | date | The most recently updated effective date on which the generator is scheduled to start operation. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_source_code_1 | string | The code representing the predominant type of energy that fuels the generator. |
| energy_storage_capacity_mwh | number | Energy storage capacity in MWh (e.g. for batteries). |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| generator_operating_date | date | Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date. |
| generator_retirement_date | date | Date of the scheduled or effected retirement of the generator. |
| latitude | number | Latitude of the plant’s location, in degrees. |
| longitude | number | Longitude of the plant’s location, in degrees. |
| net_capacity_mwdc | number | Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV. |
| operational_status | string | The operating status of the asset. For generators, this is based on the EIA-860 tab in which the generator was listed. |
| operational_status_code | string | The operating status of the asset. |
| planned_derate_date | date | Planned effective month that the generator is scheduled to enter operation after the derate modification. |
| planned_generator_retirement_date | date | Planned effective date of the scheduled retirement of the generator. |
| planned_net_summer_capacity_derate_mw | number | Decrease in summer capacity expected to be realized from the derate modification to the equipment. |
| planned_net_summer_capacity_uprate_mw | number | Increase in summer capacity expected to be realized from the uprate modification to the equipment. |
| planned_repower_date | date | Planned effective date that the generator is scheduled to enter operation after the repowering is complete. |
| planned_uprate_date | date | Planned effective date that the generator is scheduled to enter operation after the uprate modification. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_name_eia | string | Plant name. |
| prime_mover_code | string | Code for the type of prime mover (e.g. CT, CG). |
| report_date | date | Date reported. |
| sector_id_eia | integer | EIA-assigned sector ID, corresponding to high-level NAICS sector, designated by the primary purpose, regulatory status, and plant-level combined heat and power status. |
| state | string | Two-letter US state abbreviation. |
| summer_capacity_mw | number | The net summer capacity, in megawatts. |
| technology_description | string | High-level description of the technology used by the generator to produce electricity. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_name_eia | string | The name of the utility. |
| valid_until_date | date | The date until which this changelog record is valid. The record is valid from its report_date up to, but not including, the valid_until_date. |
| winter_capacity_mw | number | The net winter capacity, in megawatts. |
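Because each record covers the half-open interval from report_date up to, but not including, valid_until_date, a point-in-time query should compare with `<=` on the start and `<` on the end. This prevents two consecutive records from both matching on the boundary date. The sketch below assumes rows are dicts and treats a null valid_until_date as still valid. That null handling is an assumption about the latest record. The function name `valid_at` is hypothetical.

```python
def valid_at(changelog, when):
    """Return changelog rows in effect on `when` (report_date <= when < valid_until_date)."""
    return [
        row
        for row in changelog
        if row["report_date"] <= when
        and (row["valid_until_date"] is None or when < row["valid_until_date"])
    ]
```

On the exact date of a change, only the newer record matches. Before a generator's first report_date, no record matches.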
core_eia861__assn_utility#
Noticed any irregularities with this table? File a report.
Association table providing connections between utility and state in a given year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
report_date, utility_id_eia, state
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| report_date | date | Date reported. |
| state | string | Two-letter US state abbreviation. |
| utility_id_eia | integer | The EIA Utility Identification number. |
core_eia861__yearly_advanced_metering_infrastructure#
Noticed any irregularities with this table? File a report.
Annual time series of advanced metering infrastructure (AMI) and automated meter reading (AMR) by state, sector, and balancing authority.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia
Additional Details#
The energy served (in MWh) through AMI systems is provided. Form EIA-861 respondents also report the number of standard (non-AMR/AMI) meters in their system. Historical changes: EIA started collecting the number of standard meters in 2013. The monthly survey collected these data from January 2011 to January 2017.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| advanced_metering_infrastructure | integer | Number of meters that measure and record usage data at a minimum in hourly intervals, provide usage data at least daily to energy companies, and may also provide data to consumers. Data are used for billing and other purposes. Advanced meters include basic hourly interval meters and extend to real-time meters with built-in two-way communication capable of recording and transmitting instantaneous data. |
| automated_meter_reading | integer | Number of meters that collect data for billing purposes only and transmit this data one way, usually from the customer to the distribution utility. |
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| customer_class | string | High-level categorization of customer type (e.g., commercial, residential). |
| daily_digital_access_customers | integer | Number of customers able to access daily energy usage through a web portal or other electronic means. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| direct_load_control_customers | integer | Number of customers with direct load control: a demand response activity by which the program sponsor remotely shuts down or cycles a customer’s electrical equipment (e.g. air conditioner, water heater) on short notice. |
| energy_served_ami_mwh | number | Amount of energy served through AMI meters. AMI meters can transmit data in both directions, between the delivery entity and the customer. |
| entity_type | string | Entity type of principal owner. |
| home_area_network | integer | Number of AMI meters with home area network (HAN) gateway enabled. |
| non_amr_ami | integer | Number of non-AMR/AMI meters. Usually electromechanical or solid-state meters measuring aggregated kWh, where data are manually retrieved over monthly billing cycles for billing purposes only. Standard meters may also include functions to measure time-of-use and/or demand, with data manually retrieved over monthly billing cycles. |
| report_date | date | Date reported. |
| short_form | boolean | Whether the reported information comes from the short form. For Form EIA-861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA’s processing efficiency. |
| state | string | Two-letter US state abbreviation. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_name_eia | string | The name of the utility. |
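A typical use of these counts is the share of a utility's meters that are AMI. The minimal sketch below makes two assumptions. First, AMI, AMR, and standard (non-AMR/AMI) meters are treated as mutually exclusive categories whose sum is the total meter count. Second, rows are dicts with the column names above. Because standard meters were only collected from 2013 onward, rows without non_amr_ami return None rather than an inflated share. The function name `ami_share` is hypothetical.

```python
METER_COLUMNS = ("advanced_metering_infrastructure", "automated_meter_reading", "non_amr_ami")


def ami_share(row):
    """Fraction of reported meters that are AMI, or None if it cannot be computed."""
    counts = [row.get(c) for c in METER_COLUMNS]
    # Missing counts (e.g. non_amr_ami before 2013) make the denominator unknown.
    if any(c is None for c in counts):
        return None
    total = sum(counts)
    return counts[0] / total if total else None
```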
core_eia861__yearly_demand_response#
Noticed any irregularities with this table? File a report.
Annual time series of demand response programs by state, sector, and balancing authority.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia
Additional Details#
The EIA-861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013, splitting the contents of the DSM table into separate energy efficiency and demand response tables. Though similar, the information collected before and after this change is not comparable enough to combine into a single, continuous table, and an EIA representative we contacted discouraged us from doing so.
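One way to respect this break is to refuse any analysis window that straddles 2012 and 2013. The hypothetical helper below maps a year range to a table. Ranges ending by 2012 map to core_eia861__yearly_demand_side_management_ee_dr, one of the three pre-2013 DSM tables. Ranges starting in 2013 or later map to core_eia861__yearly_demand_response. Ranges that mix the two periods raise an error. The mapping is an illustrative assumption, not a rule from PUDL.

```python
DSM_LAST_YEAR = 2012  # last year covered by the pre-2013 DSM tables


def dsm_source_for_years(start_year, end_year):
    """Pick the EIA-861 table family covering an inclusive year range."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    if end_year <= DSM_LAST_YEAR:
        return "core_eia861__yearly_demand_side_management_ee_dr"
    if start_year > DSM_LAST_YEAR:
        return "core_eia861__yearly_demand_response"
    # The pre- and post-2013 data are not comparable enough to combine.
    raise ValueError(
        f"{start_year}-{end_year} spans the 2013 EIA-861 form change; "
        "analyze the two periods separately"
    )
```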
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| actual_peak_demand_savings_mw | number | Demand reduction actually achieved by demand response activities, measured at the time of the company’s annual system peak hour. |
| balancing_authority_code_eia | string | EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
| customer_class | string | High-level categorization of customer type (e.g., commercial, residential). |
| customer_incentives_cost | number | Total cost of customer incentives in a given report year. Customer incentives are the total financial value provided to a customer for program participation, whether cash payments, lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation. |
| customers | number | Number of customers. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| energy_savings_mwh | number | The energy savings achieved in a given reporting year through participation in demand response programs. |
| other_costs | number | Additional costs. |
| potential_peak_demand_savings_mw | number | The total demand savings that could occur at the time of the system peak hour, assuming all demand response is called. |
| report_date | date | Date reported. |
| short_form | boolean | Whether the reported information comes from the short form. For Form EIA-861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA’s processing efficiency. |
| state | string | Two-letter US state abbreviation. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_name_eia | string | The name of the utility. |
core_eia861__yearly_demand_response_water_heater#
Noticed any irregularities with this table? File a report.
Annual time series of grid-connected water heaters enrolled in demand response programs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
balancing_authority_code_eia, report_date, state, utility_id_eia
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
num_water_heaters |
integer |
The number of grid-enabled water heaters added to the respondent’s program this year, if the respondent has a DSM program for grid-enabled water heaters (as defined by DOE’s Office of Energy Efficiency and Renewable Energy). |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
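This table documents a four-column primary key, so a quick uniqueness check after loading it is cheap insurance. The sketch below shows such a check using pandas' DataFrame.duplicated. It also sums num_water_heaters by year. Per the column description, that sum reflects heaters added in each year, not cumulative enrollment. primary_key_duplicates is a hypothetical helper, and the toy rows are illustrative, not real data.

```python
import pandas as pd

# Primary key documented for core_eia861__yearly_demand_response_water_heater.
PRIMARY_KEY = ["balancing_authority_code_eia", "report_date", "state", "utility_id_eia"]


def primary_key_duplicates(df: pd.DataFrame, key: list) -> pd.DataFrame:
    """Return all rows whose key values occur more than once (empty if unique)."""
    return df.loc[df.duplicated(subset=key, keep=False)]


# Toy rows standing in for the real table.
water_heaters = pd.DataFrame(
    {
        "balancing_authority_code_eia": ["PJM", "PJM", "MISO"],
        "report_date": pd.to_datetime(["2023-01-01", "2024-01-01", "2024-01-01"]),
        "state": ["PA", "PA", "MN"],
        "utility_id_eia": [100, 100, 200],
        "num_water_heaters": [5, 8, 3],
    }
)

# num_water_heaters counts heaters *added* to a program in a given year,
# so this per-year sum is new additions, not cumulative enrollment.
added_per_year = water_heaters.groupby(water_heaters["report_date"].dt.year)[
    "num_water_heaters"
].sum()
```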
core_eia861__yearly_demand_side_management_ee_dr#
Annual time series of the impact of energy efficiency and load management programs on total energy sold (MWh) and peak demand (MW) by customer class.
- Most-recent data:
2012
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013, splitting the contents of the DSM table into separate energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table, and an EIA representative we consulted advised against doing so.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
annual_indirect_program_cost |
number |
Costs that have not been included in any program category, but could be meaningfully identified with operating the company’s DSM programs (e.g., Administrative, Marketing, Monitoring & Evaluation, Company-Earned Incentives, Other). |
annual_total_cost |
number |
The sum of direct program costs, indirect program costs, and incentive payments associated with utility demand side management programs. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_efficiency_annual_actual_peak_reduction_mw |
number |
The peak reduction incurred in a given reporting year by all participants in efficiency programs. |
energy_efficiency_annual_direct_cost |
number |
The sum of actual direct costs (excluding incentive payments) incurred from energy efficiency programs in a given reporting year. |
energy_efficiency_annual_effects_mwh |
number |
The change in energy use incurred in a given reporting year by all participants in energy efficiency programs. |
energy_efficiency_annual_incentive_cost |
number |
The cost of incentive payments incurred in a given reporting year from energy efficiency programs. Incentives are the total financial value provided to a customer for program participation, whether a cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation. |
energy_efficiency_incremental_actual_peak_reduction_mw |
number |
The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs. |
energy_efficiency_incremental_effects_mwh |
number |
The change in energy use incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs. |
load_management_annual_actual_peak_reduction_mw |
number |
The peak reduction incurred in a given reporting year by all participants in load management programs. |
load_management_annual_direct_cost |
number |
The sum of actual direct costs (excluding incentive payments) incurred from load management programs in a given reporting year. |
load_management_annual_effects_mwh |
number |
The change in energy use incurred in a given reporting year by all participants in load management programs. |
load_management_annual_incentive_cost |
number |
The cost of incentive payments incurred in a given reporting year from load management programs. Incentives are the total financial value provided to a customer for program participation, whether a cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation. |
load_management_annual_potential_peak_reduction_mw |
number |
The potential amount of peak reduction that could be incurred in a given reporting year by all participants in load management programs. |
load_management_incremental_actual_peak_reduction_mw |
number |
The peak reduction incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs. |
load_management_incremental_effects_mwh |
number |
The change in energy use incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs. |
load_management_incremental_potential_peak_reduction_mw |
number |
The potential amount of peak reduction that could be incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs. |
nerc_region |
string |
NERC region in which the plant is located |
price_responsiveness_customers |
integer |
The number of customers participating in the respondent’s incentive-based demand response programs. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
time_responsiveness_customers |
integer |
The number of customers participating in the respondent’s time-based rate programs. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
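The annual_total_cost column is described as the sum of direct program costs, indirect program costs, and incentive payments. That suggests a reconciliation check against the itemized cost columns. The sketch below is a minimal version. Its mapping of the five component columns onto that description is an assumption read off the column descriptions, not a documented PUDL identity. flag_cost_mismatches and its 1% tolerance are hypothetical choices.

```python
import pandas as pd

# ASSUMED components of annual_total_cost, inferred from its column
# description; this is not a documented identity.
COST_COMPONENTS = [
    "energy_efficiency_annual_direct_cost",
    "load_management_annual_direct_cost",
    "energy_efficiency_annual_incentive_cost",
    "load_management_annual_incentive_cost",
    "annual_indirect_program_cost",
]


def flag_cost_mismatches(df: pd.DataFrame, rel_tol: float = 0.01) -> pd.DataFrame:
    """Return rows where annual_total_cost differs from the component sum by more than rel_tol."""
    # sum() skips missing components; if every component is missing, min_count=1 yields NaN.
    component_sum = df[COST_COMPONENTS].sum(axis=1, min_count=1)
    diff = (df["annual_total_cost"] - component_sum).abs()
    # Rows with a missing total or no components compare as False (or NA,
    # filled to False), so they are never flagged.
    mismatch = (diff > rel_tol * df["annual_total_cost"].abs()).fillna(False)
    return df.loc[mismatch]


# Toy rows: the first reconciles exactly, the second does not.
example = pd.DataFrame(
    {
        "energy_efficiency_annual_direct_cost": [100.0, 50.0],
        "load_management_annual_direct_cost": [20.0, 0.0],
        "energy_efficiency_annual_incentive_cost": [30.0, 10.0],
        "load_management_annual_incentive_cost": [0.0, 0.0],
        "annual_indirect_program_cost": [10.0, 5.0],
        "annual_total_cost": [160.0, 90.0],
    }
)
mismatches = flag_cost_mismatches(example)
```

Flagged rows are a prompt for inspection, not proof of an error. The component mapping itself may be incomplete.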
core_eia861__yearly_demand_side_management_misc#
Annual time series of demand-side management (DSM) program information.
- Most-recent data:
2012
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
Includes boolean fields indicating whether the energy savings estimates/calculations were independently verified and whether the utility runs time- and/or price-responsive programs. Also indicates whether any of the respondent’s DSM activities are reported under another company and, if so, which one.
The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013, splitting the contents of the DSM table into separate energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table, and an EIA representative we consulted advised against doing so.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_savings_estimates_independently_verified |
boolean |
Whether savings estimates are based on a forecast or the report of one or more independent evaluators. |
energy_savings_independently_verified |
boolean |
Whether reported energy savings were verified through an independent evaluation. |
entity_type |
string |
Entity type of principal owner. |
has_time_responsive_programs |
boolean |
Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff). |
major_program_changes |
boolean |
Whether there have been any major changes to the respondent’s demand-side management programs (e.g., terminated programs, new information or financing programs, or a shift to programs with dual load building objectives and energy efficiency objectives), program tracking procedures, or reporting methods that affect the comparison of demand-side management data reported on this schedule to data from previous years. |
nerc_region |
string |
NERC region in which the plant is located |
price_responsive_programs |
boolean |
Whether the respondent operates any incentive-based demand response programs (e.g., market incentives, financial incentives, direct load control, interruptible programs, demand bidding/buyback, emergency demand response, capacity market programs, and ancillary service market programs). |
report_date |
date |
Date reported. |
reported_as_another_company |
string |
The name of the company if a respondent’s demand-side management activities are reported on Schedule 6 of another company’s form. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA’s processing efficiency. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
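The boolean program flags in this table lend themselves to simple yearly summaries. The sketch below computes the share of rows in each report year where each flag is True. Because the table has no primary key, a utility may appear on several rows (for example, once per state or NERC region), so this is a share of rows rather than of utilities. program_flag_shares is a hypothetical helper, and the toy rows are illustrative.

```python
import pandas as pd

PROGRAM_FLAGS = ["has_time_responsive_programs", "price_responsive_programs"]


def program_flag_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Share of rows in each report year where each program flag is True."""
    year = df["report_date"].dt.year.rename("report_year")
    # The mean of a True/False column is the fraction of True values.
    return df[PROGRAM_FLAGS].groupby(year).mean()


# Toy rows standing in for the real table.
example = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2011-01-01", "2011-01-01", "2012-01-01", "2012-01-01"]),
        "has_time_responsive_programs": [True, False, True, True],
        "price_responsive_programs": [False, False, True, False],
    }
)
shares = program_flag_shares(example)
```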
core_eia861__yearly_demand_side_management_sales#
Annual time series of electricity sales related to demand-side management (DSM).
- Most-recent data:
2012
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013, splitting the contents of the DSM table into separate energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table, and an EIA representative we consulted advised against doing so.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
nerc_region |
string |
NERC region in which the plant is located |
report_date |
date |
Date reported. |
sales_for_resale_mwh |
number |
The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers. |
sales_to_ultimate_consumers_mwh |
number |
The amount of electricity sold to customers purchasing electricity for their own use and not for resale. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
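This table has no primary key, and a utility can report on several rows (for example, one per state or NERC region). Totals should therefore be aggregated before comparing utilities. The sketch below sums retail and resale MWh per utility and year and adds a combined total. sales_by_utility_year is a hypothetical helper, and the toy rows are illustrative.

```python
import pandas as pd

SALES_COLUMNS = ["sales_to_ultimate_consumers_mwh", "sales_for_resale_mwh"]


def sales_by_utility_year(df: pd.DataFrame) -> pd.DataFrame:
    """Total retail and resale MWh per utility and report year."""
    out = (
        df.assign(report_year=df["report_date"].dt.year)
        .groupby(["utility_id_eia", "report_year"], as_index=False)[SALES_COLUMNS]
        .sum()  # missing values are skipped, i.e. treated as zero
    )
    out["total_sales_mwh"] = out[SALES_COLUMNS].sum(axis=1)
    return out


# Toy rows: utility 1 reports in two states.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["NE", "IA", "NE"],
        "report_date": pd.to_datetime(["2012-01-01"] * 3),
        "sales_to_ultimate_consumers_mwh": [100.0, 50.0, 30.0],
        "sales_for_resale_mwh": [10.0, 0.0, 5.0],
    }
)
totals = sales_by_utility_year(example)
```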
core_eia861__yearly_distributed_generation_fuel#
Annual time series of the energy sources used for utility or customer-owned distributed generation capacity.
- Most-recent data:
2015
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
From 2016 on, there is no close equivalent to the data in this table. You may be able to compute an analog for fuel percentages using the capacity and tech class figures in core_eia861__yearly_non_net_metering_customer_fuel_class, but no corresponding consumption or generation information is available in the new format.
The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
estimated_or_actual_fuel_data |
string |
Whether the reported fuel data is estimated or actual. |
fuel_class |
string |
Fuel types specific to EIA 861 distributed generation table: [‘gas’, ‘oil’, ‘other’, ‘renewable’, ‘water’, ‘wind’, ‘wood’] |
fuel_pct |
number |
Percent of fuel |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
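The fuel table is long: one row per fuel_class. For side-by-side comparisons it can help to pivot it wide. The sketch below uses pandas' DataFrame.pivot, which raises a ValueError if a utility/state/date/fuel_class combination repeats. That doubles as a duplicate check for a table without a primary key. fuel_mix_wide is a hypothetical helper. The toy values are illustrative, and the sketch makes no assumption about whether fuel_pct is on a 0-100 or 0-1 scale.

```python
import pandas as pd


def fuel_mix_wide(df: pd.DataFrame) -> pd.DataFrame:
    """One row per utility, state, and report date; one column per fuel_class."""
    # pivot() raises ValueError on duplicate index/column pairs; fuel classes a
    # utility did not report show up as NaN.
    return df.pivot(
        index=["utility_id_eia", "state", "report_date"],
        columns="fuel_class",
        values="fuel_pct",
    )


# Toy rows standing in for the real table.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2015-01-01"] * 3),
        "fuel_class": ["gas", "wind", "oil"],
        "fuel_pct": [60.0, 40.0, 100.0],
    }
)
wide = fuel_mix_wide(example)
```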
core_eia861__yearly_distributed_generation_misc#
Annual time series of the capacity and quantity of utility or customer-owned distributed generation.
- Most-recent data:
2015
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_misc.
The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
backup_capacity_mw |
number |
The total nameplate capacity of generators that are used only for emergency backup service. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
distributed_generation_owned_capacity_mw |
number |
Amount of distributed generation capacity owned by the respondent. |
estimated_or_actual_capacity_data |
string |
Whether the reported capacity data is estimated or actual. |
generators_num_less_1_mw |
integer |
Total number of generators less than 1 MW. |
generators_number |
integer |
Total number of generators |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
total_capacity_less_1_mw |
number |
The total amount of capacity from generators with less than 1 MW of nameplate capacity. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
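The two generator counts in this table can be combined into the share of a respondent's generators, by count, that are under 1 MW. The sketch below assumes that generators_num_less_1_mw is a subset of generators_number, which their descriptions suggest but do not state outright. Zero counts become NaN rather than an undefined ratio. add_small_generator_share is a hypothetical helper, and the toy rows are illustrative.

```python
import pandas as pd


def add_small_generator_share(df: pd.DataFrame) -> pd.DataFrame:
    """Add the fraction of generators (by count) that are under 1 MW."""
    out = df.copy()
    # where() replaces zero counts with NaN so the ratio is NaN, not inf/undefined.
    denom = out["generators_number"].where(out["generators_number"] > 0)
    out["share_generators_under_1_mw"] = out["generators_num_less_1_mw"] / denom
    return out


# Toy rows standing in for the real table.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 2],
        "generators_number": [10, 0],
        "generators_num_less_1_mw": [8, 0],
    }
)
shares = add_small_generator_share(example)
```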
core_eia861__yearly_distributed_generation_tech#
Annual time series of the technology used for utility or customer-owned distributed generation.
- Most-recent data:
2015
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_customer_fuel_class.
The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
estimated_or_actual_tech_data |
string |
Whether the reported technology data is estimated or actual. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
tech_class |
string |
Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’]. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
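The tech_class list mixes individual technologies with labels that look like roll-ups, such as 'total' and 'all_storage'. If those rows are aggregates, summing capacity_mw across every class would double count. The sketch below assumes those two labels are aggregates; verify this against your data, for example by comparing 'total' rows to the sum of the others. Other labels, such as 'virtual_pv' alongside its under/over 1 MW splits, may overlap too. capacity_by_utility_year and AGGREGATE_TECH_CLASSES are hypothetical names.

```python
import pandas as pd

# ASSUMED roll-up categories; check them against the data before relying on this.
AGGREGATE_TECH_CLASSES = ("total", "all_storage")


def capacity_by_utility_year(df: pd.DataFrame, exclude=AGGREGATE_TECH_CLASSES) -> pd.Series:
    """Sum capacity_mw per utility and report year, skipping excluded tech classes."""
    detail = df.loc[~df["tech_class"].isin(list(exclude))]
    year = detail["report_date"].dt.year.rename("report_year")
    return detail.groupby([detail["utility_id_eia"], year])["capacity_mw"].sum()


# Toy rows: the 'total' row repeats the sum of the other two.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2015-01-01"] * 3),
        "tech_class": ["pv", "wind", "total"],
        "capacity_mw": [2.0, 1.0, 3.0],
    }
)
capacity = capacity_by_utility_year(example)
```

Summing capacity_mw over every row of the toy data would give 6.0 MW, twice the 3.0 MW actually installed.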
core_eia861__yearly_distribution_systems#
Annual time series of distribution circuits and circuits with voltage optimization by state.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
circuits_with_voltage_optimization |
integer |
Number of distribution circuits that employ voltage/VAR optimization (VVO). |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
distribution_circuits |
integer |
Total number of distribution circuits. |
report_date |
date |
Date reported. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA’s processing efficiency. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
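A state-level voltage/VAR optimization (VVO) share can be built from the two circuit counts in this table. The sketch below sums circuits across utilities before dividing. Each utility is thus weighted by its number of circuits, rather than utility-level ratios being averaged. state_vvo_share is a hypothetical helper, and the toy rows are illustrative.

```python
import pandas as pd

CIRCUIT_COLUMNS = ["circuits_with_voltage_optimization", "distribution_circuits"]


def state_vvo_share(df: pd.DataFrame) -> pd.DataFrame:
    """Share of distribution circuits using VVO, per state and report year."""
    year = df["report_date"].dt.year.rename("report_year")
    # Sum circuits first so larger utilities carry proportionally more weight.
    totals = df.groupby([df["state"], year])[CIRCUIT_COLUMNS].sum()
    # Zero circuit totals become NaN instead of producing an undefined ratio.
    circuits = totals["distribution_circuits"].where(totals["distribution_circuits"] > 0)
    totals["vvo_share"] = totals["circuits_with_voltage_optimization"] / circuits
    return totals


# Toy rows: two utilities in CO, one in TX.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 2, 3],
        "state": ["CO", "CO", "TX"],
        "report_date": pd.to_datetime(["2024-01-01"] * 3),
        "circuits_with_voltage_optimization": [10, 90, 0],
        "distribution_circuits": [100, 300, 50],
    }
)
state_shares = state_vvo_share(example)
```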
core_eia861__yearly_dynamic_pricing#
Annual time series of enrollment in dynamic pricing programs by state, sector, and balancing authority.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Additional Details#
Respondents check if one or more customers are enrolled in time-of-use pricing, real time pricing, variable peak pricing, critical peak pricing, and critical peak rebates.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
critical_peak_pricing |
boolean |
Whether customers are participating in critical peak pricing, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by imposing a pre-specified high rate or price for a limited number of days or hours. |
critical_peak_rebate |
boolean |
Whether customers are participating in critical peak rebates, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by providing a rebate to the customer on a limited number of days and for a limited number of hours, at the request of the energy provider. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
real_time_pricing |
boolean |
Whether the respondent has customers participating in a real time pricing (RTP) program. RTP is a program of rate and price structure in which the retail price for electricity typically fluctuates hourly or more often, to reflect changes in the wholesale price of electricity on either a day-ahead or hour-ahead basis. |
report_date |
date |
Date reported. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA’s processing efficiency. |
state |
string |
Two letter US state abbreviation. |
time_of_use_pricing |
boolean |
Whether the respondent has customers participating in a time-of-use (TOU) pricing program. TOU is a program in which customers pay different prices at different times of the day. On-peak prices are higher and off-peak prices are lower than a “standard” rate. The price schedule is fixed and predefined, based on season, day of week, and time of day. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
variable_peak_pricing |
boolean |
Whether the respondent has customers participating in a variable peak pricing program (VPP). VPP is a program in which a form of TOU pricing allows customers to purchase their generation supply at prices set on a daily basis with varying on-peak and constant off-peak rates. Under the VPP program, the on-peak price for each weekday becomes available the previous day (typically late afternoon) and the customer is billed for actual consumption during the billing cycle at these prices. |
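Each row in this table pairs a single customers count with five program flags. Because of that, the sketch below counts utilities rather than trying to split customers by program. It collapses each utility's rows (split by state, balancing authority, and customer class) with any(), then counts utilities per year for each program. utilities_per_program is a hypothetical helper, and the toy rows are illustrative.

```python
import pandas as pd

PROGRAM_FLAGS = [
    "time_of_use_pricing",
    "real_time_pricing",
    "variable_peak_pricing",
    "critical_peak_pricing",
    "critical_peak_rebate",
]


def utilities_per_program(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct utilities per report year flagging each dynamic pricing program."""
    year = df["report_date"].dt.year.rename("report_year")
    # any() collapses each utility's rows into one True/False per program.
    per_utility = df[PROGRAM_FLAGS].groupby([df["utility_id_eia"], year]).any()
    # Summing booleans within each year counts the utilities flagged True.
    return per_utility.groupby(level="report_year").sum()


# Toy rows: utility 1 reports two customer classes.
example = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "report_date": pd.to_datetime(["2024-01-01"] * 3),
        "customer_class": ["residential", "commercial", "residential"],
        "time_of_use_pricing": [True, False, True],
        "real_time_pricing": [False, False, False],
        "variable_peak_pricing": [False, False, False],
        "critical_peak_pricing": [False, True, False],
        "critical_peak_rebate": [False, False, False],
    }
)
counts = utilities_per_program(example)
```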
core_eia861__yearly_energy_efficiency#
Annual time series of incremental energy savings, peak demand savings, weighted average life cycle, and associated costs for the reporting year and life cycle of energy efficiency programs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Additional Details#
The EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013, splitting the contents of the DSM table into separate energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table, and an EIA representative we consulted advised against doing so.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
customer_incentives_incremental_cost |
number |
The cost of customer incentives resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs. Customer incentives are the total financial value provided to a customer for program participation, whether cash payments, lowered tariff rates relative to non-participants, in-kind services (e.g., design work), or other benefits directly provided to the customer for their program participation. |
customer_incentives_incremental_life_cycle_cost |
number |
All anticipated costs of the customer incentives, including reporting year incremental costs and all future costs. Customer incentives are the total financial value provided to a customer for program participation, whether cash payments, lowered tariff rates relative to non-participants, in-kind services (e.g., design work), or other benefits directly provided to the customer for their program participation. |
customer_other_costs_incremental_life_cycle_cost |
number |
All anticipated costs other than customer incentives. Includes reporting year incremental costs and all future costs. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
incremental_energy_savings_mwh |
number |
Energy savings in the given report year resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs. |
incremental_life_cycle_energy_savings_mwh |
number |
The estimated total changes in energy use for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition. |
incremental_life_cycle_peak_reduction_mw |
number |
The estimated total changes in peak load for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition. |
incremental_peak_reduction_mw |
number |
The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs. |
other_costs_incremental_cost |
number |
Costs resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs that aren’t directly associated with customer incentives. |
report_date |
date |
Date reported. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of Form EIA-861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA's processing efficiency. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
weighted_average_life_years |
number |
The weighted average life of the respondent’s portfolio of energy efficiency programs. |
core_eia861__yearly_green_pricing#
Noticed any irregularities with this table? File a report.
Annual time series of green pricing program revenue, sales, and customer count by sector and state.
- Most-recent data:
2012
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
The original data is no longer being collected or reported in this way.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
green_pricing_revenue |
number |
Revenue derived from the premium green pricing rate of the respondent’s program. |
rec_revenue |
number |
Amount of revenue collected from Renewable Energy Certificates (RECs). |
rec_sales_mwh |
number |
Sales of Renewable Energy Certificates (RECs), in MWh. |
report_date |
date |
Date reported. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
core_eia861__yearly_mergers#
Noticed any irregularities with this table? File a report.
Annual time series of utility mergers and acquisitions.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
entity_type |
string |
Entity type of principal owner. |
merge_address |
string |
Address of new parent company. |
merge_city |
string |
City of new parent company. |
merge_company |
string |
Name of the company merged with or acquired. |
merge_date |
date |
Date of merger or acquisition. |
merge_state |
string |
Two letter US state abbreviations and three letter ISO-3166-1 country codes for locations outside the US. |
new_parent |
string |
Name of the new parent company after the merger. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
zip_code |
string |
Five digit US Zip Code. |
zip_code_4 |
string |
Four digit US Zip Code suffix. |
core_eia861__yearly_net_metering_customer_fuel_class#
Noticed any irregularities with this table? File a report.
Annual time series of net metering by customer and fuel class.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Usage Warnings#
Some years use a slightly different data definition.
Additional Details#
The amount of energy sold back to the grid. From 2007 to 2009, the data are reported as a lump sum of total energy dispatched by sector. After 2009, the data are broken down by sector and technology type.
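Because the 2007 to 2009 rows are lump sums while later rows are split by technology, summing sold_to_utility_mwh across tech_class can double count if a "total" row sits alongside technology-specific rows for the same group. The sketch below is one way to guard against that, assuming pandas and assuming that rows with tech_class == "total" are subtotals of the other rows for the same utility, state, customer class, and year. It uses the technology-specific rows when a group has any and falls back to the lump-sum row otherwise. The helper name total_sold_to_utility is hypothetical, not part of PUDL.

```python
import pandas as pd

# Grouping columns; assumed to identify one utility/state/class/year report.
KEYS = ["report_date", "utility_id_eia", "state", "customer_class"]


def total_sold_to_utility(df: pd.DataFrame) -> pd.DataFrame:
    """Sum sold_to_utility_mwh per group without double counting "total" rows.

    Assumption: tech_class == "total" rows are subtotals of the
    technology-specific rows in the same group.
    """
    is_total = df["tech_class"] == "total"
    # True for every row in a group that has at least one technology-specific row
    has_detail = (~is_total).groupby([df[k] for k in KEYS], dropna=False).transform("any")
    # Keep detail rows, plus "total" rows only for groups with no detail rows
    keep = ~is_total | ~has_detail
    return (
        df[keep]
        .groupby(KEYS, as_index=False, dropna=False)["sold_to_utility_mwh"]
        .sum()
    )
```

Whether other tech_class values such as all_storage or virtual_pv are themselves subtotals of finer categories is not stated here, so check that before summing across them as well.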
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_capacity_mwh |
number |
The total amount of energy the system can supply before recharging is necessary, in megawatt-hours. |
report_date |
date |
Date reported. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of Form EIA-861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA's processing efficiency. |
sold_to_utility_mwh |
number |
The amount of electric energy sold back to the utility through the net metering application. |
state |
string |
Two letter US state abbreviation. |
tech_class |
string |
Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’]. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
core_eia861__yearly_net_metering_misc#
Noticed any irregularities with this table? File a report.
Annual time series of PV current flow type for net metered capacity.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
pv_current_flow_type |
string |
Current flow type for photovoltaics: AC or DC |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_eia861__yearly_non_net_metering_customer_fuel_class#
Noticed any irregularities with this table? File a report.
Annual time series of non-net metered distributed generation by sector and technology type.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Additional Details#
This table’s data starts in 2016. For 2015 and earlier, similar data are available in core_eia861__yearly_distributed_generation_tech.
The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_capacity_mwh |
number |
The total amount of energy the system can supply before recharging is necessary, in megawatt-hours. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
tech_class |
string |
Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’]. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
core_eia861__yearly_non_net_metering_misc#
Noticed any irregularities with this table? File a report.
Annual time series of non-net metered distributed generation: number of generators, PV current flow type, backup capacity, and utility-owned capacity.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key.
Additional Details#
This table’s data starts in 2016. For 2015 and earlier, similar data are available in core_eia861__yearly_distributed_generation_misc.
The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
backup_capacity_mw |
number |
The total nameplate capacity of generators that are used only for emergency backup service. |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
generators_number |
integer |
Total number of generators |
pv_current_flow_type |
string |
Current flow type for photovoltaics: AC or DC |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_owned_capacity_mw |
number |
Total non-net-metered capacity owned by the respondent. |
core_eia861__yearly_operational_data_misc#
Noticed any irregularities with this table? File a report.
Annual time series of megawatt-hours (MWh) for the sources and disposition of a utility’s electricity.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report (Schedule 2B)
- Primary key:
utility_id_eia, nerc_region, report_date
Usage Warnings#
Some values have been redacted.
Contains rows where missing values were imputed.
Additional Details#
Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.
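Since any respondent that keeps its data proprietary files under utility ID 88888, that ID does not identify a single utility, and it is usually safer to handle those rows separately than to treat 88888 as one more utility. A minimal pandas sketch, where the helper name split_redacted is hypothetical:

```python
import pandas as pd

# Utility ID used for proprietary (redacted) filings, per the EIA-861 note.
REDACTED_UTILITY_ID = 88888


def split_redacted(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (disclosed, redacted) rows based on utility_id_eia."""
    # Missing IDs, if any, are treated as not redacted.
    redacted = df["utility_id_eia"].eq(REDACTED_UTILITY_ID).fillna(False)
    return df[~redacted], df[redacted]
```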
Rows where data_observed is False were labeled as imputed in the raw EIA data. The EIA documentation does not specify what imputation method applies to operational data; it describes imputation only for net metering and non-net metering distributed generation data. Less than 1% of rows are labeled as imputed, all of which occur in 2004 or 2005. Imputed rows make up no more than 25% of the rows for any utility.
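To check the imputation figures above for a given extract, or to restrict an analysis to observed values, the data_observed flag can be used directly. A minimal pandas sketch; the helper names are hypothetical, and missing flags (if any) are treated as neither observed nor imputed:

```python
import pandas as pd


def drop_imputed(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows EIA labeled as observed (data_observed is True)."""
    return df[df["data_observed"].eq(True).fillna(False)]


def imputed_share_by_utility(df: pd.DataFrame) -> pd.Series:
    """Fraction of each utility's rows labeled as imputed (data_observed is False)."""
    imputed = df["data_observed"].eq(False).fillna(False).astype(bool)
    return imputed.groupby(df["utility_id_eia"]).mean()
```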
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
consumed_by_facility_mwh |
number |
The amount of electricity used by the facility. |
consumed_by_respondent_without_charge_mwh |
number |
The amount of electricity used by the electric utility in its electric and other departments without charge. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
data_observed |
boolean |
Is the value observed (True) or imputed (False). |
entity_type |
string |
Entity type of principal owner. |
exchange_energy_delivered_mwh |
number |
The amount of exchange energy delivered. Does not include power delivered as part of a tolling arrangement. |
exchange_energy_received_mwh |
number |
The amount of exchange energy received. Does not include power received through tolling arrangements. |
furnished_without_charge_mwh |
number |
The amount of electricity furnished by the electric utility without charge, such as to a municipality under a franchise agreement or for street and highway lighting. |
nerc_region |
string |
NERC region in which the plant is located |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_power_exchanged_mwh |
number |
The net amount of energy exchanged. Net exchange is the difference between the amount of exchange received and the amount of exchange delivered. This entry should not include wholesale energy purchased from or sold to regulated companies or unregulated companies for other systems. |
net_wheeled_power_mwh |
number |
The difference between the amount of energy entering the respondent’s system (wheeled received) for transmission through the respondent’s system and the amount of energy leaving the respondent’s system (wheeled delivered). Wheeled net represents the energy losses on the respondent’s system associated with the wheeling of energy for other systems. |
report_date |
date |
Date reported. |
retail_sales_mwh |
number |
MWh of sales to end-use customers in areas where the customer has been given the legal right to select a power supplier other than the traditional, vertically integrated electric utility. |
sales_for_resale_mwh |
number |
The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of Form EIA-861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase EIA's processing efficiency. |
state |
string |
Two letter US state abbreviation. |
summer_peak_demand_mw |
number |
The maximum hourly summer load (for the months of June through September) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy its service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest. |
total_disposition_mwh |
number |
Sum of all disposition of electricity listed. Includes sales to ultimate customers, sales for resale, energy furnished without charge, energy consumed by respondent without charge and total energy losses. |
total_energy_losses_mwh |
number |
The total amount of electricity lost from transmission, distribution, and/or unaccounted for. Should be expressed as a positive number. |
total_sources_mwh |
number |
Sum of all sources of electricity listed. Includes net generation, purchases from electricity suppliers, net exchanges (received - delivered), net wheeled (received - delivered), transmission by others, and losses. |
transmission_by_other_losses_mwh |
number |
The amount of energy losses associated with the wheeling of electricity provided to the respondent’s system by other utilities. Transmission by others, losses should always be a negative value. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
wheeled_power_delivered_mwh |
number |
The total amount of energy leaving the respondent’s system that was transmitted through the respondent’s system for delivery to other systems. If wheeling delivered is not precisely known, the value is an estimate based on the respondent’s system’s known percentage of losses for wheeling transactions. |
wheeled_power_received_mwh |
number |
The total amount of energy entering the respondent’s system from other systems for transmission through the respondent’s system (wheeling) for delivery to other systems. Does not include energy purchased or exchanged for consumption within the respondent’s system, which was wheeled to the respondent by others. |
wholesale_power_purchases_mwh |
number |
Purchases from electricity suppliers. |
winter_peak_demand_mw |
number |
The maximum hourly winter load (for the months of January through March) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy its service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest. |
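The total_sources_mwh and total_disposition_mwh descriptions each list the items that make up the total, which suggests a simple consistency check: recompute the total from its components and inspect the residual. The sketch below assumes pandas, and the mapping from the items named in the total_sources_mwh description to specific columns (SOURCE_COMPONENTS) is an assumption to verify, not something this documentation states. The helper name total_residual is hypothetical.

```python
import pandas as pd

# Assumed mapping of the items listed for total_sources_mwh (net generation,
# purchases, net exchanges, net wheeled, transmission by others losses)
# to columns in this table. Verify before relying on it.
SOURCE_COMPONENTS = [
    "net_generation_mwh",
    "wholesale_power_purchases_mwh",
    "net_power_exchanged_mwh",
    "net_wheeled_power_mwh",
    "transmission_by_other_losses_mwh",
]


def total_residual(df: pd.DataFrame, total_col: str, component_cols: list[str]) -> pd.Series:
    """Reported total minus the row-wise sum of its components.

    Missing components count as zero; rows where every component is missing
    get a missing residual.
    """
    component_sum = df[component_cols].sum(axis=1, min_count=1)
    return df[total_col] - component_sum
```

The same helper could be applied to total_disposition_mwh with the disposition items its description lists, though which column corresponds to "sales to ultimate customers" is not spelled out in this table's documentation.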
core_eia861__yearly_operational_data_revenue#
Noticed any irregularities with this table? File a report.
Annual time series of utility revenue by type of electric operating revenue.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report (Schedule 2C)
- Primary key:
utility_id_eia, nerc_region, report_date, revenue_class
Usage Warnings#
Some values have been redacted.
Additional Details#
A utility’s revenue by type of electric operating revenue. Includes electric operating revenue from sales to ultimate customers, revenue from unbundled (delivery) customers, revenue from sales for resale, electric credits/other adjustments, revenue from transmission, other electric operating revenue, and total electric operating revenue.
Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.
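Because each row holds one revenue_class, a wide layout with one column per revenue type is often easier to read. A minimal pandas sketch that pivots on the table's primary key; the helper name revenue_wide is hypothetical:

```python
import pandas as pd


def revenue_wide(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (utility_id_eia, nerc_region, report_date), one column per revenue_class."""
    return df.pivot_table(
        index=["utility_id_eia", "nerc_region", "report_date"],
        columns="revenue_class",
        values="revenue",
        # Each cell is unique under the primary key, so "sum" just passes values through.
        aggfunc="sum",
    )
```

Since the description above lists total electric operating revenue among the revenue types, check the revenue_class values before summing across the resulting columns, or the total may be counted twice.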
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
nerc_region |
string |
NERC region in which the plant is located |
report_date |
date |
Date reported. |
revenue |
number |
Amount of revenue. |
revenue_class |
string |
Source of revenue (e.g., retail sales, transmission). |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_eia861__yearly_reliability#
Noticed any irregularities with this table? File a report.
Annual time series of electricity system reliability and outage impacts.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report (Schedules 3B and 3C)
- Primary key:
utility_id_eia, state, report_date, standard
Note:
standard is included because, while respondents are asked to fill out only one of parts B or C, they sometimes fill out both.
Additional Details#
Contains information on non-momentary electrical interruptions, including the system average interruption duration index (SAIDI), the system average interruption frequency index (SAIFI), and the customer average interruption duration index (CAIDI, equal to SAIDI/SAIFI), each with and without major event days and loss of service. Also includes the standard used (IEEE or other) and other relevant information.
SAIDI is measured in minutes and represents the sum, over the year, of the number of customers interrupted multiplied by the duration of each interruption in minutes, divided by the total number of customers.
SAIFI is measured in interruptions per customer and represents the sum, over the year, of the number of customers that experienced each interruption, divided by the total number of customers.
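The definitions above can be made concrete with a small worked computation. This sketch takes a hypothetical list of sustained (non-momentary) interruption events for one year and computes SAIDI, SAIFI, and CAIDI = SAIDI/SAIFI. The Interruption type and reliability_indices function are illustrative only; respondents' actual calculations also depend on the standard used and on how major event days and loss of supply are handled.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    """One sustained interruption event (hypothetical input format)."""
    customers_interrupted: int
    duration_minutes: float


def reliability_indices(events: list[Interruption], customers_served: int) -> dict[str, float]:
    """Compute SAIDI, SAIFI, and CAIDI for one year of sustained interruptions."""
    customer_minutes = sum(e.customers_interrupted * e.duration_minutes for e in events)
    customer_interruptions = sum(e.customers_interrupted for e in events)
    saidi = customer_minutes / customers_served        # minutes per customer served
    saifi = customer_interruptions / customers_served  # interruptions per customer served
    caidi = saidi / saifi if saifi else float("nan")   # minutes per customer interruption
    return {"saidi": saidi, "saifi": saifi, "caidi": caidi}
```

For example, with 1,000 customers served and two events (200 customers out for 90 minutes, 300 customers out for 30 minutes), there are 27,000 customer-minutes and 500 customer interruptions, so SAIDI is 27 minutes, SAIFI is 0.5, and CAIDI is 54 minutes.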
The IEEE standards that can be used to calculate SAIDI and SAIFI include IEEE 1366-2003 and IEEE 1366-2012. These standards define momentary interruptions as having a duration of five minutes or less. If one of these IEEE standards is used, respondents fill out part B and are required to specify:
- SAIDI with major event days, minus loss of supply
- SAIFI with major event days, minus loss of supply
SAIDI and SAIFI with major event days minus loss of supply exclude interruptions caused by a loss of supply, that is, events originating on the high-voltage supply system rather than on the distribution system. The voltage that separates the distribution system from the supply system is given in highest_distribution_voltage_kv.
If a method other than these IEEE standards is used to calculate SAIDI and SAIFI, respondents fill out part C and are required to specify:
- whether inactive accounts are included
- how they define a momentary interruption (less than 1 minute, less than or equal to 5 minutes, or other)
In this table, the standard column is “ieee_standard” for respondents who filled out part B and “other_standard” for respondents who filled out part C.
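When a respondent filled out both parts, the table holds two rows for the same utility, state, and year that differ in standard. A minimal sketch for keeping one row per utility-state-year, assuming rows arrive as plain dicts; preferring the IEEE row is an illustrative analyst choice rather than a PUDL convention, and one_record_per_key is a hypothetical helper.

```python
def one_record_per_key(records, preferred_standard="ieee_standard"):
    """Keep at most one reliability record per (utility, state, report_date).

    If both a part B and a part C row exist for the same key, the row whose
    `standard` equals `preferred_standard` is kept.
    """
    chosen = {}
    for rec in records:
        key = (rec["utility_id_eia"], rec["state"], rec["report_date"])
        current = chosen.get(key)
        # Take the first row seen, then upgrade to the preferred standard if found.
        if current is None or (
            rec["standard"] == preferred_standard
            and current["standard"] != preferred_standard
        ):
            chosen[key] = rec
    return list(chosen.values())


# Made-up rows: utility 1 filled out both parts; utility 2 only part C.
rows = [
    {"utility_id_eia": 1, "state": "CO", "report_date": "2023-01-01",
     "standard": "other_standard", "saidi_wo_major_event_days_minutes": 95.0},
    {"utility_id_eia": 1, "state": "CO", "report_date": "2023-01-01",
     "standard": "ieee_standard", "saidi_wo_major_event_days_minutes": 90.0},
    {"utility_id_eia": 2, "state": "CO", "report_date": "2023-01-01",
     "standard": "other_standard", "saidi_wo_major_event_days_minutes": 150.0},
]
deduped = one_record_per_key(rows)
```

Because part B and part C values come from different definitions, comparing indices across utilities is safest within a single standard.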
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
caidi_w_major_event_days_minus_loss_of_service_minutes |
number |
Average number of minutes per interruption (SAIDI/SAIFI) including major event days and excluding reliability events caused by a loss of supply. |
caidi_w_major_event_days_minutes |
number |
Average number of minutes per interruption (SAIDI/SAIFI) including major event days. |
caidi_wo_major_event_days_minutes |
number |
Average number of minutes per interruption (SAIDI/SAIFI) excluding major event days. |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
entity_type |
string |
Entity type of principal owner. |
highest_distribution_voltage_kv |
number |
The highest voltage (in kV) that is part of the distribution system. |
inactive_accounts_included |
boolean |
Whether the respondent includes inactive accounts in its definition of customers used to determine SAIDI and SAIFI. |
momentary_interruption_definition |
string |
How the respondent defines momentary service interruptions: less than 1 min, equal to or less than 5 min, or some other way. |
outages_recorded_automatically |
boolean |
Whether the information about customer outages is recorded automatically. |
report_date |
date |
Date reported. |
saidi_w_major_event_days_minus_loss_of_service_minutes |
number |
Cumulative duration (minutes) of interruption for the average customer during the report year including major event days and excluding reliability events caused by a loss of supply. |
saidi_w_major_event_days_minutes |
number |
Cumulative duration (minutes) of interruption for the average customer during the report year including major event days. |
saidi_wo_major_event_days_minutes |
number |
Cumulative duration (minutes) of interruption for the average customer during the report year excluding major event days. |
saifi_w_major_event_days_customers |
number |
Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days. |
saifi_w_major_event_days_minus_loss_of_service_customers |
number |
Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days and excluding reliability events caused by a loss of supply. |
saifi_wo_major_event_days_customers |
number |
Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year excluding major event days. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency. |
standard |
string |
Whether the respondent calculates SAIDI/SAIFI and major event days according to the IEEE standard or another standard. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
core_eia861__yearly_sales#
Annual time series of electricity sales to ultimate customers by utility, balancing authority, state, and customer class.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
utility_id_eia, state, report_date, balancing_authority_code_eia, customer_class, business_model, service_type
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
business_model |
string |
Business model. |
customer_class |
string |
High level categorization of customer type (e.g., commercial, residential). |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
data_observed |
boolean |
Whether the value is observed (True) or imputed (False). |
entity_type |
string |
Entity type of principal owner. |
report_date |
date |
Date reported. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
sales_revenue |
number |
Revenue from electricity sold. |
service_type |
string |
The type of service the respondent provides to a given customer class. Bundled: both energy and delivery; energy: just the energy consumed; delivery: just the billing and energy delivery services. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
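A common use of the sales table is an average price by customer class. This sketch divides total revenue by total sales for each class (a sales-weighted average) using plain dicts; it assumes sales_revenue is in nominal dollars, which you should confirm, and avg_revenue_per_mwh is a hypothetical helper. Since the primary key shows that a utility can report several rows per class, consider filtering on service_type and business_model first.

```python
from collections import defaultdict


def avg_revenue_per_mwh(rows):
    """Sales-weighted average revenue per MWh for each customer class.

    Rows missing sales_mwh or sales_revenue are skipped, as are classes
    whose total sales are zero.
    """
    revenue = defaultdict(float)
    mwh = defaultdict(float)
    for row in rows:
        if row["sales_mwh"] is None or row["sales_revenue"] is None:
            continue
        revenue[row["customer_class"]] += row["sales_revenue"]
        mwh[row["customer_class"]] += row["sales_mwh"]
    # Divide totals rather than averaging per-row prices.
    return {cls: revenue[cls] / mwh[cls] for cls in mwh if mwh[cls] > 0}


# Made-up rows for illustration only.
rows = [
    {"customer_class": "residential", "sales_mwh": 1_000.0, "sales_revenue": 120_000.0},
    {"customer_class": "residential", "sales_mwh": 3_000.0, "sales_revenue": 330_000.0},
    {"customer_class": "commercial", "sales_mwh": 2_000.0, "sales_revenue": 200_000.0},
    {"customer_class": "commercial", "sales_mwh": None, "sales_revenue": 5_000.0},
]
prices = avg_revenue_per_mwh(rows)
```

Dividing totals weights each row by its sales: the two residential rows have per-row prices of 120 and 110 per MWh, whose simple mean is 115, while the sales-weighted result is 112.5.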
core_eia861__yearly_service_territory#
Annual time series of counties in utility service territories.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
report_date, utility_id_eia, county_id_fips
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
county |
string |
County name. |
county_id_fips |
string |
County ID from the Federal Information Processing Standard Publication 6-4. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
report_date |
date |
Date reported. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency. |
state |
string |
Two letter US state abbreviation. |
state_id_fips |
string |
Two digit state FIPS code. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
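County FIPS codes are five-digit strings whose first two digits are the state FIPS code, which allows a simple consistency check when assembling a utility's service territory from this table. The sketch assumes rows are plain dicts with a datetime.date report_date; service_territories is a hypothetical helper and the rows are made up.

```python
from collections import defaultdict
from datetime import date


def service_territories(rows):
    """Collect county FIPS codes per (utility_id_eia, report year).

    Also returns rows whose county_id_fips prefix disagrees with state_id_fips.
    """
    territory = defaultdict(set)
    mismatched = []
    for row in rows:
        county = row["county_id_fips"]
        if county is None:
            continue  # nothing to place on a map
        # A county FIPS ID is the 2-digit state code plus a 3-digit county code.
        if county[:2] != row["state_id_fips"]:
            mismatched.append(row)
        key = (row["utility_id_eia"], row["report_date"].year)
        territory[key].add(county)
    return dict(territory), mismatched


rows = [
    {"utility_id_eia": 99, "report_date": date(2022, 1, 1),
     "state_id_fips": "08", "county_id_fips": "08031"},
    {"utility_id_eia": 99, "report_date": date(2022, 1, 1),
     "state_id_fips": "08", "county_id_fips": "08005"},
    {"utility_id_eia": 99, "report_date": date(2022, 1, 1),
     "state_id_fips": "08", "county_id_fips": "49035"},
]
territory, mismatched = service_territories(rows)
```

Mismatched rows are kept in the territory and reported separately, so you can decide whether they are data errors or legitimate cross-state service.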
core_eia861__yearly_short_form#
Annual time series of data from the short form (EIA-861S).
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
utility_id_eia, state, report_date, balancing_authority_code_eia
Usage Warnings#
Some years are missing from the data record.
Additional Details#
This data has been reported since 2012, but the 2019 data is not available. Respondents are expected to submit the completed Form EIA-861S to EIA by April 30 of the year following the reporting year. Utilities report on Form EIA-861S if they:
- Reported less than 200,000 megawatthours on their most recent Form EIA-861.
- Provide only bundled service (generation and distribution).
- Are not needed to ensure acceptable quality of statistical estimates.
- Are not part of the aggregate TVA or WPPI.
- Do not report on Form EIA-861M.
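Read as conditions that must all hold, the eligibility criteria translate into a simple predicate. This is a sketch under that reading; the Respondent fields are hypothetical stand-ins for information EIA holds, not columns in this table.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """Hypothetical summary of a utility's reporting situation."""

    last_eia861_mwh: float
    bundled_service_only: bool
    needed_for_estimates: bool
    in_tva_or_wppi_aggregate: bool
    reports_eia861m: bool


def files_short_form(r: Respondent) -> bool:
    """True if every EIA-861S criterion listed above is met."""
    return (
        r.last_eia861_mwh < 200_000
        and r.bundled_service_only
        and not r.needed_for_estimates
        and not r.in_tva_or_wppi_aggregate
        and not r.reports_eia861m
    )


small_coop = Respondent(150_000, True, False, False, False)
print(files_short_form(small_coop))
```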
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
customers |
number |
Number of customers. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
entity_type |
string |
Entity type of principal owner. |
has_demand_side_management |
boolean |
Whether there were strategies or measures used to control electricity demand by customers. |
has_green_pricing |
boolean |
Whether a green pricing program was associated with this utility during the reporting year. |
has_net_metering |
boolean |
Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source.) This field was only reported up until 2015. |
has_time_responsive_programs |
boolean |
Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff). |
report_date |
date |
Date reported. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
sales_revenue |
number |
Revenue from electricity sold. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
core_eia861__yearly_utility_data_misc#
Annual time series of utility business activities.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key. The natural primary key would be utility_id_eia, state, report_date, and nerc_region, but the state column contains nulls in several years of reporting.
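Before joining on the would-be key columns, it can help to check how far they are from being a usable key. This sketch, which works on rows given as plain dicts, reports rows with a null key component and any key values that occur more than once; check_candidate_key is a hypothetical helper, and the same check applies to the would-be keys of the NERC and RTO tables.

```python
from collections import Counter


def check_candidate_key(rows, key_columns):
    """Return (rows with a null key component, key values that repeat)."""
    null_rows = [r for r in rows if any(r[c] is None for c in key_columns)]
    # Only fully populated keys can be tested for uniqueness.
    counts = Counter(
        tuple(r[c] for c in key_columns)
        for r in rows
        if all(r[c] is not None for c in key_columns)
    )
    duplicate_keys = [k for k, n in counts.items() if n > 1]
    return null_rows, duplicate_keys


key = ["utility_id_eia", "state", "report_date", "nerc_region"]
# Made-up rows: one null state and one repeated key.
rows = [
    {"utility_id_eia": 1, "state": "TX", "report_date": "2020-01-01", "nerc_region": "TRE"},
    {"utility_id_eia": 1, "state": None, "report_date": "2020-01-01", "nerc_region": "TRE"},
    {"utility_id_eia": 2, "state": "TX", "report_date": "2020-01-01", "nerc_region": "TRE"},
    {"utility_id_eia": 2, "state": "TX", "report_date": "2020-01-01", "nerc_region": "TRE"},
]
null_rows, duplicate_keys = check_candidate_key(rows, key)
```

A column set works as a primary key only when both returned lists are empty.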
Additional Details#
This table records whether utilities operate alternative-fuel vehicles, whether they provide transmission, distribution, or generation services (bundled or unbundled), and whether they engage in wholesale and/or retail markets.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
alternative_fuel_vehicle_2_activity |
boolean |
Whether the utility plans to operate alternative-fueled vehicles this coming year. |
alternative_fuel_vehicle_activity |
boolean |
Whether the utility operates alternative-fueled vehicles during the year. |
bundled_activity |
boolean |
Whether a utility engaged in combined utility services (electric service plus other services such as gas or water) during the year. |
buying_distribution_activity |
boolean |
Whether a utility bought any distribution services on other electrical systems during the year. |
buying_transmission_activity |
boolean |
Whether a utility bought any transmission services on other electrical systems during the year. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
distribution_activity |
boolean |
Whether a utility engaged in any distribution using owned/leased electrical wires during the year. |
entity_type |
string |
Entity type of principal owner. |
generation_activity |
boolean |
Whether a utility used generation from a company-owned plant during the year. |
nerc_region |
string |
NERC region in which the plant is located |
operates_generating_plant |
boolean |
Whether the respondent operated at least one generating plant during the reporting period. |
report_date |
date |
Date reported. |
retail_marketing_activity |
boolean |
Whether a utility engaged in retail power marketing during the year. |
short_form |
boolean |
Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency. |
state |
string |
Two letter US state abbreviation. |
transmission_activity |
boolean |
Whether a utility engaged in any transmission activities during the year. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
wholesale_marketing_activity |
boolean |
Whether a utility engaged in wholesale power marketing during the year. |
core_eia861__yearly_utility_data_nerc#
Annual time series of the NERC regions that utilities operate in.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key. The natural primary key would be utility_id_eia, state, report_date, nerc_region, and nerc_regions_of_operation, but the state column contains nulls in several years of reporting.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
nerc_region |
string |
NERC region in which the plant is located |
nerc_regions_of_operation |
string |
All the regional entities within the North American Electric Reliability Corporation (NERC) in which the respondent conducts operations. |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_eia861__yearly_utility_data_rto#
Annual time series of the RTOs that utilities operate in.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 861 – Annual Electric Power Industry Report
- Primary key:
This table has no primary key. The natural primary key would be utility_id_eia, state, report_date, nerc_region, and rtos_of_operation, but the state column contains nulls in several years of reporting.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
nerc_region |
string |
NERC region in which the plant is located |
report_date |
date |
Date reported. |
rtos_of_operation |
string |
The ISOs/RTOs in which the respondent conducts operations. |
state |
string |
Two letter US state abbreviation. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_eia923__entity_coalmine#
Entity table containing static information about coal mines reporting deliveries in the Fuel Receipts and Costs table.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 2, Part C)
- Primary key:
mine_id_pudl
Usage Warnings#
Mine IDs are not as well-defined or consistent as other EIA IDs.
Additional Details#
This table is produced during the transformation of fuel delivery data in order to produce a better-normalized database. The same coal mines report many individual deliveries, and repeating their attributes many times in the fuel receipts and costs table would be duplicative. Unfortunately, the coal mine attributes do not generally use a controlled vocabulary or well-defined IDs, so in practice many distinct records in this table correspond to the same real-world mine.
We have not yet taken the time to rigorously clean this data, but it could be linked with both Mine Safety and Health Administration (MSHA) and USGS data to provide more insight into where coal is coming from and into the employment and geological context of those supplies.
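As a starting point for that cleanup, candidate duplicates can be grouped by a normalized mine name within a state. This is a minimal sketch, not PUDL's method: the normalization rules and the GENERIC_WORDS list are assumptions, the mine names are made up, and any match should be confirmed against mine_id_msha or county_id_fips before records are merged.

```python
import re
from collections import defaultdict

# Assumed list of generic words that carry little identifying information.
GENERIC_WORDS = {"mine", "mines", "coal", "co", "company", "inc", "llc", "the"}


def normalize_mine_name(name):
    """Lower-case a mine name, split on non-alphanumerics, drop generic words."""
    tokens = re.findall(r"[a-z0-9]+", (name or "").lower())
    return " ".join(t for t in tokens if t not in GENERIC_WORDS)


def candidate_duplicates(mines):
    """Group mine_id_pudl values sharing a normalized name within one state."""
    groups = defaultdict(list)
    for m in mines:
        name = normalize_mine_name(m["mine_name"])
        if not name:
            continue  # nothing left to compare on
        groups[(name, m["state"])].append(m["mine_id_pudl"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


mines = [
    {"mine_id_pudl": 1, "mine_name": "Example Ridge Mine", "state": "WY"},
    {"mine_id_pudl": 2, "mine_name": "EXAMPLE RIDGE, INC.", "state": "WY"},
    {"mine_id_pudl": 3, "mine_name": "Example Ridge", "state": "KY"},
]
print(candidate_duplicates(mines))
```

Grouping within a state keeps identically named mines in different states apart, at the cost of missing records whose state is wrong or missing.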
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
county_id_fips |
string |
County ID from the Federal Information Processing Standard Publication 6-4. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
mine_id_msha |
integer |
MSHA issued mine identifier. |
mine_id_pudl |
integer |
Dynamically assigned PUDL mine identifier. |
mine_name |
string |
Coal mine name. |
mine_type_code |
string |
Type of coal mine. |
state |
string |
Two letter US state abbreviation. |
core_eia923__fuel_receipts_costs#
Individual fuel deliveries to power plants, organized by fuel type and supplier.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)
- Primary key:
This table has no primary key. Each record describes an individual fuel delivery. There can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month, so the table has no natural primary key.
Usage Warnings#
Date column arbitrarily uses the first of the month.
Some values have been redacted.
Time of fuel deliveries is not necessarily connected with time of fuel consumption.
Additional Details#
There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.
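Because one plant can receive several deliveries of the same fuel in a month, a monthly fuel price has to be aggregated. One common approach, sketched here with plain dicts, weights each delivery's fuel_cost_per_mmbtu by its heat content (fuel_received_units times fuel_mmbtu_per_unit); weighted_fuel_cost is a hypothetical helper and not necessarily how PUDL computes marginal costs.

```python
from collections import defaultdict


def weighted_fuel_cost(deliveries):
    """Heat-content-weighted average cost per MMBtu.

    Groups by (plant_id_eia, report_date, fuel_type_code_pudl). Deliveries
    with a missing price, heat content, or quantity are left out entirely.
    """
    cost = defaultdict(float)
    mmbtu = defaultdict(float)
    for d in deliveries:
        if None in (d["fuel_cost_per_mmbtu"], d["fuel_mmbtu_per_unit"], d["fuel_received_units"]):
            continue
        key = (d["plant_id_eia"], d["report_date"], d["fuel_type_code_pudl"])
        heat = d["fuel_received_units"] * d["fuel_mmbtu_per_unit"]  # MMBtu delivered
        mmbtu[key] += heat
        cost[key] += heat * d["fuel_cost_per_mmbtu"]  # dollars
    return {k: cost[k] / mmbtu[k] for k in mmbtu if mmbtu[k] > 0}


# Made-up deliveries to a hypothetical plant 1234; the last price is missing.
deliveries = [
    {"plant_id_eia": 1234, "report_date": "2023-01-01", "fuel_type_code_pudl": "coal",
     "fuel_received_units": 1_000.0, "fuel_mmbtu_per_unit": 20.0, "fuel_cost_per_mmbtu": 2.00},
    {"plant_id_eia": 1234, "report_date": "2023-01-01", "fuel_type_code_pudl": "coal",
     "fuel_received_units": 3_000.0, "fuel_mmbtu_per_unit": 18.0, "fuel_cost_per_mmbtu": 2.50},
    {"plant_id_eia": 1234, "report_date": "2023-01-01", "fuel_type_code_pudl": "coal",
     "fuel_received_units": 500.0, "fuel_mmbtu_per_unit": 19.0, "fuel_cost_per_mmbtu": None},
]
costs = weighted_fuel_cost(deliveries)
```

Dropping unpriced deliveries from both numerator and denominator means the result describes only the priced deliveries, which, given the redaction patterns described below, may not be representative.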
Under some circumstances utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It is still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices EIA publishes. From 2009 to 2021, about 1/3 of all prices were redacted. The missing data is not randomly distributed: deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.
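The extent of redaction in a given extract can be measured directly. This sketch counts deliveries with a missing fuel_cost_per_mmbtu by fuel_type_code_pudl; it assumes every missing price is a redaction, which may not hold for every record, and redacted_share is a hypothetical helper.

```python
from collections import Counter


def redacted_share(deliveries):
    """Fraction of deliveries with no reported price, by fuel_type_code_pudl."""
    total = Counter()
    missing = Counter()
    for d in deliveries:
        fuel = d["fuel_type_code_pudl"]
        total[fuel] += 1
        if d["fuel_cost_per_mmbtu"] is None:
            missing[fuel] += 1
    return {fuel: missing[fuel] / n for fuel, n in total.items()}


# Made-up deliveries for illustration.
deliveries = [
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": None},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu": 3.1},
    {"fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu": 2.2},
]
shares = redacted_share(deliveries)
```

This counts deliveries; weighting by heat content instead would show what share of fuel burned lacks a price, which can differ substantially.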
Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ash_content_pct |
number |
Ash content percentage by weight to the nearest 0.1 percent. |
chlorine_content_ppm |
number |
For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier. |
contract_expiration_date |
date |
Date the contract expires. Format: MMYY. |
contract_type_code |
string |
Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_group_code |
string |
Fuel groups used in the Electric Power Monthly |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_received_units |
number |
Quantity of fuel received in tons, barrels, or Mcf. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
mercury_content_ppm |
number |
Mercury content in parts per million (ppm) to the nearest 0.001 ppm. |
mine_id_pudl |
integer |
Dynamically assigned PUDL mine identifier. |
moisture_content_pct |
number |
For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent. |
natural_gas_delivery_contract_type_code |
string |
Contract type for natural gas delivery service. |
natural_gas_transport_code |
string |
Contract type for natural gas transportation service. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
primary_transportation_mode_code |
string |
Transportation mode for the longest distance transported. |
report_date |
date |
Date reported. |
secondary_transportation_mode_code |
string |
Transportation mode for the second longest distance transported. |
sulfur_content_pct |
number |
Sulfur content percentage by weight to the nearest 0.01 percent. |
supplier_name |
string |
Company that sold the fuel to the plant or, in the case of Natural Gas, pipeline owner. |
core_eia923__monthly_boiler_fuel#
Monthly time series of boiler fuel consumption and emissions.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
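To find the series affected by December-only reporting, one option is to flag boiler-years where December consumption is positive and every other month is zero or missing. The sketch assumes rows as dicts with a datetime.date report_date; december_only_series is a hypothetical helper, and a flagged series may occasionally reflect a boiler that genuinely ran only in December.

```python
from collections import defaultdict
from datetime import date


def december_only_series(rows):
    """Return series-year keys whose only nonzero consumption is in December."""
    by_series = defaultdict(dict)
    for r in rows:
        d = r["report_date"]
        key = (r["plant_id_eia"], r["boiler_id"], r["energy_source_code"],
               r["prime_mover_code"], d.year)
        by_series[key][d.month] = r["fuel_consumed_units"]
    flagged = []
    for key, months in by_series.items():
        december = months.get(12) or 0
        # Zero and missing (None) values are both falsy and count as "no data".
        other_months = [v for m, v in months.items() if m != 12 and v]
        if december > 0 and not other_months:
            flagged.append(key)
    return flagged


# Made-up series: boiler 1 reports everything in December, boiler A monthly.
rows = [
    {"plant_id_eia": 1, "boiler_id": "1", "energy_source_code": "BIT",
     "prime_mover_code": "ST", "report_date": date(2022, m, 1),
     "fuel_consumed_units": 500.0 if m == 12 else 0.0}
    for m in range(1, 13)
] + [
    {"plant_id_eia": 2, "boiler_id": "A", "energy_source_code": "NG",
     "prime_mover_code": "CT", "report_date": date(2022, m, 1),
     "fuel_consumed_units": 40.0}
    for m in range(1, 13)
]
print(december_only_series(rows))
```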
Additional Details#
Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
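The heat rate arithmetic works as follows: heat input in MMBtu is fuel_consumed_units times fuel_mmbtu_per_unit, and dividing by net generation gives Btu per kWh after converting 1 MMBtu = 10^6 Btu and 1 MWh = 10^3 kWh. Net generation is not in this table, so the sketch takes it as an input; heat_rate_btu_per_kwh is a hypothetical helper and the numbers are made up.

```python
def heat_rate_btu_per_kwh(fuel_records, net_generation_mwh):
    """Heat rate in Btu/kWh for one unit and period.

    fuel_records: iterable of (fuel_consumed_units, fuel_mmbtu_per_unit) pairs.
    net_generation_mwh: must come from a generation table, not this one.
    """
    if net_generation_mwh <= 0:
        raise ValueError("net generation must be positive")
    total_mmbtu = sum(units * mmbtu_per_unit for units, mmbtu_per_unit in fuel_records)
    # MMBtu -> Btu (x 1e6); MWh -> kWh (x 1e3).
    return total_mmbtu * 1e6 / (net_generation_mwh * 1e3)


# 5,000 tons of coal at 20 MMBtu/ton producing 10,000 MWh of net generation.
print(heat_rate_btu_per_kwh([(5_000.0, 20.0)], 10_000.0))
```

For combined heat and power plants, fuel_consumed_units also covers process steam production, so this simple ratio will overstate the electric heat rate.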
This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ash_content_pct |
number |
Ash content percentage by weight to the nearest 0.1 percent. |
boiler_id |
string |
Alphanumeric boiler ID. |
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
sulfur_content_pct |
number |
Sulfur content percentage by weight to the nearest 0.01 percent. |
core_eia923__monthly_energy_storage#
Noticed any irregularities with this table? File a report.
Monthly time series of energy consumed and discharged (“generated”) by energy storage units.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, prime_mover_code, energy_source_code
Usage Warnings#
Date column arbitrarily uses the first of the month.
Additional Details#
The total MWh discharged from the energy storage unit during the reporting period is the gross generation, and the difference between gross generation and consumption is the net generation.
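Net generation is gross generation minus consumption. The energy a storage unit consumed can therefore be backed out from the two generation columns. The sketch below does this on invented values. `implied_consumption_mwh` and `discharge_ratio` are hypothetical derived columns, not fields in this table.

```python
import pandas as pd

# Invented monthly rows shaped like core_eia923__monthly_energy_storage.
es = pd.DataFrame({
    "plant_id_eia": [4, 4],
    "report_date": pd.to_datetime(["2024-06-01", "2024-07-01"]),
    "gross_generation_mwh": [1_200.0, 1_500.0],  # energy discharged
    "net_generation_mwh": [-300.0, -350.0],      # discharged minus consumed
})

# net = gross - consumed, so consumed = gross - net.
es["implied_consumption_mwh"] = es["gross_generation_mwh"] - es["net_generation_mwh"]

# Discharged / consumed: a rough monthly efficiency proxy. Charging and
# discharging need not balance within one month, so use single months with care.
es["discharge_ratio"] = es["gross_generation_mwh"] / es["implied_consumption_mwh"]
print(es[["report_date", "implied_consumption_mwh", "discharge_ratio"]])
```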
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_consumed_for_electricity_units |
number |
Consumption for electric generation of the fuel type in physical units. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_units |
string |
Reported unit of measure for fuel. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
core_eia923__monthly_generation#
Noticed any irregularities with this table? File a report.
Monthly time series of net electricity generated by power plant generators.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, generator_id, report_date
Usage Warnings#
Date column arbitrarily uses the first of the month.
Not all IDs are present.
Table has known low coverage - either geographic or temporal or otherwise.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Additional Details#
This is the most granular information we have about how much electricity individual
generators are producing, but only about half of all the generation reported in
core_eia923__monthly_generation_fuel appears in this table, because different reporting
requirements apply to different types and sizes of generators.
Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.
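You can compare plant-month totals from the two tables to see how much of a plant's reported generation is captured at the generator level. The sketch below does this on invented rows. `gen_coverage` is a hypothetical derived column. Net generation can be zero or negative, so ratios for such plants need care.

```python
import pandas as pd

jan = pd.Timestamp("2024-01-01")
# Invented rows shaped like core_eia923__monthly_generation (generator level).
gen = pd.DataFrame({
    "plant_id_eia": [5, 5, 6],
    "generator_id": ["1", "2", "A"],
    "report_date": [jan] * 3,
    "net_generation_mwh": [100.0, 50.0, 20.0],
})
# Invented rows shaped like core_eia923__monthly_generation_fuel.
gf = pd.DataFrame({
    "plant_id_eia": [5, 5, 6, 7],
    "report_date": [jan] * 4,
    "prime_mover_code": ["ST", "GT", "PV", "WT"],
    "net_generation_mwh": [140.0, 60.0, 20.0, 80.0],
})

key = ["plant_id_eia", "report_date"]
gen_tot = gen.groupby(key, as_index=False)["net_generation_mwh"].sum()
gf_tot = gf.groupby(key, as_index=False)["net_generation_mwh"].sum()

# Left join keeps plants that appear only in generation_fuel (coverage = NaN).
cov = gf_tot.merge(gen_tot, on=key, how="left", suffixes=("_fuel", "_gen"))
cov["gen_coverage"] = cov["net_generation_mwh_gen"] / cov["net_generation_mwh_fuel"]
print(cov)
```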
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
core_eia923__monthly_generation_fuel#
Noticed any irregularities with this table? File a report.
Monthly time series of generation and fuel consumption by fuel and prime mover.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, prime_mover_code, energy_source_code
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Additional Details#
This data can’t be easily linked to individual boilers, generators, and
generation units, but it provides the most complete coverage
of fuel consumption and electricity generation for the entire generation fleet. We use
the primary fuels and prime movers reported for each generator, along with their
capacities, to attribute fuel consumption and generation when it isn’t directly reported
in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, in order to calculate
capacity factors, heat rates, and the marginal cost of electricity.
The table distinguishes between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units, which also provide high-temperature process heat at the expense of net electricity generation.
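CHP units divert some fuel to process heat, so dividing all fuel consumed by net generation would overstate their electric heat rate. The sketch below, on invented values, uses the electricity-only fuel column instead. `elec_fuel_share` and `heat_rate_mmbtu_per_mwh` are hypothetical derived columns. A heat rate of 10 MMBtu/MWh is the same as 10,000 Btu/kWh.

```python
import pandas as pd

# Invented rows shaped like core_eia923__monthly_generation_fuel.
gf = pd.DataFrame({
    "plant_id_eia": [8, 9],
    "prime_mover_code": ["ST", "ST"],
    "fuel_consumed_mmbtu": [110_000.0, 50_000.0],
    "fuel_consumed_for_electricity_mmbtu": [100_000.0, 50_000.0],
    "net_generation_mwh": [10_000.0, 5_000.0],
})

# Share of fuel attributed to electricity; values below 1 suggest CHP process heat.
gf["elec_fuel_share"] = (
    gf["fuel_consumed_for_electricity_mmbtu"] / gf["fuel_consumed_mmbtu"]
)

# Heat rate (MMBtu/MWh) from electricity-only fuel; non-positive generation -> NaN.
positive = gf["net_generation_mwh"] > 0
gf["heat_rate_mmbtu_per_mwh"] = (
    gf["fuel_consumed_for_electricity_mmbtu"] / gf["net_generation_mwh"]
).where(positive)
print(gf[["plant_id_eia", "elec_fuel_share", "heat_rate_mmbtu_per_mwh"]])
```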
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_consumed_for_electricity_mmbtu |
number |
Total consumption of fuel to produce electricity, in MMBtu, for the specified period. |
fuel_consumed_for_electricity_units |
number |
Consumption for electric generation of the fuel type in physical units. |
fuel_consumed_mmbtu |
number |
Total consumption of fuel, in MMBtu, for the specified period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_type_code_agg |
string |
A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
core_eia923__monthly_generation_fuel_nuclear#
Noticed any irregularities with this table? File a report.
Monthly time series of generation and fuel consumption of nuclear generation units by fuel and prime mover.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 3)
- Primary key:
plant_id_eia, report_date, nuclear_unit_id, energy_source_code, prime_mover_code
Usage Warnings#
Date column arbitrarily uses the first of the month.
A small number of respondents only report annual fuel consumption, and all of it is reported in December.
Additional Details#
This data is originally reported alongside similar information for fossil fuel plants, but the nuclear data is reported by (nuclear) generation unit rather than fuel type and prime mover, and so has a different primary key.
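To line nuclear records up with the fuel and prime mover key used for non-nuclear plants, you can sum the unit-level rows over `nuclear_unit_id`. The sketch below does this on invented values. Before concatenating the result with core_eia923__monthly_generation_fuel, check that the same plant-months are not already present there. We have not verified that they are absent.

```python
import pandas as pd

# Invented rows shaped like core_eia923__monthly_generation_fuel_nuclear.
nuc = pd.DataFrame({
    "plant_id_eia": [10, 10],
    "report_date": pd.to_datetime(["2024-03-01", "2024-03-01"]),
    "nuclear_unit_id": ["1", "2"],
    "energy_source_code": ["NUC", "NUC"],
    "prime_mover_code": ["ST", "ST"],
    "fuel_consumed_for_electricity_mmbtu": [7_000_000.0, 6_800_000.0],
    "net_generation_mwh": [680_000.0, 660_000.0],
})

# Collapse units to the plant / prime mover / energy source key.
key = ["plant_id_eia", "report_date", "prime_mover_code", "energy_source_code"]
value_cols = ["fuel_consumed_for_electricity_mmbtu", "net_generation_mwh"]
plant_level = nuc.groupby(key, as_index=False)[value_cols].sum()
print(plant_level)
```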
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
energy_source_code |
string |
A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record. |
fuel_consumed_for_electricity_mmbtu |
number |
Total consumption of fuel to produce electricity, in MMBtu, for the specified period. |
fuel_consumed_for_electricity_units |
number |
Consumption for electric generation of the fuel type in physical units. |
fuel_consumed_mmbtu |
number |
Total consumption of fuel, in MMBtu, for the specified period. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_type_code_agg |
string |
A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
nuclear_unit_id |
string |
For nuclear plants only, the unit number. One-digit numeric. Nuclear plants are the only type of plant for which data are shown explicitly at the generating-unit level. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
prime_mover_code |
string |
Code for the type of prime mover (e.g. CT, CG) |
report_date |
date |
Date reported. |
core_eia930__hourly_interchange#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority interchange.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia, balancing_authority_code_adjacent_eia
Additional Details#
Interchange is the net metered tie line flow from one BA to another directly interconnected BA. Total net interchange is the net sum of all interchange occurring between a BA and its directly interconnected neighboring BAs. Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.
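A BA's total net interchange is the sum of its pairwise interchange over all adjacent BAs. The sketch below computes it on invented rows. AAA, BBB and CCC are placeholder codes, not real BAs. The `direction` column is a hypothetical label derived from the sign convention described above.

```python
import pandas as pd

hour = pd.Timestamp("2024-01-01 01:00", tz="UTC")
# Invented rows shaped like core_eia930__hourly_interchange.
ix = pd.DataFrame({
    "datetime_utc": [hour] * 3,
    "balancing_authority_code_eia": ["AAA", "AAA", "BBB"],
    "balancing_authority_code_adjacent_eia": ["BBB", "CCC", "AAA"],
    "interchange_reported_mwh": [150.0, -40.0, -150.0],
})

# Total net interchange per BA-hour: sum over all adjacent BAs.
total = ix.groupby(
    ["datetime_utc", "balancing_authority_code_eia"], as_index=False
)["interchange_reported_mwh"].sum()

# Sign convention: positive = net outflow, negative = net inflow.
total["direction"] = total["interchange_reported_mwh"].map(
    lambda mwh: "net outflow" if mwh > 0 else ("net inflow" if mwh < 0 else "balanced")
)
print(total)
```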
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_adjacent_eia |
string |
EIA short code for the other adjacent balancing authority, with which interchange is occurring. Includes Canadian and Mexican BAs. |
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
datetime_utc |
datetime |
Timestamp at the end of the hour for which the data is reported. |
interchange_reported_mwh |
number |
Original reported energy interchange between adjacent balancing authorities. |
core_eia930__hourly_net_generation_by_energy_source#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority net generation by energy source.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia, generation_energy_source
Usage Warnings#
Contains known calculation discrepancies.
Additional Details#
BAs report generation from dual-fuel (switching from one fuel to another) and multiple-fuel (using multiple fuels simultaneously) generators under the actual energy source used, if known, and under the generator’s primary energy source, if not known.
To maintain generator confidentiality, generation may sometimes be reported in the Other category if too few generators are reported for a particular energy source category.
In theory, the sum of net generation across all energy sources should equal the total net generation reported in the balancing authority operations table (core_eia930__hourly_operations). In practice, these values often diverge significantly, and the discrepancies require further investigation.
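The consistency check can be sketched in pandas on invented rows. Sum the by-source values for each BA-hour, then compare the sum with the total in core_eia930__hourly_operations. The energy source labels and BA code below are placeholders, and the 5% tolerance is an arbitrary assumption.

```python
import pandas as pd

hour = pd.Timestamp("2024-01-01 01:00", tz="UTC")
key = ["datetime_utc", "balancing_authority_code_eia"]

# Invented rows shaped like core_eia930__hourly_net_generation_by_energy_source.
by_source = pd.DataFrame({
    "datetime_utc": [hour] * 3,
    "balancing_authority_code_eia": ["AAA"] * 3,
    "generation_energy_source": ["source_1", "source_2", "source_3"],  # placeholders
    "net_generation_reported_mwh": [500.0, 200.0, 100.0],
})
# Invented row shaped like core_eia930__hourly_operations.
ops = pd.DataFrame({
    "datetime_utc": [hour],
    "balancing_authority_code_eia": ["AAA"],
    "net_generation_reported_mwh": [1_000.0],
})

summed = by_source.groupby(key, as_index=False)["net_generation_reported_mwh"].sum()
cmp = summed.merge(ops, on=key, suffixes=("_by_source", "_total"))
cmp["rel_diff"] = (
    cmp["net_generation_reported_mwh_by_source"]
    - cmp["net_generation_reported_mwh_total"]
) / cmp["net_generation_reported_mwh_total"].abs()

# Flag BA-hours whose by-source sum differs from the total by more than 5%.
flagged = cmp[cmp["rel_diff"].abs() > 0.05]
print(flagged)
```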
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
datetime_utc |
datetime |
Timestamp at the end of the hour for which the data is reported. |
generation_energy_source |
string |
High level energy source used to produce electricity. |
net_generation_adjusted_mwh |
number |
Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
net_generation_imputed_eia_mwh |
number |
Reported net generation with outlying values removed and missing values imputed by EIA. |
net_generation_reported_mwh |
number |
Unaltered originally reported net generation for the specified period. |
core_eia930__hourly_operations#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority net generation, interchange, and demand.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia
Additional Details#
Net generation represents the metered output of electric generating units in a BA’s electric system. This generation only includes generating units that are managed by a BA or whose operations are visible to a BA.
Generators on the distribution system—both large-scale resources and small-scale distributed resources, such as rooftop solar photovoltaic (PV) systems—are typically not included.
In some electricity publications, EIA reports generation from all utility-scale generating units in the United States, but BAs only meter a subset of those units. As a result, when hourly generation from the EIA-930 is aggregated to monthly or annual levels, the results will be lower than monthly and annual aggregations in other EIA electricity publications.
Interchange is the net metered tie line flow from one BA to another directly interconnected BA. This table includes the net sum of all interchange occurring between a BA and all of its directly interconnected neighboring BAs. For a detailed breakdown of interchange between each pair of adjacent BAs see core_eia930__hourly_interchange.
Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.
Demand is a calculated value representing the amount of electricity load within a BA’s electric system. A BA derives its demand value by taking the total metered net electricity generation within its electric system and subtracting the total metered net electricity interchange occurring between the BA and its neighboring BAs.
Each BA produces a day-ahead electricity demand forecast for every hour of the next day. These forecasts help BAs plan for and coordinate the reliable operation of their electric system.
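The demand identity can be checked with simple arithmetic. Positive interchange means net outflow, so demand is net generation minus interchange. The sketch below applies this to invented values. `demand_calculated_mwh` is a hypothetical column for comparison against `demand_reported_mwh`, and AAA and BBB are placeholder codes.

```python
import pandas as pd

# Invented BA-hour rows shaped like core_eia930__hourly_operations.
ops = pd.DataFrame({
    "balancing_authority_code_eia": ["AAA", "BBB"],
    "net_generation_reported_mwh": [1_000.0, 400.0],
    "interchange_reported_mwh": [110.0, -150.0],  # positive = net outflow
})

# Demand = net generation - net interchange (exports reduce it, imports add to it).
ops["demand_calculated_mwh"] = (
    ops["net_generation_reported_mwh"] - ops["interchange_reported_mwh"]
)
print(ops)
```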
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
datetime_utc |
datetime |
Timestamp at the end of the hour for which the data is reported. |
demand_adjusted_mwh |
number |
Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
demand_forecast_mwh |
number |
Day-ahead demand forecast. |
demand_imputed_eia_mwh |
number |
Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA. |
demand_reported_mwh |
number |
Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs. |
interchange_adjusted_mwh |
number |
Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
interchange_imputed_eia_mwh |
number |
Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA. |
interchange_reported_mwh |
number |
Original reported energy interchange between adjacent balancing authorities. |
net_generation_adjusted_mwh |
number |
Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling. |
net_generation_imputed_eia_mwh |
number |
Reported net generation with outlying values removed and missing values imputed by EIA. |
net_generation_reported_mwh |
number |
Unaltered originally reported net generation for the specified period. |
core_eia930__hourly_subregion_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of balancing authority subregion demand.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Form 930 – Hourly and Daily Balancing Authority Operations Report
- Primary key:
datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia
Additional Details#
For select BAs, balancing authority subregion demand provides a further level of geographic disaggregation (for example, load zones, weather zones, operating company areas, local BAs, etc.) within a BA’s electric system.
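The sketch below, on invented rows, sums subregion demand back up to the BA level. It then compares that sum with BA-level demand from core_eia930__hourly_operations. Whether the two should match is an assumption. Subregion allocation methods differ between BAs, so a nonzero gap is not necessarily an error. Z1, Z2 and AAA are placeholder codes.

```python
import pandas as pd

hour = pd.Timestamp("2024-01-01 01:00", tz="UTC")
key = ["datetime_utc", "balancing_authority_code_eia"]

# Invented rows shaped like core_eia930__hourly_subregion_demand.
sub = pd.DataFrame({
    "datetime_utc": [hour] * 2,
    "balancing_authority_code_eia": ["AAA"] * 2,
    "balancing_authority_subregion_code_eia": ["Z1", "Z2"],
    "demand_reported_mwh": [300.0, 550.0],
})
# Invented BA-level row shaped like core_eia930__hourly_operations.
ops = pd.DataFrame({
    "datetime_utc": [hour],
    "balancing_authority_code_eia": ["AAA"],
    "demand_reported_mwh": [890.0],
})

sub_total = sub.groupby(key, as_index=False)["demand_reported_mwh"].sum()
check = sub_total.merge(ops, on=key, suffixes=("_subregions", "_ba"))
# Positive gap: BA-level demand exceeds the sum of its subregions.
check["gap_mwh"] = (
    check["demand_reported_mwh_ba"] - check["demand_reported_mwh_subregions"]
)
print(check)
```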
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balancing_authority_code_eia |
string |
EIA short code identifying a balancing authority. May include Canadian and Mexican BAs. |
balancing_authority_subregion_code_eia |
string |
Code identifying subregions of larger balancing authorities. |
datetime_utc |
datetime |
Timestamp at the end of the hour for which the data is reported. |
demand_reported_mwh |
number |
Originally reported electricity demand for the balancing authority subregion. Note that different BAs have different methods of calculating and allocating subregion demand. |
core_eia__codes_averaging_periods#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for the averaging period specified by emissions statutes and regulations for the EIA 860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
label |
string |
Longer human-readable code using snake_case |
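The code tables in this section share the same code/label/description layout. Decoding a reported code is therefore a left join on `code`. The sketch below uses invented codes and a hypothetical `some_code` column. Substitute the actual code column of the data table you are decoding.

```python
import pandas as pd

# Invented code table with the same columns as the core_eia__codes_* tables.
codes = pd.DataFrame({
    "code": ["X1", "X2"],
    "label": ["example_one", "example_two"],
    "description": ["First example code.", "Second example code."],
})
# Invented data table; "some_code" is a hypothetical column holding reported codes.
data = pd.DataFrame({"plant_id_eia": [1, 2, 3], "some_code": ["X1", "X2", "X9"]})

# Left join keeps unmatched codes (label/description become NaN);
# validate raises if a code appears more than once in the code table.
decoded = data.merge(
    codes.rename(columns={"code": "some_code"}),
    on="some_code",
    how="left",
    validate="many_to_one",
)
print(decoded)
```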
core_eia__codes_boiler_generator_assn_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for boiler-generator associations in the EIA 860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
label |
string |
Longer human-readable code using snake_case |
core_eia__codes_boiler_status#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for boiler status in the EIA 860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
label |
string |
Longer human-readable code using snake_case |
core_eia__codes_boiler_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for boiler regulatory types in the EIA 860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
label |
string |
Longer human-readable code using snake_case |
core_eia__codes_coalmine_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for coal mines reported as fuel sources in the EIA-923.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_contract_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for fuel supply contracts reported in the EIA-923.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_cooling_system_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for cooling systems.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_cooling_tower_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for cooling towers.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_cooling_water_sources#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for cooling water sources in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_cooling_water_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for cooling water types.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_emission_control_equipment_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for emissions control equipment installed on a boiler.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_energy_sources#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for energy sources reported to EIA.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Additional Details#
Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| fuel_derived_from | string | Original fuel from which this refined fuel was derived. |
| fuel_group_eia | string | High-level fuel group defined in the 2021-2023 EIA Form 860 instructions, Table 28. |
| fuel_phase | string | Physical phase of matter of the fuel. |
| fuel_type_code_pudl | string | Simplified fuel type code used in PUDL. |
| fuel_units | string | Reported unit of measure for fuel. |
| label | string | Longer human-readable code using snake_case. |
| max_fuel_mmbtu_per_unit | number | Maximum heat content per physical unit of fuel in MMBTU. |
| min_fuel_mmbtu_per_unit | number | Minimum heat content per physical unit of fuel in MMBTU. |
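Each energy source carries `min_fuel_mmbtu_per_unit` and `max_fuel_mmbtu_per_unit`, so a reported heat content can be sanity-checked against the expected range for its code. The sketch below assumes the table has been converted to row dicts and that the reported value is in MMBTU per the code's `fuel_units`. The function names are hypothetical. A value outside the range flags the value for review but does not prove it is wrong.

```python
from typing import Iterable, Mapping, Optional


def heat_content_bounds(rows: Iterable[Mapping[str, object]]) -> dict:
    """Map each energy source `code` to its (min, max) MMBTU-per-unit range."""
    return {
        row["code"]: (
            row.get("min_fuel_mmbtu_per_unit"),
            row.get("max_fuel_mmbtu_per_unit"),
        )
        for row in rows
    }


def heat_content_in_range(
    code: str, mmbtu_per_unit: float, bounds: Mapping[str, tuple]
) -> Optional[bool]:
    """Return True/False for the range check, or None if no bounds are known."""
    lo, hi = bounds.get(code, (None, None))
    if lo is None or hi is None:
        return None  # Code not in the table, or bounds not populated for it.
    return lo <= mmbtu_per_unit <= hi
```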
core_eia__codes_environmental_equipment_manufacturers#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for manufacturers of boilers and environmental control equipment in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_firing_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for boiler firing types in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_fuel_transportation_modes#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for fuel transportation modes reported in the EIA-860 and EIA-923.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
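When joining a data table to a code table, a common check is whether every code used in the data actually appears in the code table. This is a foreign-key check. The sketch below assumes the code table rows are dicts and the data column is any iterable of codes. `unknown_codes` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def unknown_codes(
    values: Iterable[Optional[str]], code_rows: Iterable[Mapping[str, object]]
) -> Counter:
    """Count values with no matching `code` in the code table (nulls ignored)."""
    known = {row["code"] for row in code_rows}
    return Counter(v for v in values if v is not None and v not in known)
```

In pandas, the rows that fail the check can be selected with `df[~df["col"].isin(codes_df["code"])]`, where `col` is a placeholder for the column being checked.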
core_eia__codes_fuel_types_agg#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for aggregated fuel types used in the Annual Energy Review or Monthly Energy Review.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Additional Details#
See the EIA-923 Fuel Code table for additional information.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
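Unlike most code tables in this section, this one has no `label` column, only `code` and `description`. Code that shows names from several code tables therefore needs a fallback. The sketch below assumes rows are available as dicts. `display_name` is a hypothetical helper.

```python
from typing import Mapping


def display_name(row: Mapping[str, object]) -> str:
    """Prefer `label`, then fall back to `description`, then to the raw `code`."""
    for col in ("label", "description", "code"):
        value = row.get(col)
        if value:  # Skips missing keys, None, and empty strings.
            return str(value)
    raise ValueError("row has no code, label, or description")
```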
core_eia__codes_mercury_compliance_strategies#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for compliance strategies used to control mercury in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_momentary_interruptions#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for utility definitions of momentary service interruptions.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_nox_compliance_strategies#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for compliance strategies used to control nitrogen oxides in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_nox_control_status#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for the operational status of nitrogen oxides control units associated with boilers in the EIA-860 data.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_nox_units#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for units of measurement for nitrogen oxides in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_operational_status#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for operational status reported to EIA.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Additional Details#
Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
| operational_status | string | The operating status of the asset. For generators, this is based on the EIA-860 tab in which the generator was listed. |
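This table pairs each `code` with an `operational_status` value, so several detailed status codes may share one broader status. The sketch below inverts that relationship to list every code for a given status. It assumes the rows are available as dicts and does not assume any specific status values. `codes_by_status` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def codes_by_status(rows: Iterable[Mapping[str, object]]) -> dict:
    """Group status codes by the `operational_status` they are mapped to."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("operational_status")].append(row["code"])
    # Sort codes within each group for stable, readable output.
    return {status: sorted(codes) for status, codes in groups.items()}
```

In pandas, `df.groupby("operational_status")["code"].apply(list)` gives a similar result. Note that `groupby` drops null keys unless `dropna=False` is passed.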
core_eia__codes_particulate_compliance_strategies#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for compliance strategies used to control particulate matter in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_particulate_units#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for units of measurement for particulate matter in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_prime_movers#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for prime movers reported in the EIA-860 and EIA-923.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_regulations#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for levels of statutes and codes under which boilers operate in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_reporting_frequencies#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for the reporting frequencies used by plants in the EIA-923.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_sector_consolidated#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for EIA consolidated NAICS sectors.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Additional Details#
Codes and descriptions taken from the EIA-923 File Layout spreadsheet.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_so2_compliance_strategies#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for compliance strategies used to control sulfur dioxide in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_so2_units#
Code table containing descriptions of categorical codes for units of measurement for sulfur dioxide in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_sorbent_types#
Code table containing descriptions of categorical codes for flue gas desulfurization sorbent types used in EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_steam_plant_types#
Code table containing descriptions of categorical codes for steam plants in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_storage_enclosure_types#
Code table containing descriptions of categorical codes for energy storage enclosures.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_storage_technology_types#
Code table containing descriptions of categorical codes for energy storage technologies.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_wet_dry_bottom#
Code table containing descriptions of categorical codes for boiler bottoms in the EIA-860.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| label | string | Longer human-readable code using snake_case. |
core_eia__codes_wind_quality_class#
Code table containing descriptions of categorical codes for wind quality classes.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| code | string | Originally reported short code. |
| description | string | Long human-readable description of the meaning of a code/label. |
| extreme_fifty_year_gust_ms | number | The extreme 50-year wind gust at this generator, in meters per second. |
| label | string | Longer human-readable code using snake_case. |
| turbulence_intensity_a | number | The upper bound of the turbulence intensity at the wind site (ratio of the standard deviation of fluctuating wind velocity to the mean wind speed). |
| turbulence_intensity_b | number | The lower bound of the turbulence intensity at the wind site (ratio of the standard deviation of fluctuating wind velocity to the mean wind speed). |
| wind_speed_avg_ms | number | Average wind speed in meters per second. |
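The turbulence intensity columns use the ratio defined above: standard deviation of wind speed divided by mean wind speed. A minimal sketch, assuming a list of wind speed samples (in m/s) from one averaging period; the sample values are invented, and using the population standard deviation is an assumption (some conventions use the sample standard deviation).

```python
from statistics import mean, pstdev


def turbulence_intensity(speeds_ms: list[float]) -> float:
    """Standard deviation of wind speed divided by the mean wind speed.

    Uses the population standard deviation of the samples (an assumption).
    """
    avg = mean(speeds_ms)
    if avg <= 0:
        raise ValueError("mean wind speed must be positive")
    return pstdev(speeds_ms) / avg


samples = [8.0, 10.0, 12.0, 10.0]  # hypothetical wind speeds in m/s
print(round(turbulence_intensity(samples), 4))  # 0.1414
```

Because it is a ratio of two speeds, the result is dimensionless, which matches the `number` type of the turbulence columns.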
core_eia__entity_boilers#
Entity table containing static information about boilers compiled from the EIA-860 and EIA-923.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
plant_id_eia, boiler_id
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for boilers are set. It contains values which are expected to remain fixed, while core_eia860__scd_boilers contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
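The 70% rule for choosing a canonical value can be illustrated with a small sketch. This is not PUDL's actual harvesting code: it assumes all reported values of one attribute for one entity are available as a list, and treating `None` as "not reported" is an assumption. The values `"X"` and `"Y"` are placeholders.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def choose_canonical(values: Sequence[Hashable], threshold: float = 0.7) -> Optional[Hashable]:
    """Return the most frequently reported value if it makes up at least
    `threshold` of the non-null reports; otherwise return None (a null)."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


# Four of five reports agree (80%), so the majority value is kept.
print(choose_canonical(["X", "X", "X", "X", "Y"]))  # X
# Only three of five agree (60%), so the attribute becomes null.
print(choose_canonical(["X", "X", "X", "Y", "Y"]))  # None
```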
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| boiler_id | string | Alphanumeric boiler ID. |
| boiler_manufacturer | string | Name of boiler manufacturer. |
| boiler_manufacturer_code | string | EIA short code for boiler manufacturer. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
core_eia__entity_generators#
Entity table containing static information about generators compiled from across the EIA-860 and EIA-923.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
plant_id_eia, generator_id
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for generators are set. It contains values which are expected to remain fixed, while core_eia860__scd_generators contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
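Since static attributes live in this table and year-varying attributes live in core_eia860__scd_generators, a common pattern is to merge the two on the generator key. The sketch below uses tiny invented frames; it assumes the year-varying table is keyed by `plant_id_eia`, `generator_id`, and a `report_date` column, and the `capacity_mw` column and all values are illustrative only.

```python
import pandas as pd

# Invented stand-in for core_eia__entity_generators: one row per generator.
entity = pd.DataFrame(
    {
        "plant_id_eia": [12345, 12345],
        "generator_id": ["1", "GT1"],
        "generator_operating_date": pd.to_datetime(["1970-06-01", "2001-05-01"]),
    }
)

# Invented stand-in for a year-varying table (assumed keys plus report_date).
annual = pd.DataFrame(
    {
        "plant_id_eia": [12345, 12345, 12345],
        "generator_id": ["1", "1", "GT1"],
        "report_date": pd.to_datetime(["2022-01-01", "2023-01-01", "2023-01-01"]),
        "capacity_mw": [100.0, 95.0, 50.0],  # illustrative values
    }
)

# Each annual row picks up the static attributes of its generator.
combined = annual.merge(
    entity,
    on=["plant_id_eia", "generator_id"],
    how="left",
    validate="many_to_one",  # the entity table has one row per generator
)
print(combined)
```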
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| associated_combined_heat_power | boolean | Indicates whether the generator is associated with a combined heat and power system. |
| bypass_heat_recovery | boolean | Can this generator operate while bypassing the heat recovery steam generator? |
| can_switch_when_operating | boolean | Indicates whether a fuel switching generator can switch fuels while operating. |
| duct_burners | boolean | Indicates whether the unit has duct burners for supplementary firing of the turbine exhaust gas. |
| fluidized_bed_tech | boolean | Indicates whether the generator uses fluidized bed technology. |
| generator_id | string | Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
| generator_operating_date | date | Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date. |
| original_planned_generator_operating_date | date | The date the generator was originally scheduled to be operational. |
| other_combustion_tech | boolean | Indicates whether the generator uses other combustion technologies. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| previously_canceled | boolean | Indicates whether the generator was previously reported as indefinitely postponed or canceled. |
| pulverized_coal_tech | boolean | Indicates whether the generator uses pulverized coal technology. |
| rto_iso_lmp_node_id | string | The designation used to identify the price node in RTO/ISO Locational Marginal Price reports. |
| rto_iso_location_wholesale_reporting_id | string | The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report. |
| solid_fuel_gasification | boolean | Indicates whether the generator is part of a solid fuel gasification system. |
| stoker_tech | boolean | Indicates whether the generator uses stoker technology. |
| subcritical_tech | boolean | Indicates whether the generator uses subcritical technology. |
| supercritical_tech | boolean | Indicates whether the generator uses supercritical technology. |
| topping_bottoming_code | string | If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle. |
| ultrasupercritical_tech | boolean | Indicates whether the generator uses ultra-supercritical technology. |
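The advice to treat `generator_id` as a string matters in practice: numeric casting can merge distinct IDs or drop alphanumeric ones. A small illustration with made-up IDs:

```python
import pandas as pd

# Made-up generator IDs: plain digits, zero-padded digits, and alphanumeric.
ids = pd.Series(["1", "01", "GT1"])

# Treated as strings, all three IDs remain distinct.
print(ids.nunique())  # 3

# Coercing to numbers collapses "1" and "01" into the same value and
# silently turns "GT1" into NaN, so joins on generator_id would break.
as_numbers = pd.to_numeric(ids, errors="coerce")
print(as_numbers.tolist())
```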
core_eia__entity_plants#
Entity table containing static information about plants, compiled from across all EIA-860 and EIA-923 data.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
plant_id_eia
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for plants are set. It contains values which are expected to remain fixed, while core_eia860__scd_plants contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| city | string | Name of the city. |
| county | string | County name. |
| latitude | number | Latitude of the plant’s location, in degrees. |
| longitude | number | Longitude of the plant’s location, in degrees. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_name_eia | string | Plant name. |
| state | string | Two-letter US state abbreviation. |
| street_address | string | Physical street address. |
| timezone | string | IANA time zone name. |
| zip_code | string | Five-digit US ZIP code. |
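Because `timezone` holds an IANA time zone name, it can be passed directly to Python's `zoneinfo` module, for example to convert UTC timestamps to a plant's local time. A minimal sketch, using `"America/Denver"` as an example value; it requires Python 3.9+ and time zone data on the system (on some platforms, the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_plant_local(utc_dt: datetime, plant_timezone: str) -> datetime:
    """Convert a timezone-aware UTC datetime to the plant's local time.

    `plant_timezone` is an IANA name, like the values in the `timezone` column.
    """
    if utc_dt.tzinfo is None:
        raise ValueError("utc_dt must be timezone-aware")
    return utc_dt.astimezone(ZoneInfo(plant_timezone))


noon_utc = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
local = to_plant_local(noon_utc, "America/Denver")
print(local.isoformat())  # 2024-01-15T05:00:00-07:00
```

Using the IANA name rather than a fixed UTC offset lets daylight saving transitions be handled automatically.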
core_eia__entity_utilities#
Entity table containing static information about utilities, compiled from all EIA data.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
utility_id_eia
Usage Warnings#
Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
Contains information from multiple raw inputs.
Additional Details#
This is one of two tables where canonical values for utilities are set. It contains values which are expected to remain fixed, while core_eia860__scd_utilities contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_name_eia | string | The name of the utility. |
core_eia__yearly_fuel_receipts_costs_aggs#
Annual time series of aggregated fuel receipts and costs from the EIA API.
- Most-recent data:
2026
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA – Mix of multiple EIA Forms
- Primary key:
fuel_agg, geo_agg, sector_agg, temporal_agg, report_date
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| fuel_agg | string | Category of fuel aggregation in EIA bulk electricity data. |
| fuel_cost_per_mmbtu | number | Average fuel cost per MMBTU of heat content in nominal USD. |
| fuel_received_mmbtu | number | Aggregated fuel receipts, in MMBTU, in EIA bulk electricity data. |
| geo_agg | string | Category of geographic aggregation in EIA bulk electricity data. |
| report_date | date | Date reported. |
| sector_agg | string | Category of sectoral aggregation in EIA bulk electricity data. |
| temporal_agg | string | Category of temporal aggregation in EIA bulk electricity data. |
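Because `fuel_cost_per_mmbtu` is an average, combining rows (for example across geographies or sectors) should weight each row by `fuel_received_mmbtu` rather than taking a simple mean. A minimal sketch with invented numbers; it assumes the rows being combined do not overlap (for instance, a national aggregate is not mixed with its own state components).

```python
import pandas as pd

# Invented rows standing in for two non-overlapping aggregations.
rows = pd.DataFrame(
    {
        "fuel_received_mmbtu": [1_000.0, 3_000.0],
        "fuel_cost_per_mmbtu": [2.00, 4.00],
    }
)

# Total spend = cost per MMBTU x MMBTU received, summed over rows.
total_cost = (rows["fuel_cost_per_mmbtu"] * rows["fuel_received_mmbtu"]).sum()
weighted_avg = total_cost / rows["fuel_received_mmbtu"].sum()

print(weighted_avg)                        # 3.5
print(rows["fuel_cost_per_mmbtu"].mean())  # 3.0, the misleading unweighted mean
```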
core_eiaaeo__yearly_projected_electric_sales#
Annual time series of projected electricity sales by region and customer class.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Annual Energy Outlook (AEO)
- Primary key:
report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, customer_class
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| customer_class | string | High level categorization of customer type (e.g., commercial, residential). |
| electricity_market_module_region_eiaaeo | string | AEO projection region. |
| model_case_eiaaeo | string | Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details. |
| projection_year | integer | The year of the projected value. |
| report_year | integer | Four-digit year in which the data was reported. |
| sales_mwh | number | Quantity of electricity sold in MWh. |
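Side cases are usually read relative to the Reference case. The sketch below computes each case's sales as a percentage difference from the reference for the same report year, region, projection year, and customer class. The case names (`reference`, `high_economic_growth`), the region name, and all values are hypothetical placeholders; check the actual values of `model_case_eiaaeo` before reusing this.

```python
import pandas as pd

keys = [
    "report_year",
    "electricity_market_module_region_eiaaeo",
    "projection_year",
    "customer_class",
]

# Hypothetical slice of the table: one region, class, and year, two model cases.
sales = pd.DataFrame(
    {
        "report_year": [2025, 2025],
        "electricity_market_module_region_eiaaeo": ["region_x", "region_x"],
        "model_case_eiaaeo": ["reference", "high_economic_growth"],
        "projection_year": [2030, 2030],
        "customer_class": ["residential", "residential"],
        "sales_mwh": [100_000.0, 110_000.0],
    }
)

# Attach the reference-case value to every row with the same keys.
reference = sales.loc[sales["model_case_eiaaeo"] == "reference", keys + ["sales_mwh"]]
compared = sales.merge(
    reference, on=keys, how="left", suffixes=("", "_reference"), validate="many_to_one"
)
compared["pct_vs_reference"] = 100 * (compared["sales_mwh"] / compared["sales_mwh_reference"] - 1)
print(compared[["model_case_eiaaeo", "pct_vs_reference"]])
```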
core_eiaaeo__yearly_projected_energy_use_by_sector_and_type#
Annual time series of projected energy use for commercial, electric power, industrial, residential, and transportation sectors, drawn from AEO Table 2.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Annual Energy Outlook (AEO)
- Primary key:
report_year, region_name_eiaaeo, model_case_eiaaeo, projection_year, energy_use_sector, energy_use_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
The series in Table 2 which track energy use by sector do not always define each type of use the same way across sectors. There is detailed information about what is included or excluded in each use type for each sector in the footnotes of the EIA’s online AEO data browser:
https://www.eia.gov/outlooks/aeo/data/browser/#/?id=2-AEO2023
Use caution when aggregating across use types! Energy Use has a tricky system of subtotals, and summing all types within a sector will result in double-counting. Consult the EIA’s data browser for visibility into which use types are subtotals, and what they contain: subtotal series are displayed indented, and include all lines above them which are one level out, up to the next indented line. Delivered Energy and Total are special cases which include those plus all subtotals above. In this way, “Delivered Energy” includes purchased electricity, renewable energy, and an array of fuels based on sector, and explicitly excludes electricity-related losses.
AEO Energy Use figures are variously referred to as delivered energy, energy consumption, energy use, and energy demand, depending on which use types are being discussed and which organization and document are describing them. In PUDL we say energy use or energy consumption.
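The double-counting risk can be shown with a toy example. The `energy_use_type` values below are hypothetical, invented to mimic one subtotal and its two components; the real set of subtotal series has to be identified from EIA's data browser as described above.

```python
import pandas as pd

# Toy rows for one sector, region, case, and year. "example_subtotal" is
# assumed to equal the sum of the two component rows.
use = pd.DataFrame(
    {
        "energy_use_type": ["component_a", "component_b", "example_subtotal"],
        "energy_use_mmbtu": [40.0, 60.0, 100.0],
    }
)

naive_total = use["energy_use_mmbtu"].sum()  # counts the components twice

# Hypothetical set of subtotal series; build the real one from EIA's data browser.
SUBTOTAL_TYPES = {"example_subtotal"}
components_only = use[~use["energy_use_type"].isin(SUBTOTAL_TYPES)]
component_total = components_only["energy_use_mmbtu"].sum()

print(naive_total)      # 200.0
print(component_total)  # 100.0
```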
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet"
)
Columns#
| Field Name | Type | Description |
|---|---|---|
| energy_use_mmbtu | number | Energy use, in MMBTU; also referred to as energy consumption, energy demand, or delivered energy, depending on type. |
| energy_use_sector | string | Sector for energy use figures in AEO Table 2. Similar to customer class, but with some missing and some extra values. |
| energy_use_type | string | Type of energy use, indicating the name of the series from AEO Table 2. Includes fuels, electricity, losses, and various subtotals; consult table documentation for aggregation guidelines. |
| model_case_eiaaeo | string | Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details. |
| projection_year | integer | The year of the projected value. |
| region_name_eiaaeo | string | EIA AEO region for energy consumption. Includes US Census Divisions plus United States. |
| region_type_eiaaeo | string | Region type for EIA AEO energy consumption, indicating whether region_name_eiaaeo is a US Census Division or the country as a whole (United States). |
| report_year | integer | Four-digit year in which the data was reported. |
core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type#
Annual time series of projected fuel prices for the electric power sector, including electricity-only and combined-heat-and-power plants that have a regulatory status.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Annual Energy Outlook (AEO)
- Primary key:
report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
electricity_market_module_region_eiaaeo |
string |
AEO projection region. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_real_per_mmbtu_eiaaeo |
number |
Average fuel cost per MMBTU of heat content in real USD, standardized to the value of a USD in the year defined by real_cost_basis_year. |
fuel_type_eiaaeo |
string |
Fuel type reported for AEO end-use sector generation data. |
model_case_eiaaeo |
string |
Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details. |
projection_year |
integer |
The year of the projected value. |
real_cost_basis_year |
integer |
Four-digit year which is the basis for any ‘real cost’ monetary values (as opposed to nominal values). |
report_year |
integer |
Four-digit year in which the data was reported. |
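The two fuel cost columns describe the same price, once in nominal dollars and once in real dollars of real_cost_basis_year. Their ratio therefore gives the price-level change that EIA applied between real_cost_basis_year and projection_year. The sketch below computes that ratio for a single row. It assumes each row is a plain dict keyed by the column names above. The function name and sample numbers are hypothetical, and this page does not say which deflator EIA uses.

```python
def implied_price_index(row: dict) -> float:
    """Return the nominal/real ratio for one AEO fuel-cost row.

    A value of 1.10 means prices in projection_year are about 10% higher
    than in real_cost_basis_year, under whatever deflator EIA applied.
    """
    nominal = row["fuel_cost_per_mmbtu"]
    real = row["fuel_cost_real_per_mmbtu_eiaaeo"]
    if real in (None, 0):
        # Avoid dividing by zero or by a missing value
        return float("nan")
    return nominal / real


# Hypothetical row, for illustration only (not real AEO data)
example = {
    "projection_year": 2030,
    "real_cost_basis_year": 2024,
    "fuel_cost_per_mmbtu": 4.40,
    "fuel_cost_real_per_mmbtu_eiaaeo": 4.00,
}
print(round(implied_price_index(example), 3))  # 1.1
```

When you compare fuel prices across projection years, the real column is usually the appropriate choice because it removes the effect of general inflation.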
core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology#
Noticed any irregularities with this table? File a report.
Annual time series of projected generation capacity and total generation in the electric sector, broken out by technology.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Annual Energy Outlook (AEO)
- Primary key:
report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, technology_description_eiaaeo
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
electricity_market_module_region_eiaaeo |
string |
AEO projection region. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
model_case_eiaaeo |
string |
Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details. |
projection_year |
integer |
The year of the projected value. |
report_year |
integer |
Four-digit year in which the data was reported. |
summer_capacity_mw |
number |
The net summer capacity. |
summer_capacity_planned_additions_mw |
number |
The total planned additions to net summer generating capacity. |
summer_capacity_retirements_mw |
number |
The total retirements from net summer generating capacity. |
summer_capacity_unplanned_additions_mw |
number |
The total unplanned additions to net summer generating capacity. |
technology_description_eiaaeo |
string |
Generation technology reported for AEO. |
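This table reports both generation and capacity for each technology, so you can derive a rough annual capacity factor from one row. The sketch below does this. It assumes an 8,760-hour year, which ignores leap years. It also assumes the gross_generation_mwh and summer_capacity_mw values come from the same row. The function name and inputs are hypothetical. Generation here is gross and capacity is net summer capacity, so the result is likely biased upward compared with a fully net-based capacity factor.

```python
HOURS_PER_YEAR = 8760  # assumption: ignores leap years


def approx_capacity_factor(gross_generation_mwh: float,
                           summer_capacity_mw: float,
                           hours: float = HOURS_PER_YEAR) -> float:
    """Rough capacity factor for one AEO technology row.

    Gross generation divided by net summer capacity tends to overstate
    a net-based capacity factor, so treat the result as approximate.
    """
    if not summer_capacity_mw:
        # Zero or missing capacity: the ratio is undefined
        return float("nan")
    return gross_generation_mwh / (summer_capacity_mw * hours)


# Hypothetical values: 1,000 MW running flat out for a year yields 8,760,000 MWh
print(approx_capacity_factor(4_380_000, 1_000))  # 0.5
```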
core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type#
Noticed any irregularities with this table? File a report.
Annual time series of projected generation capacity and total generation in the end-use sector, broken out by fuel type.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EIA Annual Energy Outlook (AEO)
- Primary key:
report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo
Usage Warnings#
The solar photovoltaic capacity portion of Renewable Sources is reported in megawatts of direct current; other technologies are reported in megawatts of alternating current.
Additional Details#
Includes combined-heat-and-power plants and electricity-only plants in the commercial and industrial sectors; and small on-site generating systems in the residential, commercial, and industrial sectors used primarily for own-use generation, but which may also sell some power to the grid.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
electricity_market_module_region_eiaaeo |
string |
AEO projection region. |
fuel_type_eiaaeo |
string |
Fuel type reported for AEO end-use sector generation data. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
model_case_eiaaeo |
string |
Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details. |
projection_year |
integer |
The year of the projected value. |
report_year |
integer |
Four-digit year in which the data was reported. |
summer_capacity_mw |
number |
The net summer capacity. |
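Because of the DC/AC usage warning above, adding summer_capacity_mw across fuel types can mix units. The sketch below sums capacity per report year, case, region, and projection year, and skips the fuel types whose capacity is partly reported in MW DC. The fuel-type labels are hypothetical. Check the actual values of fuel_type_eiaaeo in your copy of the table and pass the matching label(s) in mixed_dc_fuel_types.

```python
from collections import defaultdict


def total_ac_capacity(rows, mixed_dc_fuel_types):
    """Sum summer_capacity_mw across fuel types, skipping DC-contaminated rows."""
    totals = defaultdict(float)
    for r in rows:
        if r["fuel_type_eiaaeo"] in mixed_dc_fuel_types:
            continue  # partly MW DC, so it cannot be added to AC-only values
        key = (
            r["report_year"],
            r["model_case_eiaaeo"],
            r["electricity_market_module_region_eiaaeo"],
            r["projection_year"],
        )
        totals[key] += r["summer_capacity_mw"] or 0.0  # treat missing as 0
    return dict(totals)


# Hypothetical rows; labels and values are illustrative only
base = {"report_year": 2025, "model_case_eiaaeo": "case_a",
        "electricity_market_module_region_eiaaeo": "region_x", "projection_year": 2030}
rows = [
    {**base, "fuel_type_eiaaeo": "natural_gas", "summer_capacity_mw": 120.0},
    {**base, "fuel_type_eiaaeo": "renewable_sources", "summer_capacity_mw": 300.0},
    {**base, "fuel_type_eiaaeo": "coal", "summer_capacity_mw": 80.0},
]
print(total_ac_capacity(rows, {"renewable_sources"}))
```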
core_epa__assn_eia_epacamd#
Noticed any irregularities with this table? File a report.
Association table providing connections between EPA units and EIA plants, boilers, and generators.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EPA – Mix of multiple EPA sources
- Primary key:
This table has no primary key.
Additional Details#
This crosswalk table comes from the PUDL fork of the EPA camd-eia-crosswalk GitHub repo: https://github.com/catalyst-cooperative/camd-eia-crosswalk-latest.
The camd-eia-crosswalk README and our Data Source documentation page on EPA Hourly Continuous Emission Monitoring System (CEMS) describe the complicated relationship between EIA and EPA data, specifically the nature of EPA vs. EIA “units” and the level of granularity at which the two sources can be connected.
The original EPA crosswalk runs on 2018 EIA data. We adapted the crosswalk code to run on each new year of EIA data, capturing changes in plant information over time.
Our version of the crosswalk clarifies some of the column names and removes unmatched rows. The docstring of the pudl.etl.glue_assets.core_epa__assn_eia_epacamd() function explains what changes are made from the EPA’s version.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
boiler_id |
string |
Alphanumeric boiler ID. |
emissions_unit_id_epa |
string |
Emissions (smokestack) unit monitored by EPA CEMS. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
generator_id_epa |
string |
Generator ID used by the EPA. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_epa |
integer |
The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia. |
report_year |
integer |
Four-digit year in which the data was reported. |
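The crosswalk is many-to-many, so joining CEMS data onto EIA generators without allocation can double-count emissions. The sketch below demonstrates the effect in plain Python. It assumes rows are dicts keyed by the column names on this page, and that the CEMS year column lines up with the crosswalk report_year. The function name and data are hypothetical.

```python
from collections import defaultdict


def naive_generator_co2(cems_rows, crosswalk_rows):
    """Attach CEMS CO2 to EIA generators via the crosswalk, with no allocation.

    Each CEMS row is credited in full to every generator linked to its
    emissions unit, so units that feed several generators are counted
    more than once.
    """
    # (year, plant_id_epa, emissions_unit_id_epa) -> {(plant_id_eia, generator_id)}
    links = defaultdict(set)
    for r in crosswalk_rows:
        if r["generator_id"] is not None:
            key = (r["report_year"], r["plant_id_epa"], r["emissions_unit_id_epa"])
            links[key].add((r["plant_id_eia"], r["generator_id"]))

    co2_by_generator = defaultdict(float)
    for row in cems_rows:
        key = (row["year"], row["plant_id_epa"], row["emissions_unit_id_epa"])
        for gen in links.get(key, ()):
            co2_by_generator[gen] += row["co2_mass_tons"]
    return dict(co2_by_generator)


# Hypothetical data: one emissions unit linked to two generators
crosswalk = [
    {"report_year": 2020, "plant_id_epa": 1, "plant_id_eia": 1,
     "emissions_unit_id_epa": "1", "generator_id": "G1"},
    {"report_year": 2020, "plant_id_epa": 1, "plant_id_eia": 1,
     "emissions_unit_id_epa": "1", "generator_id": "G2"},
]
cems = [{"year": 2020, "plant_id_epa": 1, "emissions_unit_id_epa": "1",
         "co2_mass_tons": 100.0}]

result = naive_generator_co2(cems, crosswalk)
print(sum(result.values()))  # 200.0, double the 100.0 tons CEMS reported
```

To avoid this, either allocate each unit's emissions across its linked generators (for example, by share of generation) or aggregate to the subplant level using core_epa__assn_eia_epacamd_subplant_ids.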
core_epa__assn_eia_epacamd_subplant_ids#
Noticed any irregularities with this table? File a report.
Association table providing connections between EPA units and EIA units/generators, at the subplant level.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EPA – Mix of multiple EPA sources
- Primary key:
This table has no primary key. The natural primary key would be plant_id_eia, generator_id, subplant_id, and emissions_unit_id_epa, but some records have a null generator_id: roughly 2 percent of all EPA CAMD records are not successfully mapped to EIA generators.
Additional Details#
This table is an augmented version of the core_epa__assn_eia_epacamd crosswalk table, which is originally derived from the EPA’s GitHub repo camd-eia-crosswalk: https://github.com/USEPA/camd-eia-crosswalk.
This table identifies subplants within plant_ids, which are the smallest coherent units for aggregation. A plant_id refers to a legal entity that often contains multiple distinct power plants, even of different technology or fuel types.
EPA CEMS data combines information from several parts of a power plant:
- emissions from smokestacks
- fuel use from combustors
- electricity production from generators
But smokestacks, combustors, and generators can be connected in complex, many-to-many relationships. This complexity makes attribution difficult, for example when allocating pollution to energy producers. Furthermore, heterogeneity within plant_ids makes aggregation to the parent entity difficult or inappropriate.
This table draws on the EPA’s crosswalk, the IDs in the EPA CAMD core_epacems__hourly_emissions table itself, the core_eia860__assn_boiler_generator table, and the core_eia860__scd_generators table. The core_epa__assn_eia_epacamd table is the backbone; adding the EPA CAMD IDs ensures complete coverage of EPA CAMD reporting units, and adding the EIA-860 tables ensures complete coverage of EIA units as well.
For more information about how subplant_id is assigned, see the documentation for pudl.etl.glue_assets.make_subplant_ids() and pudl.etl.glue_assets.update_subplant_ids().
Despite this complexity, analyzing the relationships between combustors and generators, as provided in the core_epa__assn_eia_epacamd crosswalk, makes it possible to identify distinct power plants within a plant_id. These subplants are the smallest coherent units of aggregation.
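You can think of identifying subplants as finding connected components in a graph. The nodes are emissions units and generators, and the edges are the crosswalk links between them. The sketch below illustrates this idea for one plant using a union-find structure. It is not the actual logic in pudl.etl.glue_assets.make_subplant_ids(), which may differ and handle more cases. The function name and input shape (a list of (emissions_unit_id_epa, generator_id) pairs) are assumptions.

```python
def assign_subplants(pairs):
    """Group (emissions unit, generator) links within one plant into subplants.

    pairs: iterable of (emissions_unit_id_epa, generator_id) tuples.
    Returns a dict mapping each node to a subplant number starting at 0.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for unit, gen in pairs:
        # Tag node ids so a unit and a generator sharing a label stay distinct
        union(("unit", unit), ("gen", gen))

    labels, result = {}, {}
    for node in sorted(parent):  # sorted for deterministic numbering
        root = find(node)
        labels.setdefault(root, len(labels))
        result[node] = labels[root]
    return result


# Hypothetical plant: units 1 and 2 share generator G2; unit 3 feeds G3 alone
subplants = assign_subplants([("1", "G1"), ("1", "G2"), ("2", "G2"), ("3", "G3")])
print(subplants)
```

In this example, units 1 and 2 and generators G1 and G2 form one subplant because they are all connected, either directly or through G2. Unit 3 and G3 form a second subplant.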
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
emissions_unit_id_epa |
string |
Emissions (smokestack) unit monitored by EPA CEMS. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_epa |
integer |
The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia. |
subplant_id |
integer |
Sub-plant ID links EPA CEMS emissions units to EIA units. |
unit_id_pudl |
integer |
Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time. |
core_epacems__hourly_emissions#
Noticed any irregularities with this table? File a report.
Hourly time series of emissions and plant operational data as required by 40 CFR Part 75.
- Most-recent data:
2025
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
EPA Hourly Continuous Emission Monitoring System (CEMS)
- Primary key:
plant_id_epa, emissions_unit_id_epa, operating_datetime_utc
Usage Warnings#
Large table; do not attempt to open with Excel.
Not all IDs are present.
Contains information from multiple raw inputs.
Additional Details#
Continuous Emissions Monitoring Systems (CEMS) are used to determine the rate of gas or particulate matter exiting a point source of emissions. The EPA Clean Air Markets Division (CAMD) has collected data on power plant emissions from CEMS units stretching back to 1995. The CEMS dataset includes hourly gross load, SO2, CO2, and NOx emissions associated with a given emissions-unit during startup, shutdown, and instances of malfunction. An EPA CEMS emissions-unit or smokestack unit is not the same as an EIA unit. See core_epa__assn_eia_epacamd for details about how to connect CEMS data to corresponding EIA units.
Only fossil-combustion units over 25 MW are required to install and use CEMS, so some units that report in EIA-860 or EIA-923 have no CEMS data.
Access methods#
Warning
This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.
import polars as pl
df = (
# Use scan_parquet (lazy evaluation) to plan the query before execution
pl.scan_parquet(
"s3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
# Select only a subset of columns to reduce network and memory usage
.select(
[
"state",
"plant_id_eia",
"plant_id_epa",
"operating_datetime_utc",
"operating_time_hours",
"emissions_unit_id_epa",
"gross_load_mw",
"heat_content_mmbtu",
"co2_mass_tons",
]
)
.filter(
# Select data for only a single plant to reduce network and memory usage
(pl.col("plant_id_eia") == 470)
# The row groups in this Parquet file are partitioned by state and year,
# so indicating the state allows the query to skip irrelevant groups.
& (pl.col("state") == "CO")
)
# Now execute the planned query
.collect()
)
SELECT
-- Select only a subset of columns to reduce network and memory usage
state,
plant_id_eia,
plant_id_epa,
operating_datetime_utc,
operating_time_hours,
emissions_unit_id_epa,
gross_load_mw,
heat_content_mmbtu,
co2_mass_tons
FROM 's3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet'
WHERE
-- Select data for only a single plant to reduce network and memory usage
plant_id_eia = 470
-- The row groups in this Parquet file are partitioned by state and year,
-- so indicating the state allows the query to skip irrelevant groups.
AND state = 'CO';
Columns#
Field Name |
Type |
Description |
|---|---|---|
co2_mass_measurement_code |
string |
Identifies whether the reported value of emissions was measured, calculated, or measured and substitute. |
co2_mass_tons |
number |
Carbon dioxide emissions in short tons. |
emissions_unit_id_epa |
string |
Emissions (smokestack) unit monitored by EPA CEMS. |
gross_load_mw |
number |
Average power in megawatts delivered during time interval measured. |
heat_content_mmbtu |
number |
The energy contained in fuel burned, measured in million BTU. |
nox_mass_lbs |
number |
NOx emissions in pounds. |
nox_mass_measurement_code |
string |
Identifies whether the reported value of emissions was measured, calculated, or measured and substitute. |
operating_datetime_utc |
datetime |
Date and time measurement began (UTC). |
operating_time_hours |
number |
Length of time interval measured. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_epa |
integer |
The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia. |
so2_mass_lbs |
number |
Sulfur dioxide emissions in pounds. |
so2_mass_measurement_code |
string |
Identifies whether the reported value of emissions was measured, calculated, or measured and substitute. |
state |
string |
Two letter US state abbreviation. |
steam_load_lbs |
number |
Total steam load (mass of steam, in pounds) produced by a unit during the reported hour. |
year |
integer |
Year the data was reported in, used for partitioning EPA CEMS. |
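gross_load_mw is the average power over the interval, and operating_time_hours is the length of that interval. Their product is the gross energy produced, in MWh. From the gross energy you can derive a heat rate and a CO2 intensity for each row. The sketch below does this for a single CEMS row, represented as a dict keyed by the column names above. The function name and sample values are hypothetical.

```python
def unit_hour_metrics(row: dict) -> dict:
    """Derive gross MWh, heat rate, and CO2 intensity from one CEMS row."""
    load = row.get("gross_load_mw")
    hours = row.get("operating_time_hours")
    empty = {"gross_mwh": None, "heat_rate_mmbtu_per_mwh": None, "co2_tons_per_mwh": None}
    if load is None or hours is None:
        return empty
    gross_mwh = load * hours  # average power x duration = energy
    if gross_mwh == 0:
        # No generation in this interval, so per-MWh ratios are undefined
        return {**empty, "gross_mwh": 0.0}
    heat = row.get("heat_content_mmbtu")
    co2 = row.get("co2_mass_tons")
    return {
        "gross_mwh": gross_mwh,
        "heat_rate_mmbtu_per_mwh": heat / gross_mwh if heat is not None else None,
        "co2_tons_per_mwh": co2 / gross_mwh if co2 is not None else None,
    }


# Hypothetical half-hour interval at 200 MW average load
example = {"gross_load_mw": 200.0, "operating_time_hours": 0.5,
           "heat_content_mmbtu": 1000.0, "co2_mass_tons": 58.5}
print(unit_hour_metrics(example))
```

To aggregate over many hours, first sum heat_content_mmbtu and gross MWh separately, then divide the two totals. Averaging the hourly ratios would give too much weight to hours with low load.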
core_ferc1__codes_power_purchase_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for electric power purchase types.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities
- Primary key:
code
Usage Warnings#
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
label |
string |
Longer human-readable code using snake_case. |
core_ferc1__yearly_balance_sheet_assets_sched110#
Noticed any irregularities with this table? File a report.
Annual time series of utility assets and other debits.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
- Primary key:
utility_id_ferc1, report_year, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table. |
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
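The usage warning says some rows contain subtotals. One way to avoid double counting is to keep only rows whose row_type_xbrl marks them as uniquely reported. The sketch below does this. It assumes rows are dicts keyed by the column names above, and that "reported_value" is the label for uniquely reported rows. That label is an assumption, so check the distinct values of row_type_xbrl in your data. The function name and rows are hypothetical. Whether a given sum is meaningful still depends on which asset_type values you include.

```python
def sum_ending_balance(rows, utility_id, year, keep_row_type="reported_value"):
    """Sum ending_balance for one utility-year, keeping a single row type.

    keep_row_type="reported_value" is an assumed label for uniquely
    reported rows; calculated subtotal rows are skipped.
    """
    return sum(
        r["ending_balance"] or 0.0  # treat missing balances as 0
        for r in rows
        if r["utility_id_ferc1"] == utility_id
        and r["report_year"] == year
        and r["row_type_xbrl"] == keep_row_type
    )


# Hypothetical rows: two reported line items plus their calculated subtotal
rows = [
    {"utility_id_ferc1": 1, "report_year": 2023, "asset_type": "item_a",
     "row_type_xbrl": "reported_value", "ending_balance": 100.0},
    {"utility_id_ferc1": 1, "report_year": 2023, "asset_type": "item_b",
     "row_type_xbrl": "reported_value", "ending_balance": 50.0},
    {"utility_id_ferc1": 1, "report_year": 2023, "asset_type": "item_a_plus_b",
     "row_type_xbrl": "calculated_value", "ending_balance": 150.0},
]
naive = sum(r["ending_balance"] for r in rows)  # double counts the subtotal
print(naive, sum_ending_balance(rows, 1, 2023))  # 300.0 150.0
```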
core_ferc1__yearly_balance_sheet_liabilities_sched110#
Noticed any irregularities with this table? File a report.
Annual time series of utility liabilities and other credits.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)
- Primary key:
utility_id_ferc1, report_year, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
liability_type |
string |
Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_cash_flows_sched120#
Noticed any irregularities with this table? File a report.
Annual time series of utility cash flow.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)
- Primary key:
utility_id_ferc1, report_year, amount_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
amount |
number |
Reported amount of dollars. This could be a balance or a change in value. |
amount_type |
string |
Label describing the type of amount being reported. This could be a balance or a change in value. |
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
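The record_id format documented above ends with five fields, but table_name can itself contain underscores. The reliable way to parse it is therefore to split from the right. The sketch below assumes a record_id that follows the documented format exactly. IDs that deviate from it raise ValueError. The class name, function name, and example ID are hypothetical.

```python
from typing import NamedTuple


class FercRecordId(NamedTuple):
    table_name: str
    report_year: int
    report_prd: str
    respondent_id: str
    spplmnt_num: str
    row_number: str


def parse_record_id(record_id: str) -> FercRecordId:
    """Split {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}."""
    # Split at most 5 times from the right, so underscores in table_name survive
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"Unexpected record_id format: {record_id!r}")
    table_name, year, prd, resp, spplmnt, row = parts
    return FercRecordId(table_name, int(year), prd, resp, spplmnt, row)


# Hypothetical record_id, for illustration only
print(parse_record_id("f1_hypothetical_table_2015_12_123_0_7"))
```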
core_ferc1__yearly_depreciation_by_function_sched219#
Noticed any irregularities with this table? File a report.
Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)
- Primary key:
utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
depreciation_type |
string |
Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc. |
ending_balance |
number |
Account balance at end of year. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_depreciation_changes_sched219#
Noticed any irregularities with this table? File a report.
Annual time series of changes in accumulated provision for depreciation of electric utility plant.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)
- Primary key:
utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Electric utility plant refers to FERC Account 108.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
depreciation_type |
string |
Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
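Because subtotal rows sit alongside their components, summing `dollar_value` naively can double count. A minimal sketch, assuming `row_type_xbrl` marks derived rows with the label `"calculated_value"` (an assumption; inspect the distinct values in your copy before relying on it), drops those rows before grouping. Rows with a null `row_type_xbrl`, such as pre-XBRL years, are kept, so unlabeled totals in those years are not caught.

```python
import pandas as pd


def sum_reported_values(df: pd.DataFrame, by: list[str]) -> pd.DataFrame:
    """Sum dollar_value over `by`, skipping rows flagged as calculated subtotals.

    Assumes (hypothetically) that row_type_xbrl == "calculated_value" marks
    derived rows. Rows where row_type_xbrl is null are kept, so subtotals
    in pre-XBRL years are NOT removed by this filter.
    """
    # Null labels compare as NA or False; treat both as "not calculated".
    is_calculated = df["row_type_xbrl"].eq("calculated_value").fillna(False)
    return (
        df.loc[~is_calculated]
        .groupby(by, as_index=False)["dollar_value"]
        .sum()
    )
```

For example, `sum_reported_values(df, ["utility_id_ferc1", "report_year", "depreciation_type"])` on the table loaded with pandas.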
core_ferc1__yearly_depreciation_factors_sched336#
Annual time series of factors used in estimating depreciation charges.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)
- Primary key:
This table has no primary key; it is too unstructured to have one.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Contains columns which may appear categorical, but are actually free text.
The rate and percentage (pct) columns are reported either as values between 0-1 or between 0-100. The scale is often consistent within a given utility-year, but use caution when comparing values across different utilities, or even across different years for the same utility.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
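One way to put the rate and percentage columns on a common scale is to guess the scale per utility-year, since the warning above notes that values are usually consistent within one. The sketch below is a heuristic of our own, not PUDL's method: if any value in a `utility_id_ferc1`/`report_year` group exceeds 1, the whole group is treated as 0-100 and divided by 100. `depreciation_rate` is typed as a string, so non-numeric entries are coerced to NaN.

```python
import pandas as pd


def normalize_to_fraction(df: pd.DataFrame, col: str) -> pd.Series:
    """Return `col` as fractions (0-1), guessing the scale per utility-year.

    Heuristic (an assumption, not PUDL's method): if any value within a
    utility_id_ferc1/report_year group exceeds 1, treat the whole group as
    0-100 percentages and divide by 100. Non-numeric strings become NaN.
    """
    values = pd.to_numeric(df[col], errors="coerce")
    group_max = values.groupby(
        [df["utility_id_ferc1"], df["report_year"]]
    ).transform("max")
    # Replace values with value / 100 only in groups that look like percentages.
    return values.mask(group_max > 1, values / 100)
```

A utility-year whose percentages all happen to be 1 or below (for example a single 0.5% rate reported as `0.5`) would be misread as fractions, so spot-check the results.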
Additional Details#
This table contains details at a variety of levels of granularity. There are many free-form text fields in this table, which results in respondents filling out this table very differently from each other or from year to year. We recommend using this table carefully, one utility-year at a time, rather than attempting to perform analysis across long time series or across utilities without careful cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
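Following the recommendation to work one utility-year at a time, here is a sketch that selects a single utility-year and restores the filing order where `order_num` is available. Rows before 2021 have a null `order_num`; the stable sort places them at the end in their existing relative order. The function name and arguments are illustrative, not part of PUDL.

```python
import pandas as pd


def one_utility_year(df: pd.DataFrame, utility_id: int, year: int) -> pd.DataFrame:
    """Rows for a single utility-year, ordered by order_num where it exists.

    order_num is only populated for XBRL-era filings (2021 onward). Null
    order_num rows go last; the stable sort keeps their existing order.
    """
    subset = df[(df["utility_id_ferc1"] == utility_id) & (df["report_year"] == year)]
    return subset.sort_values(
        "order_num", na_position="last", kind="stable"
    ).reset_index(drop=True)
```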
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_num |
string |
Account number(s) in connection with factors used in estimating depreciation charges. |
depreciable_plant_base |
number |
Depreciable plant balance (depreciable base) to which rates are applied. |
depreciation_factors |
string |
Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers indicating the beginning of a new section (such as a plant or asset type header), which are followed by sub-components like FERC account IDs or plant names depending on the section. |
depreciation_rate |
string |
Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard: expect some values between 0-1 and some between 0-100. |
mortality_curve_type |
string |
Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives. |
net_salvage_pct |
number |
Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100. |
order_num |
number |
This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer, although there are many instances of floating point values which seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
remaining_life_avg |
number |
Estimated weighted average of remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years. |
report_year |
integer |
Four-digit year in which the data was reported. |
service_life_avg |
number |
Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
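PUDL already converts the XBRL duration strings in `remaining_life_avg` and `service_life_avg` to years. If you work with raw XBRL values yourself, a conversion might look like the sketch below. The unit factors (365.25 days per year, 12 months per year) are our assumptions and may not match the constants PUDL uses, and week designators (`PnW`) are not handled.

```python
import re
from typing import Optional


def _part(name: str, designator: str) -> str:
    # One optional "<number><designator>" component, e.g. "25Y" or "6.5M".
    return rf"(?:(?P<{name}>\d+(?:\.\d+)?){designator})?"


# ISO 8601 duration PnYnMnDTnHnMnS; "M" before "T" is months, after "T" minutes.
_DURATION = re.compile(
    "^P"
    + _part("years", "Y") + _part("months", "M") + _part("days", "D")
    + "(?:T" + _part("hours", "H") + _part("minutes", "M") + _part("seconds", "S")
    + ")?$"
)

# Conversion factors to years (illustrative assumptions).
_YEARS_PER_UNIT = {
    "years": 1.0,
    "months": 1.0 / 12.0,
    "days": 1.0 / 365.25,
    "hours": 1.0 / (365.25 * 24),
    "minutes": 1.0 / (365.25 * 24 * 60),
    "seconds": 1.0 / (365.25 * 24 * 3600),
}


def duration_to_years(value: str) -> Optional[float]:
    """Convert an ISO 8601 duration string to years, or None if it doesn't parse."""
    match = _DURATION.match(value.strip())
    if match is None:
        return None
    parts = {
        unit: float(amount)
        for unit, amount in match.groupdict().items()
        if amount is not None
    }
    if not parts:
        # "P" or "PT" alone carries no amount.
        return None
    return sum(amount * _YEARS_PER_UNIT[unit] for unit, amount in parts.items())
```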
core_ferc1__yearly_depreciation_summary_sched336#
Annual time series of depreciation and amortization of electric plant.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)
- Primary key:
utility_id_ferc1, report_year, plant_function, ferc_account_label
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Electric Plant refers to FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of depreciation and amortization changes.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
ferc_account_label |
string |
Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts. |
plant_function |
string |
Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
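The `record_id` format documented above can be split back into its parts. Because `table_name` itself contains underscores, the sketch splits from the right. It assumes the five trailing fields are all integers (including `report_prd`) and returns `None` for IDs that don't fit that pattern; we have not verified that every `record_id`, particularly in XBRL-era data, follows this format.

```python
from typing import Optional


def parse_record_id(record_id: str) -> Optional[dict]:
    """Split {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}.

    table_name contains underscores, so split at most five times from the
    right. Returns None if the trailing fields are not all digits
    (an assumption about the format, not a guarantee).
    """
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        return None
    table_name, *numeric = parts
    if not all(p.isdigit() for p in numeric):
        return None
    report_year, report_prd, respondent_id, spplmnt_num, row_number = map(int, numeric)
    return {
        "table_name": table_name,
        "report_year": report_year,
        "report_prd": report_prd,
        "respondent_id": respondent_id,
        "spplmnt_num": spplmnt_num,
        "row_number": row_number,
    }
```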
core_ferc1__yearly_energy_dispositions_sched401#
Annual time series of dispositions of electric energy sold, exchanged, or stored.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
- Primary key:
utility_id_ferc1, report_year, energy_disposition_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Electric Energy Account, dispositions only. Schedule 401a. Electricity that utilities delivered to end users, internal losses, etc.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
energy_disposition_type |
string |
Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401 table. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc. |
energy_mwh |
number |
Sources and uses of energy in MWh. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
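Because the primary key is `utility_id_ferc1, report_year, energy_disposition_type`, the table can be reshaped to one row per utility-year with one column per disposition type. pandas' `DataFrame.pivot` raises a `ValueError` if any index/column combination is duplicated, so it doubles as a key check. Subtotal rows become ordinary columns after pivoting, so summing across columns can still double count. The disposition labels in the check below are hypothetical.

```python
import pandas as pd


def dispositions_wide(df: pd.DataFrame) -> pd.DataFrame:
    """One row per utility-year, one column per energy_disposition_type.

    Relies on the documented primary key; DataFrame.pivot raises
    ValueError if (utility_id_ferc1, report_year, energy_disposition_type)
    is ever duplicated. Missing combinations become NaN.
    """
    return df.pivot(
        index=["utility_id_ferc1", "report_year"],
        columns="energy_disposition_type",
        values="energy_mwh",
    )
```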
core_ferc1__yearly_energy_sources_sched401#
Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)
- Primary key:
utility_id_ferc1, report_year, energy_source_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
energy_mwh |
number |
Sources and uses of energy in MWh. |
energy_source_type |
string |
Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
core_ferc1__yearly_hydroelectric_plants_sched406#
Annual time series of plant statistics for large hydroelectric generating plants.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)
- Primary key:
This table has no primary key. The best approximation of a primary key for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but it is a free-form string field and there are duplicate records.
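Since the approximate key `report_year, utility_id_ferc1, plant_name_ferc1` is not unique, it helps to look at the colliding rows before aggregating. A minimal sketch; the optional name normalization (strip whitespace and lowercase) is our own choice and will not catch spelling variants, and unit-versus-total rows usually carry different names, so they still need manual review.

```python
import pandas as pd


def approx_key_duplicates(df: pd.DataFrame, normalize_names: bool = True) -> pd.DataFrame:
    """Return every row whose approximate key occurs more than once."""
    names = df["plant_name_ferc1"]
    if normalize_names:
        # Casing and stray whitespace are common sources of near-duplicate names.
        names = names.str.strip().str.lower()
    keyed = df.assign(_plant_name_key=names)
    key_cols = ["report_year", "utility_id_ferc1", "_plant_name_key"]
    # keep=False marks all members of each duplicated group, not just repeats.
    dupes = keyed[keyed.duplicated(key_cols, keep=False)]
    return dupes.sort_values(key_cols).drop(columns="_plant_name_key")
```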
Usage Warnings#
The plant_type and construction_type columns are standardized into categorical values from free-form strings. FERC does not restrict respondents to reporting unique and non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2 and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
Large plants have an installed nameplate capacity of more than 10 MW.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_capacity_adverse_conditions_mw |
number |
Net plant capability under the least favorable operating conditions, in megawatts. |
net_capacity_favorable_conditions_mw |
number |
Net plant capability under the most favorable operating conditions, in megawatts. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_hydraulic |
number |
Production expenses: hydraulic expenses (USD). |
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
project_num |
integer |
FERC Licensed Project Number. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
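The per-unit columns can be sanity-checked against their components. The sketch assumes `capex_per_mw` is approximately `capex_total / capacity_mw` and `opex_per_mwh` is approximately `opex_total / net_generation_mwh`; we have not confirmed that respondents or PUDL compute them exactly this way, and the 1% relative tolerance is arbitrary. Rows with a missing or non-positive denominator are never flagged.

```python
import numpy as np
import pandas as pd


def _as_float(series: pd.Series) -> np.ndarray:
    # Works for numpy- and pyarrow-backed columns; nulls become NaN.
    return series.to_numpy(dtype="float64", na_value=np.nan)


def _mismatch(reported: np.ndarray, computed: np.ndarray, rtol: float) -> np.ndarray:
    # Only compare where both values exist.
    both_present = ~np.isnan(reported) & ~np.isnan(computed)
    return both_present & ~np.isclose(reported, computed, rtol=rtol)


def per_unit_check(df: pd.DataFrame, rtol: float = 0.01) -> pd.DataFrame:
    """Recompute capex_per_mw and opex_per_mwh and flag disagreeing rows."""
    capacity = _as_float(df["capacity_mw"])
    generation = _as_float(df["net_generation_mwh"])
    with np.errstate(divide="ignore", invalid="ignore"):
        # Zero, negative, or missing denominators yield NaN instead of inf.
        capex_calc = np.where(capacity > 0, _as_float(df["capex_total"]) / capacity, np.nan)
        opex_calc = np.where(generation > 0, _as_float(df["opex_total"]) / generation, np.nan)
    return df.assign(
        capex_per_mw_calc=capex_calc,
        opex_per_mwh_calc=opex_calc,
        capex_per_mw_mismatch=_mismatch(_as_float(df["capex_per_mw"]), capex_calc, rtol),
        opex_per_mwh_mismatch=_mismatch(_as_float(df["opex_per_mwh"]), opex_calc, rtol),
    )
```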
core_ferc1__yearly_identification_certification#
Annual time series of respondent identification and corporate officer certification information.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Part I and II)
- Primary key:
utility_id_ferc1, report_year, report_filing_type
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table contains information identifying a utility’s contact information, addresses and individuals responsible for accounting and certification. Note that the entity_id_gleif field does not conform to the expected format and largely includes the names of respondents.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
attestation_date |
date |
Date that corporate officer’s certification was signed. |
attestation_name |
string |
Name of person signing the corporate officer’s certification. |
attestation_signature |
string |
Signature of corporate officer to certify the report. |
attestation_title |
string |
Title of person signing the corporate officer’s certification. |
company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system. |
contact_address |
string |
Street address for contact identified in contact_name. |
contact_city |
string |
City for contact identified in contact_name. |
contact_email |
string |
Email for contact identified in contact_name. |
contact_name |
string |
Name of contact person for the respondent. |
contact_phone |
string |
Phone number for contact identified in contact_name. |
contact_state |
string |
State for contact identified in contact_name. |
contact_title |
string |
Title of contact person for the respondent. |
contact_zip |
string |
Zip code for contact identified in contact_name. |
entity_id_gleif |
string |
Legal Entity Identification Number issued by Global Legal Entity Identifier Foundation (GLEIF). The Legal Entity Identifier (LEI) is a 20-character, alpha-numeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO). |
filing_date |
date |
Date on which the filing was submitted. |
filing_software_vendor_name |
string |
Name of the software company used to create the submitted instance file. Null if the filer created the submission itself. |
name_change_date |
date |
Date of last name change of the company. |
office_city |
string |
City of principal office at end of period. |
office_state |
string |
State of principal office at end of period. |
office_street_address |
string |
Street address of principal office at end of period. |
office_zip |
string |
Zipcode of principal office at end of period. |
prior_utility_name_ferc1 |
string |
Former name of the responding utility, when it changed within the last year. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_filing_type |
string |
Type of report submitted: O (original) or R (revision). |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
core_ferc1__yearly_income_statements_sched114#
Annual time series of utility income statements.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)
- Primary key:
utility_id_ferc1, report_year, utility_type, income_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
income_type |
string |
Type of income reported in core_ferc1__yearly_income_statements_sched114 table. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_operating_expenses_sched320#
Annual time series of operating and maintenance costs associated with producing electricity.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)
- Primary key:
utility_id_ferc1, report_year, expense_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
expense_type |
string |
The type of expense. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_operating_revenues_sched300#
Annual time series of utilities’ electric operating revenues.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)
- Primary key:
utility_id_ferc1, report_year, revenue_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
Additional Details#
This table includes only the structured part of schedule 300. A number of revenue_type values do not have sales_mwh or avg_customers_per_month reported, in which case these columns will be NULL.
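Because some `revenue_type` rows have NULL `sales_mwh`, dividing directly can produce NaN or infinities. A sketch of an average-revenue-per-MWh calculation that only divides where `sales_mwh` is present and positive; it assumes `dollar_value` on such rows is the revenue for those same sales, and it does not remove subtotal rows.

```python
import numpy as np
import pandas as pd


def revenue_per_mwh(df: pd.DataFrame) -> pd.Series:
    """dollar_value / sales_mwh where sales_mwh is reported and positive, else NaN."""
    # to_numpy with na_value handles both numpy- and pyarrow-backed columns.
    sales = df["sales_mwh"].to_numpy(dtype="float64", na_value=np.nan)
    dollars = df["dollar_value"].to_numpy(dtype="float64", na_value=np.nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(sales > 0, dollars / sales, np.nan)
    return pd.Series(ratio, index=df.index, name="revenue_per_mwh")
```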
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
avg_customers_per_month |
number |
Average number of customers per month. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue_type |
string |
Label describing types of revenues. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_other_regulatory_assets_sched232#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ other regulatory assets, including rate order docket number.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)
- Primary key:
This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_detail |
string |
Description of the account number credited from making a debit adjustment to other regulatory liabilities. |
additional_description |
string |
Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021. |
credits_written_off_recovered |
number |
Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies. |
debits |
number |
The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies. |
description |
string |
Free-form description and purpose of other regulatory assets. |
ending_balance |
number |
Account balance at end of year. |
report_year |
integer |
Four-digit year in which the data was reported. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
core_ferc1__yearly_other_regulatory_liabilities_sched278#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ other regulatory liabilities, including rate order docket number.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)
- Primary key:
This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
account_detail |
string |
Description of the account number credited from making a debit adjustment to other regulatory liabilities. |
additional_description |
string |
Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021. |
decrease_in_other_regulatory_liabilities |
number |
The decrease during the reporting period of other regulatory liabilities. |
description |
string |
Long human-readable description of the meaning of a code/label. |
ending_balance |
number |
Account balance at end of year. |
increase_in_other_regulatory_liabilities |
number |
The increase during the reporting period of other regulatory liabilities. |
report_year |
integer |
Four-digit year in which the data was reported. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
core_ferc1__yearly_plant_in_service_sched204#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)
- Primary key:
utility_id_ferc1, report_year, ferc_account_label
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Additional Details#
Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101 (see e.g. https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
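The balance identity above can be checked row by row. This minimal pandas sketch assumes that df holds this table. It also assumes that retirements are stored with a negative sign, which is what the identity implies when it adds them. The function name and the relative tolerance are hypothetical choices, not part of PUDL.

```python
import pandas as pd


def flag_unbalanced_accounts(df: pd.DataFrame, rtol: float = 1e-3) -> pd.DataFrame:
    """Return rows whose ending_balance does not match the balance identity.

    expected = starting_balance + additions + retirements + adjustments + transfers
    Missing change components are treated as zero (an assumption of this sketch).
    """
    changes = ["additions", "retirements", "adjustments", "transfers"]
    expected = df["starting_balance"] + df[changes].fillna(0).sum(axis=1)
    # Relative tolerance (floored at 1 USD) absorbs rounding in reported values.
    tolerance = rtol * expected.abs().clip(lower=1.0)
    # Comparisons involving nulls are treated as "not flagged".
    mismatch = ((df["ending_balance"] - expected).abs() > tolerance).fillna(False)
    return df.loc[mismatch].assign(expected_ending_balance=expected[mismatch])
```

Rows with a null starting_balance or ending_balance are not flagged, because their expected value or difference is undefined.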
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments |
number |
Cost of adjustments to the account. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL-assigned ID when FERC accounts have been split or combined in reporting. |
ferc_account_label |
string |
Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts. |
plant_status |
string |
Utility plant financial status (in service, future, leased, total). |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
transfers |
number |
Cost of transfers into (out of) the account. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_pumped_storage_plants_sched408#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)
- Primary key:
This table has no primary key. The best approximation for primary keys for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but it is a free-form string field and there are duplicate records.
Usage Warnings#
FERC does not restrict respondents to reporting unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
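The best-approximation key is not unique, so it can help to list the colliding rows before aggregating. This minimal pandas sketch assumes that df holds this table. Lowercasing and trimming plant_name_ferc1 is an illustrative normalization of this sketch, not something PUDL does. The helper name is hypothetical.

```python
import pandas as pd

# Approximate key suggested in the table description; not enforced by PUDL.
APPROX_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def duplicated_plant_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return all rows whose (lightly normalized) approximate key repeats."""
    # Trim and lowercase the free-form name; an assumption of this sketch.
    normalized = df.assign(
        plant_name_ferc1=df["plant_name_ferc1"].str.strip().str.lower()
    )
    # keep=False marks every member of a duplicate group, not just the extras.
    dupes = normalized.duplicated(subset=APPROX_KEY, keep=False)
    return df.loc[dupes].sort_values(APPROX_KEY)
```

Inspecting these groups by hand is usually the only way to tell whether they are unit-level rows plus a total, or true duplicates.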
Additional Details#
As reported in Schedule 408 of FERC Form 1.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_equipment_electric |
number |
Cost of plant: accessory electric equipment (USD). |
capex_equipment_misc |
number |
Cost of plant: miscellaneous power plant equipment (USD). |
capex_facilities |
number |
Cost of plant: reservoirs, dams, and waterways (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_roads |
number |
Cost of plant: roads, railroads, and bridges (USD). |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
capex_wheels_turbines_generators |
number |
Cost of plant: water wheels, turbines, and generators (USD). |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
energy_used_for_pumping_mwh |
number |
Energy used for pumping, in megawatt-hours. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
net_load_mwh |
number |
Net output for load (net generation - energy used for pumping) in megawatt-hours. |
opex_dams |
number |
Production expenses: maintenance of reservoirs, dams, and waterways (USD). |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_generation_misc |
number |
Production expenses: miscellaneous power generation expenses (USD). |
opex_misc_plant |
number |
Production expenses: maintenance of miscellaneous hydraulic plant (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plant |
number |
Production expenses: maintenance of electric plant (USD). |
opex_production_before_pumping |
number |
Total production expenses before pumping (USD). |
opex_pumped_storage |
number |
Production expenses: pumped storage (USD). |
opex_pumping |
number |
Production expenses: pumping expenses (USD). |
opex_rents |
number |
Production expenses: rents (USD). |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_total |
number |
Total production expenses, excluding fuel (USD). |
opex_water_for_power |
number |
Production expenses: water for power (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
project_num |
integer |
FERC Licensed Project Number. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
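The net_load_mwh description defines it as net generation minus energy used for pumping. This minimal sketch assumes that df holds the pumped storage table above. It computes how far each reported value is from that definition. The helper name is hypothetical, and the size of residual that counts as suspicious is left to the user.

```python
import pandas as pd


def net_load_residual(df: pd.DataFrame) -> pd.Series:
    """Reported net_load_mwh minus (net generation - energy used for pumping)."""
    implied = df["net_generation_mwh"] - df["energy_used_for_pumping_mwh"]
    # Values far from zero mark rows that do not match the column definition.
    return df["net_load_mwh"] - implied
```

For example, net_load_residual(df).abs().gt(1).sum() would count rows that are off by more than 1 MWh, using an arbitrary threshold.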
core_ferc1__yearly_purchased_power_and_exchanges_sched326#
Noticed any irregularities with this table? File a report.
Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)
- Primary key:
This table has no primary key. The primary key for this table would be record_id; however, a small number of these values are duplicated in 2021, 2022, and 2024.
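To see which years are affected by duplicated record_id values, this minimal pandas sketch counts the distinct repeated IDs per report_year. It assumes that df holds this table, and the helper name is hypothetical.

```python
import pandas as pd


def duplicated_record_ids(df: pd.DataFrame) -> pd.Series:
    """Number of distinct record_id values appearing more than once, per report_year."""
    # keep=False flags every row that shares its record_id with another row.
    repeated = df[df["record_id"].duplicated(keep=False)]
    return repeated.groupby("report_year")["record_id"].nunique()
```

Because of these repeats, record_id should not be used as a unique index when joining or deduplicating.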
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Additional Details#
This table has data about inter-utility power purchases, including how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately, the field describing which other utility the power was bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.
Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
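Because seller_name is free text, a crude normalization can group some obvious variants before any matching. The sketch below is a hypothetical heuristic and not a PUDL feature. It lowercases names, strips punctuation, and removes a small, assumed list of corporate suffixes. It will not reconcile genuinely different spellings or abbreviations.

```python
import pandas as pd

# Hypothetical list of corporate suffixes to drop; extend as needed.
_SUFFIXES = r"\b(inc|corp|corporation|co|company|llc|lp|ltd)\b"


def normalize_seller_name(names: pd.Series) -> pd.Series:
    """Crude normalization of free-form seller names for grouping."""
    return (
        names.str.lower()
        .str.replace(r"[^\w\s]", " ", regex=True)  # punctuation -> space
        .str.replace(_SUFFIXES, " ", regex=True)  # drop corporate suffixes
        .str.replace(r"\s+", " ", regex=True)  # collapse runs of whitespace
        .str.strip()
    )
```

Usage: df.assign(seller_key=normalize_seller_name(df["seller_name"])) adds a grouping column while leaving the original seller_name intact.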
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
billing_demand_mw |
number |
Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts. |
coincident_peak_demand_mw |
number |
Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts. |
delivered_mwh |
number |
Gross megawatt-hours delivered in power exchanges and used as the basis for settlement. |
demand_charges |
number |
Demand charges (USD). |
energy_charges |
number |
Energy charges (USD). |
non_coincident_peak_demand_mw |
number |
Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts. |
other_charges |
number |
Other charges, including out-of-period adjustments (USD). |
purchase_type_code |
string |
Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction. |
purchased_mwh |
number |
Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021. |
purchased_other_than_storage_mwh |
number |
Number of megawatt-hours purchased during the period for purposes other than energy storage. |
purchased_storage_mwh |
number |
Number of megawatt-hours purchased during the period for energy storage. |
received_mwh |
number |
Gross megawatt-hours received in power exchanges and used as the basis for settlement. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
seller_name |
string |
Name of the seller, or the other party in an exchange transaction. |
tariff |
string |
FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.) |
total_settlement |
number |
Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
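The total_settlement description says it is the sum of demand, energy, and other charges. This minimal sketch assumes that df holds the purchased power table above. It measures how far each row departs from that sum, and the helper name is hypothetical. Power exchange rows (purchase_type_code ‘electricity_exchange’) settle on the net receipt of energy, so nonzero residuals there may be expected.

```python
import pandas as pd


def settlement_residual(df: pd.DataFrame) -> pd.Series:
    """total_settlement minus the sum of demand, energy, and other charges."""
    charges = ["demand_charges", "energy_charges", "other_charges"]
    # min_count=1 keeps the sum null when all three charges are missing.
    component_sum = df[charges].sum(axis=1, min_count=1)
    return df["total_settlement"] - component_sum
```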
core_ferc1__yearly_retained_earnings_sched118#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ statements of retained earnings.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)
- Primary key:
utility_id_ferc1, report_year, earnings_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
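One way to avoid double counting subtotals is to keep only rows that row_type_xbrl marks as directly reported. This minimal sketch assumes that df holds this table and that directly reported rows are labeled "reported_value". That label is an assumption, so check df["row_type_xbrl"].unique() in your copy before relying on it. The helper name is hypothetical.

```python
import pandas as pd


def reported_rows_only(df: pd.DataFrame, reported_label: str = "reported_value") -> pd.DataFrame:
    """Keep rows whose values were reported directly rather than calculated.

    Rows with a null row_type_xbrl are kept because their status is unknown;
    drop them too if you prefer to be conservative.
    """
    row_type = df["row_type_xbrl"]
    keep = row_type.isna() | (row_type == reported_label)
    return df.loc[keep]
```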
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
balance |
string |
Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy. |
earnings_type |
string |
Label describing types of earnings. |
ending_balance |
number |
Account balance at end of year. |
ferc_account |
string |
Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL-assigned ID when FERC accounts have been split or combined in reporting. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
starting_balance |
number |
Account balance at beginning of year. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
core_ferc1__yearly_sales_by_rate_schedules_sched304#
Noticed any irregularities with this table? File a report.
Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)
- Primary key:
This table has no primary key.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Values in rate_schedule_description are free-form strings.
Data prior to 2021 does not include values in the rate_schedule_type and billing_status columns.
Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Additional Details#
The pre-2021 data in this table (extracted from the Visual FoxPro database rather than XBRL) is extremely unstructured. Whereas the post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total), the pre-2021 data stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse the pre-2021 rate_schedule_description column en masse because it’s just too messy. The contents of rate_schedule_description often contain numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded in the rate_schedule_description column. To all who dare, beware.
This table is a combination of one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type, plus totals), hence the increase in data clarity post-2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data, where they can be reliably parsed from each of the XBRL tables and incorporated into columns. The rate_schedule_description is supposed to contain sub-rate_schedule_type names for charges (e.g., Residential 1, Residential 2). However, the pre-2021 data contains a little bit of everything (or nothing), and the post-2020 data has some totals or wonky data thrown in. That is to say, even when working with post-2020 data, be wary of aggregating the data. That’s what the “total” rows are for.
The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for post-2020 data (for earlier years you could try keyword searching for “total” in rate_schedule_description). The totals table contains two types of totals: the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”), and each of the utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).
The rate-schedule-based XBRL tables only report billed values, whereas the totals table reports billed, unbilled, and total values. (See the billing_status column description for more on the difference between billed and unbilled.) This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.
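The totals logic described above can be sketched in pandas. This sketch assumes that df holds this table, and it keeps only post-2020, billed rows. It then sums the individual rate schedule rows for each utility, year, and rate_schedule_type and places those sums beside the matching per-schedule “total” rows. The function name and the default value column (sales_mwh) are hypothetical choices.

```python
import pandas as pd

KEYS = ["utility_id_ferc1", "report_year", "rate_schedule_type"]


def compare_billed_subtotals(df: pd.DataFrame, value_col: str = "sales_mwh") -> pd.DataFrame:
    """Summed billed rate-schedule rows vs. the reported per-schedule totals."""
    post = df[df["report_year"] >= 2021]
    # isin() treats nulls as non-matches, so these masks contain no missing values.
    billed = post[post["billing_status"].isin(["billed"])]
    desc_total = billed["rate_schedule_description"].isin(["total"])
    type_total = billed["rate_schedule_type"].isin(["total"])
    # Per-schedule totals: description "total" with a specific schedule type.
    totals = billed[desc_total & ~type_total]
    # Individual rate schedules: anything not marked as a total.
    details = billed[~desc_total & ~type_total]
    summed = details.groupby(KEYS)[value_col].sum(min_count=1).rename("summed")
    reported = totals.groupby(KEYS)[value_col].sum(min_count=1).rename("reported_total")
    out = pd.concat([summed, reported], axis=1)
    return out.assign(difference=out["summed"] - out["reported_total"])
```

Nonzero differences are to be expected in some cases, because these subtotals and totals have not been verified.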
Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot determine which values are in which unit and have not put any cleaning mechanisms in place to account for this.
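To look at the dollars-versus-cents pattern yourself, this minimal sketch counts positive revenue_per_kwh values by order of magnitude (the floor of log10). It assumes that df holds this table, and the helper name is hypothetical. A mix of units would tend to show up as two clusters of bins roughly two apart.

```python
import numpy as np
import pandas as pd


def revenue_per_kwh_magnitudes(df: pd.DataFrame) -> pd.Series:
    """Count positive revenue_per_kwh values by floor(log10(value))."""
    values = df["revenue_per_kwh"].astype("float64")
    # log10 is undefined for zero and negative values, so drop them (and nulls).
    positive = values[values > 0]
    return np.floor(np.log10(positive)).astype(int).value_counts().sort_index()
```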
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
avg_customers_per_month |
number |
Average number of customers per month. |
billing_status |
string |
Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers. |
dollar_value |
number |
Dollar value of reported income, expense, asset, or liability. |
kwh_per_customer |
number |
kWh per customer. |
rate_schedule_description |
string |
Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type. |
rate_schedule_type |
string |
Categorization of rate schedule type. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
revenue_per_kwh |
number |
The amount of revenue per kWh by rate schedule acquired in the given report year. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
core_ferc1__yearly_small_plants_sched410#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for internal combustion plants, gas turbine plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity, and steam plants with less than 25 MW installed nameplate capacity.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)
- Primary key:
This table has no primary key. The best approximation for primary keys for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but it is a free-form string field and there are duplicate records.
Usage Warnings#
FERC does not restrict respondents to reporting unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the total plant (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it lacks IDs. See Notable Irregularities for details.
Additional Details#
As reported on FERC Form 1 Schedule 410 (pages 410-411) and extracted from the FERC Visual FoxPro and XBRL data. See our pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.
The raw version of this table is more like a digitized PDF than an actual data table. The rows contain lots of information in addition to what the columns might suggest. For instance, a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the rest of the values in the column. We employ a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_total |
number |
Total cost of plant (USD). |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_type |
string |
Type of fuel. |
license_id_ferc1 |
integer |
FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_maintenance |
number |
Production expenses: Maintenance (USD). |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
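The record_id format described above can be unpacked mechanically. This sketch assumes an ID follows `{table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}` with numeric trailing fields. IDs from other source years may be built differently, so the function returns None when the pattern does not match. The helper name is hypothetical, and `f1_gnrt_plant_2010_12_1_0_3` in the usage note is a made-up example ID.

```python
from typing import Optional

RECORD_ID_FIELDS = (
    "report_year",
    "report_prd",
    "respondent_id",
    "spplmnt_num",
    "row_number",
)


def parse_record_id(record_id: str) -> Optional[dict]:
    """Split a record_id that follows the documented pattern.

    table_name may itself contain underscores, so split from the right.
    Returns None when the ID does not match the assumed pattern.
    """
    parts = record_id.rsplit("_", len(RECORD_ID_FIELDS))
    if len(parts) != len(RECORD_ID_FIELDS) + 1:
        return None
    table_name, *numbers = parts
    if not table_name or not all(p.isdigit() for p in numbers):
        return None
    return {"table_name": table_name, **dict(zip(RECORD_ID_FIELDS, map(int, numbers)))}
```

For example, `parse_record_id("f1_gnrt_plant_2010_12_1_0_3")` yields `table_name="f1_gnrt_plant"` with `report_year=2010` and `row_number=3`.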
core_ferc1__yearly_steam_plants_fuel_sched402#
Noticed any irregularities with this table? File a report.
Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
- Primary key:
This table has no primary key.
Usage Warnings#
The fuel_type_code_pudl column is inferred from a free-form string field.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
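Inferring a code from a free-form string typically means normalizing the text and matching it against known spellings. The minimal sketch below shows that approach. The `FUEL_STRINGS` table and the returned code values are illustrative assumptions, not PUDL's actual mapping, which is much larger.

```python
from typing import Optional

# Illustrative mapping only; the real PUDL mapping is far larger and
# its code values may differ from these.
FUEL_STRINGS = {
    "coal": {"coal", "bit coal", "sub-bit coal", "lignite"},
    "gas": {"gas", "natural gas", "nat gas", "ng"},
    "oil": {"oil", "#2 oil", "no. 2 oil", "diesel", "fuel oil"},
    "nuclear": {"nuclear", "uranium"},
}


def infer_fuel_type_code(raw: str) -> Optional[str]:
    """Map a free-form fuel string to a simplified code, or None if unknown."""
    # Lowercase and collapse internal whitespace before matching.
    cleaned = " ".join(raw.lower().split())
    for code, variants in FUEL_STRINGS.items():
        if cleaned in variants:
            return code
    return None
```

Strings that match nothing come back as None rather than being forced into a category, which keeps misclassification visible.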
Additional Details#
This table is a subset of the steam plant table reported on page 402 of FERC Form 1.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
fuel_consumed_units |
number |
Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production. |
fuel_cost_per_mmbtu |
number |
Average fuel cost per MMBTU of heat content in nominal USD. |
fuel_cost_per_unit_burned |
number |
Average cost of fuel consumed in the report year per reported fuel unit (USD). |
fuel_cost_per_unit_delivered |
number |
Average cost of fuel delivered in the report year per reported fuel unit (USD). |
fuel_mmbtu_per_unit |
number |
Heat content of the fuel in millions of Btus per physical unit. |
fuel_type_code_pudl |
string |
Simplified fuel type code used in PUDL |
fuel_units |
string |
Reported unit of measure for fuel. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
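Dimensional analysis suggests that fuel_cost_per_mmbtu should be close to fuel_cost_per_unit_burned divided by fuel_mmbtu_per_unit, because (USD/unit) ÷ (MMBtu/unit) = USD/MMBtu. The following sketch checks that relationship, assuming the reported values are internally consistent. The helper names and the 1% tolerance are arbitrary choices.

```python
import math
from typing import Optional


def cost_per_mmbtu(
    cost_per_unit_burned: Optional[float], mmbtu_per_unit: Optional[float]
) -> Optional[float]:
    """USD per unit divided by MMBtu per unit gives USD per MMBtu."""
    if cost_per_unit_burned is None or not mmbtu_per_unit or mmbtu_per_unit <= 0:
        return None
    return cost_per_unit_burned / mmbtu_per_unit


def is_consistent(record: dict, rel_tol: float = 0.01) -> Optional[bool]:
    """Compare the reported fuel_cost_per_mmbtu with the recomputed value."""
    expected = cost_per_mmbtu(
        record.get("fuel_cost_per_unit_burned"), record.get("fuel_mmbtu_per_unit")
    )
    reported = record.get("fuel_cost_per_mmbtu")
    if expected is None or reported is None:
        return None  # not enough information to check
    return math.isclose(expected, reported, rel_tol=rel_tol)
```

A False result does not identify which of the three columns is wrong. It only flags the record for closer inspection.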
core_ferc1__yearly_steam_plants_sched402#
Noticed any irregularities with this table? File a report.
Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)
- Primary key:
This table has no primary key. The best approximation of a primary key for this table would be:
report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but it is a free-form string field and there are duplicate records.
Usage Warnings#
FERC does not require respondents to report unique, non-duplicative plant records. Some respondents sporadically report portions of a plant followed by the plant total (e.g., unit 1, unit 2, and total). Use caution when aggregating.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_retirement_cost |
number |
Asset retirement cost (USD). |
avg_num_employees |
number |
The average number of employees assigned to each plant. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
capex_equipment |
number |
Cost of plant: equipment (USD). |
capex_land |
number |
Cost of plant: land and land rights (USD). |
capex_per_mw |
number |
Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD. |
capex_structures |
number |
Cost of plant: structures and improvements (USD). |
capex_total |
number |
Total cost of plant (USD). |
construction_type |
string |
Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings. |
construction_year |
integer |
Year the plant’s oldest still operational unit was built. |
installation_year |
integer |
Year the plant’s most recently built unit was installed. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
not_water_limited_capacity_mw |
number |
Plant capacity in MW when not limited by condenser water. |
opex_allowances |
number |
Allowances. |
opex_boiler |
number |
Maintenance of boiler (or reactor) plant. |
opex_coolants |
number |
Cost of coolants and water (nuclear plants only) |
opex_electric |
number |
Production expenses: electric expenses (USD). |
opex_engineering |
number |
Production expenses: maintenance, supervision, and engineering (USD). |
opex_fuel |
number |
Production expenses: fuel (USD). |
opex_misc_power |
number |
Miscellaneous steam (or nuclear) expenses. |
opex_misc_steam |
number |
Maintenance of miscellaneous steam (or nuclear) plant. |
opex_operations |
number |
Production expenses: operations, supervision, and engineering (USD). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
opex_plants |
number |
Maintenance of electrical plant. |
opex_production_total |
number |
Total operating expenses. |
opex_rents |
number |
Production expenses: rents (USD). |
opex_steam |
number |
Steam expenses. |
opex_steam_other |
number |
Steam from other sources. |
opex_structures |
number |
Production expenses: maintenance of structures (USD). |
opex_transfer |
number |
Steam transferred (Credit). |
peak_demand_mw |
number |
Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable. |
plant_capability_mw |
number |
Net plant capability in megawatts. |
plant_hours_connected_while_generating |
number |
Hours the plant was connected to load while generating in the report year. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
plant_type |
string |
Type of plant. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
water_limited_capacity_mw |
number |
Plant capacity in MW when limited by condenser water. |
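The per-unit cost columns can be recomputed from the totals as a sanity check. This pandas sketch assumes that capex_per_mw ≈ capex_total / capacity_mw and opex_per_mwh ≈ opex_production_total / net_generation_mwh, which is our reading of the column descriptions. Zero, negative, or missing denominators produce NaN rather than inf or a meaningless negative cost.

```python
import pandas as pd


def recompute_unit_costs(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute per-unit costs from totals for comparison with reported values.

    Assumes capex_per_mw ~ capex_total / capacity_mw and
    opex_per_mwh ~ opex_production_total / net_generation_mwh.
    Non-positive or missing denominators yield NaN.
    """
    capacity = df["capacity_mw"].astype("float64").where(lambda s: s > 0)
    generation = df["net_generation_mwh"].astype("float64").where(lambda s: s > 0)
    return pd.DataFrame(
        {
            "capex_per_mw_calc": df["capex_total"].astype("float64") / capacity,
            "opex_per_mwh_calc": df["opex_production_total"].astype("float64") / generation,
        },
        index=df.index,
    )
```

Joining the result back onto the table makes it easy to compare the recomputed values with the reported `capex_per_mw` and `opex_per_mwh` columns.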
core_ferc1__yearly_transmission_lines_sched422#
Noticed any irregularities with this table? File a report.
Annual time series of statistics about transmission lines.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)
- Primary key:
This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.
Usage Warnings#
Contains columns which may appear categorical, but are actually free text.
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines with a nominal voltage of 132 kilovolts or greater. Lines below this voltage are required to be reported only as group totals for each voltage.
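One way to separate individually reported lines at or above the 132 kV threshold from lower-voltage group totals is to split on voltage. This minimal sketch relies only on operating_voltage_kv, which is an assumption: group-total rows might also be identifiable from other fields, and designed_voltage_kv could be used instead. The helper name is hypothetical.

```python
import pandas as pd


def split_by_voltage(
    df: pd.DataFrame, threshold_kv: float = 132.0
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split records at an operating-voltage threshold.

    Rows with a missing operating_voltage_kv land in the second frame.
    """
    at_or_above = (
        (df["operating_voltage_kv"] >= threshold_kv).fillna(False).astype(bool)
    )
    return df[at_or_above], df[~at_or_above]
```

Sending rows with missing voltages to the second frame is a conservative choice: it avoids treating them as individually reported lines when their voltage class is unknown.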
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capex_land |
number |
Cost of land and land rights for the transmission line. |
capex_other |
number |
Construction and other costs for the transmission line. |
capex_total |
number |
Total costs for the transmission line. |
conductor_size_and_material |
string |
Size and material of the transmission line conductor. |
designed_voltage_kv |
number |
Designed (manufactured) voltage, in kilovolts, for three-phase 60-cycle alternating current transmission lines. |
end_point |
string |
The end point of a transmission line. |
num_transmission_circuits |
integer |
Number of circuits in a transmission line. |
operating_voltage_kv |
number |
The operating voltage, in kilovolts, for three-phase 60-cycle alternating current transmission lines. |
opex_maintenance |
number |
Maintenance expenses for the transmission line. |
opex_operations |
number |
Operating expenses for the transmission line. |
opex_rents |
number |
Rent expenses for the transmission line. |
opex_total |
number |
Overall expenses for the transmission line. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
start_point |
string |
The starting point of a transmission line. |
supporting_structure_type |
string |
Supporting structure of the transmission line. |
transmission_line_and_structures_length_miles |
number |
Length, in pole miles (or circuit miles for underground lines), of lines that are aggregated with other lines or structures, i.e. whose costs are combined with those of other structures. |
transmission_line_length_miles |
number |
Length, in pole miles (or circuit miles for underground lines), of lines on stand-alone structures, i.e. whose costs are reported on a stand-alone basis. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
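The two length columns describe lines reported on a stand-alone basis and lines aggregated with other structures. If you read them as complementary parts of a record's length (our interpretation of the descriptions, not documented behavior), you can compute a per-record total. This sketch uses a hypothetical helper name.

```python
import pandas as pd


def total_line_miles(df: pd.DataFrame) -> pd.Series:
    """Sum the stand-alone and aggregated length columns per record.

    min_count=1 keeps the result missing when both inputs are missing,
    instead of silently reporting 0 miles.
    """
    cols = [
        "transmission_line_length_miles",
        "transmission_line_and_structures_length_miles",
    ]
    return df[cols].sum(axis=1, min_count=1)
```

Given the usage warning about subtotals, check whether a record is itself a total row before summing these values across records.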
core_ferc1__yearly_utility_plant_summary_sched200#
Noticed any irregularities with this table? File a report.
Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)
- Primary key:
utility_id_ferc1, report_year, utility_type, utility_plant_asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ending_balance |
number |
Account balance at end of year. |
record_id |
string |
Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped. |
report_year |
integer |
Four-digit year in which the data was reported. |
row_type_xbrl |
string |
Indicates whether the value reported in the row is calculated, or uniquely reported within the table. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_plant_asset_type |
string |
Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation. |
utility_type |
string |
Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility. |
utility_type_other |
string |
Free-form description of the type of utility reported in one of the three ‘other’ utility_type sections of the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020). |
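Because row_type_xbrl distinguishes calculated rows from reported ones, one way to follow the subtotal warning is to drop calculated rows before aggregating. This is a minimal sketch. The exact label for calculated rows (`"calculated_value"` here) is an assumption, so check `df["row_type_xbrl"].unique()` first. The uniqueness check enforces the documented primary key.

```python
import pandas as pd

PRIMARY_KEY = [
    "utility_id_ferc1",
    "report_year",
    "utility_type",
    "utility_plant_asset_type",
]
# Assumed label for calculated rows; verify against the actual data.
CALCULATED = "calculated_value"


def non_calculated_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows flagged as calculated so subtotals are not double counted."""
    if df.duplicated(subset=PRIMARY_KEY).any():
        raise ValueError("primary key columns are not unique")
    # Rows with a missing row_type_xbrl are kept rather than silently dropped.
    mask = df["row_type_xbrl"].ne(CALCULATED).fillna(True).astype(bool)
    return df[mask]
```

Even after this filter, asset types such as accumulated depreciation have different meanings, so summing ending_balance across all asset types is rarely meaningful.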
core_ferc714__hourly_planning_area_demand#
Noticed any irregularities with this table? File a report.
Hourly time series of electricity demand by planning area.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)
- Primary key:
respondent_id_ferc714, datetime_utc
Usage Warnings#
The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See below for additional details.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source.
This table includes three respondent ID columns: one from the CSV raw source, one from the XBRL raw source, and a PUDL-derived ID that links the two source IDs together. Source IDs have been filled in for all records, so you can select the full time series for a given respondent using any of the three IDs.
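Because all three ID columns are filled in, selecting a respondent's full time series works with any of them. This pandas sketch assumes the table is already loaded. The helper name and the `source` keywords are hypothetical conveniences, not part of PUDL.

```python
from typing import Union

import pandas as pd

# Hypothetical shorthand for the three respondent ID columns.
ID_COLUMNS = {
    "pudl": "respondent_id_ferc714",
    "csv": "respondent_id_ferc714_csv",
    "xbrl": "respondent_id_ferc714_xbrl",
}


def respondent_timeseries(
    df: pd.DataFrame, id_value: Union[int, str], source: str = "pudl"
) -> pd.DataFrame:
    """Return all rows for one respondent, looked up by any of the three IDs."""
    column = ID_COLUMNS[source]
    # fillna(False) guards against nullable columns producing NA in the mask.
    mask = df[column].eq(id_value).fillna(False).astype(bool)
    return df[mask].sort_values("datetime_utc")
```

Looking up by the CSV ID, the XBRL ID, or the PUDL ID should return the same rows for a given respondent.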
An important caveat is that the datetime_utc timestamps have been cleaned. Form 714 gives respondents sparse guidance on how to interpret timestamps: it asks for 24 hourly demand values for each day, labeled hour 1 through 24, with no indication of whether hour 1 begins at midnight.
The XBRL data contained several timestamp formats. Most records corresponding to hour 1 of the form have a timestamp with the hour reported as T1. About two thirds of the records in the hour 24 position of the form report the hour as T24, while the remaining third report it as T00 of the next day. Because T24 is not a valid hour in a datetime, we convert these T24 hours to T00 of the next day. A smaller subset of respondents report the 24th hour as the last second of the day; we also convert these records to T00 of the next day.
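The hour-24 conversion rules described above can be sketched with the standard library. This is a hypothetical helper, not PUDL's code. It assumes raw timestamps arrive as strings like `2021-01-01T24:00:00` and ignores time zones.

```python
from datetime import datetime, timedelta


def normalize_hour_24(ts: str) -> datetime:
    """Map hour-24 style timestamps onto T00 of the following day.

    Hypothetical helper; assumes strings like "2021-01-01T24:00:00".
    """
    date_part, time_part = ts.split("T")
    if time_part.startswith("24"):
        # T24 is not a valid hour: roll over to midnight of the next day.
        return datetime.strptime(date_part, "%Y-%m-%d") + timedelta(days=1)
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
    if (parsed.hour, parsed.minute, parsed.second) == (23, 59, 59):
        # The last second of the day also stands in for hour 24.
        return parsed + timedelta(seconds=1)
    return parsed
```

With these rules, `2021-01-01T24:00:00`, `2021-01-01T23:59:59`, and `2021-01-02T00:00:00` all map to the same instant, which is the consistency the cleaning aims for.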
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
datetime_utc |
datetime |
Date and time converted to Coordinated Universal Time (UTC). |
demand_mwh |
number |
Electricity demand (energy) within a given timeframe. |
report_date |
date |
Date reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_id_ferc714_csv |
integer |
FERC Form 714 respondent ID from the CSV data, published for 2006-2020. This ID is linked to the newer years of XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
respondent_id_ferc714_xbrl |
string |
FERC Form 714 respondent ID from the XBRL data, published for 2021-present. This ID is linked to the older years of CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
timezone |
string |
IANA timezone name |
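Because the timezone column holds IANA names, you can convert UTC timestamps back to a respondent's local clock time. This stdlib sketch uses `zoneinfo` (Python 3.9+, which needs a time zone database on the system or from the `tzdata` package). It assumes that naive datetime_utc values represent UTC. The helper name and the `America/Chicago` example are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(dt_utc: datetime, tz_name: str) -> datetime:
    """Convert a UTC timestamp to local time in the given IANA zone.

    Naive inputs are assumed to represent UTC.
    """
    if dt_utc.tzinfo is None:
        dt_utc = dt_utc.replace(tzinfo=timezone.utc)
    return dt_utc.astimezone(ZoneInfo(tz_name))
```

For example, 06:00 UTC on 2021-01-01 becomes midnight in `America/Chicago` (UTC-6 in winter), while 05:00 UTC on 2021-07-01 becomes midnight under daylight saving time (UTC-5).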
core_ferc714__respondent_id#
Noticed any irregularities with this table? File a report.
Respondent identification.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part I, Schedule 1)
- Primary key:
respondent_id_ferc714
Usage Warnings#
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
eia_code |
integer |
EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
respondent_id_ferc714_csv |
integer |
FERC Form 714 respondent ID from the CSV data, published for 2006-2020. This ID is linked to the newer years of XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
respondent_id_ferc714_xbrl |
string |
FERC Form 714 respondent ID from the XBRL data, published for 2021-present. This ID is linked to the older years of CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
respondent_name_ferc714 |
string |
Name of the utility, balancing area authority, or planning authority responding to FERC Form 714. |
core_ferc714__yearly_planning_area_demand_forecast#
Noticed any irregularities with this table? File a report.
Annual time series of 10-year forecasted summer and winter peak demand and annual net energy per planning area.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2b)
- Primary key:
respondent_id_ferc714, report_year, forecast_year
Usage Warnings#
This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source. We created the respondent_id_ferc714 field to blend disparate IDs from the CSV and XBRL data over time. See the core_ferc714__respondent_id table for links to the original source IDs.
A small number of respondents (~11) report more than 10 forecast years, and an even smaller number (~9) report fewer than 10.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
This table contains forecasted net demand (MWh) as well as summer and winter peak demand (MW) for the ten years after the report_year.
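Because a few respondents report more or fewer than ten forecast years, it can be useful to find those filings before comparing forecasts across respondents. This pandas sketch counts distinct forecast years per respondent and report year. The helper name is hypothetical.

```python
import pandas as pd


def nonstandard_horizons(df: pd.DataFrame, expected_years: int = 10) -> pd.Series:
    """Count distinct forecast years per filing and keep counts other than expected."""
    counts = df.groupby(["respondent_id_ferc714", "report_year"])["forecast_year"].nunique()
    return counts[counts != expected_years]
```

The result is indexed by (respondent_id_ferc714, report_year). This makes it easy to exclude or truncate those filings when you need a uniform ten-year horizon.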
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
forecast_year |
integer |
Four-digit year that applies to a particular forecasted value. |
net_demand_forecast_mwh |
number |
Net forecasted electricity demand for the specific period in megawatt-hours (MWh). |
report_year |
integer |
Four-digit year in which the data was reported. |
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources. |
summer_peak_demand_forecast_mw |
number |
The maximum forecasted hourly summer load (for the months of June through September). |
winter_peak_demand_forecast_mw |
number |
The maximum forecasted hourly winter load (for the months of January through March). |
core_ferc__codes_accounts#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for account numbers from the FERC Uniform System of Accounts for Electric Plant.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC – Mix of multiple FERC Forms
- Primary key:
ferc_account_id
Usage Warnings#
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Additional Details#
These codes are defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101. (See e.g. https://www.law.cornell.edu/cfr/text/18/part-101).
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
ferc_account_description |
string |
Description of the FERC account. |
ferc_account_id |
string |
Account identifier from FERC’s Uniform System of Accounts for Electric Plant. Includes higher level labeled categories. |
core_ferc__entity_companies#
Noticed any irregularities with this table? File a report.
Entity table containing static information about the companies that submit required filings to FERC.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC – Mix of multiple FERC Forms
- Primary key:
company_id_ferc
Usage Warnings#
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
address_2 |
string |
Second line of the address. |
city |
string |
Name of the city. |
company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system. |
company_name |
string |
The name of the FERC-reporting organization or company. |
company_website |
string |
The website URL of the company, which can provide additional information about the organization. |
program |
string |
The specific program or initiative associated with the FERC organization, which can provide context for the company’s activities. |
state |
string |
Two letter US state abbreviation. |
street_address |
string |
Physical street address. |
zip_code |
string |
Five digit US Zip Code. |
zip_code_4 |
string |
Four digit US Zip Code suffix. |
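Because the five-digit ZIP code and the four-digit suffix are stored in separate string columns, combining them into the usual ZIP+4 form is a small string operation. This is a minimal sketch with a hypothetical helper name. It assumes leading zeros are already preserved in the string columns.

```python
from typing import Optional


def format_zip_plus_4(zip_code: Optional[str], zip_code_4: Optional[str]) -> Optional[str]:
    """Combine zip_code and zip_code_4 into "12345-6789", or "12345" without a suffix."""
    if not zip_code or not zip_code.strip():
        return None
    zip5 = zip_code.strip()
    suffix = (zip_code_4 or "").strip()
    return f"{zip5}-{suffix}" if suffix else zip5
```

Keeping both columns as strings rather than integers matters here, because ZIP codes in the Northeast begin with zero.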
core_ferceqr__contracts#
Noticed any irregularities with this table? File a report.
Contains information about contracts between companies selling and buying electricity market products.
- Most-recent data:
2026q1
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 920 – Electric Quarterly Report (EQR)
- Primary key:
This table has no primary key. The FERC EQR contracts table has no natural primary key.
Usage Warnings#
This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly and often contains free-form strings, unlabeled total rows, and missing IDs. See Notable Irregularities for details.
Access methods#
Note
This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.
import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/*.parquet';
library(arrow)
library(dplyr)
# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__contracts")
# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()
Columns#
Field Name |
Type |
Description |
|---|---|---|
actual_termination_date |
date |
The date the contract actually terminates. |
begin_date |
datetime |
First date and time for the sale of the product at the rate specified. |
class_name |
string |
F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply. |
commencement_date_of_contract_term |
date |
The date the terms of the contract reported in fields 18, 23 and 25 through 44 (as defined in the data dictionary) became effective. If those terms became effective on multiple dates (i.e., due to one or more amendments), the date to be reported in this field is the date the most recent amendment became effective. If the contract or the most recent reported amendment does not have an effective date, the date when service began pursuant to the contract or most recent reported amendment may be used. If the terms reported in fields 18, 23 and 25 through 44 have not been amended since January 1, 2009, the initial date the contract became effective (or absent an effective date the initial date when service began) may be used. |
contract_affiliate |
boolean |
If True, this field indicates the customer is an affiliate of the seller. The customer is an affiliate if it controls, is controlled by, or is under common control with the seller. This includes a division that operates as a functional unit. A customer of a seller who is an Exempt Wholesale Generator may be defined as an affiliate under the Public Utility Holding Company Act and the FPA. |
contract_execution_date |
date |
The date the contract was signed. If the parties signed on different dates, use the most recent date signed. |
contract_service_agreement_id |
string |
Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system. |
contract_termination_date |
date |
The date that the contract expires. |
contract_unique_id |
string |
An identifier beginning with the letter “C” and followed by a number (e.g., ‘C1’, ‘C2’) used to designate a record containing contract information. Note that these contract IDs may only be unique within the context of a particular seller, seller-buyer pair, or timeframe. FERC documentation of the field is limited. |
customer_company_name |
string |
The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear with different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller. |
end_date |
datetime |
Last date and time for the sale of the product at the rate specified. |
extension_provision_description |
string |
Description of terms that provide for the continuation of the contract. |
ferc_tariff_reference |
string |
The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference. |
increment_name |
string |
5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment. |
increment_peaking_name |
string |
FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract. |
point_of_delivery_balancing_authority |
string |
The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used. |
point_of_delivery_specific_location |
string |
The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used. |
point_of_receipt_balancing_authority |
string |
The registered Balancing Authority (formerly called NERC Control Area) where service begins for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If receipt occurs at a trading hub, the term ‘Hub’ should be used. |
point_of_receipt_specific_location |
string |
The specific location at which the product is received if designated in the contract. If receipt occurs at a trading hub, a standardized hub name must be used. If more points of receipt are listed in the contract than can fit into the 50-character field, a description of the collection of points may be used. ‘Various’ alone is unacceptable unless the contract itself uses that terminology. |
product_name |
string |
Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__transactions. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. DIRECT ASSIGNMENT FACILITIES CHARGE: Charges for facilities or portions of facilities that are constructed or used for the sole use/benefit of a particular customer. EMERGENCY ENERGY: Contractual provisions to supply energy or capacity to another entity during critical situations. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. INTERCONNECTION AGREEMENT: Contract that provides the terms and conditions for a generator, distribution system owner, transmission owner, transmission provider, or transmission system to physically connect to a transmission system or distribution system. MEMBERSHIP AGREEMENT: Agreement to participate and be subject to rules of a system operator. MUST RUN AGREEMENT: An agreement that requires a unit to run. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). NETWORK: Transmission service under contract providing network service. NETWORK OPERATING AGREEMENT: An executed agreement that contains the terms and conditions under which a network customer operates its facilities and the technical and operational matters associated with the implementation of network integration transmission service. OTHER: Product name not otherwise included. POINT-TO-POINT AGREEMENT: Transmission service under contract between specified Points of Receipt and Delivery. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REASSIGNMENT AGREEMENT: Transmission capacity reassignment agreement. 
REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. 
For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SYSTEM OPERATING AGREEMENTS: An executed agreement that contains the terms and conditions under which a system or network customer shall operate its facilities and the technical and operational matters associated with the implementation of network. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. TRANSMISSION OWNERS AGREEMENT: The agreement that establishes the terms and conditions under which a transmission owner transfers operational control over designated transmission facilities. UPLIFT: A make-whole payment by an RTO/ISO to a utility. |
product_type_name |
string |
CB: Energy, capacity or ancillary services sold under a FERC-approved cost-based rate tariff. CR: An agreement under which a transmission provider sells, assigns or transfers all or a portion of its rights to an eligible customer. CR-AD: Transmission capacity reassignments reported in Atlantic Daylight time. CR-AP: Transmission capacity reassignments reported in Atlantic Prevailing time. CR-AS: Transmission capacity reassignments reported in Atlantic Standard time. CR-CD: Transmission capacity reassignments reported in Central Daylight time. CR-CP: Transmission capacity reassignments reported in Central Prevailing time. CR-CS: Transmission capacity reassignments reported in Central Standard time. CR-ED: Transmission capacity reassignments reported in Eastern Daylight time. CR-EP: Transmission capacity reassignments reported in Eastern Prevailing time. CR-ES: Transmission capacity reassignments reported in Eastern Standard time. CR-MD: Transmission capacity reassignments reported in Mountain Daylight time. CR-MP: Transmission capacity reassignments reported in Mountain Prevailing time. CR-MS: Transmission capacity reassignments reported in Mountain Standard time. CR-PD: Transmission capacity reassignments reported in Pacific Daylight time. CR-PP: Transmission capacity reassignments reported in Pacific Prevailing time. CR-PS: Transmission capacity reassignments reported in Pacific Standard time. MB: Energy, capacity or ancillary services sold under the seller’s FERC-approved market-based rate tariff. T: The product is sold under a FERC-approved transmission tariff. NPU: The product is sold by a non-public utility that is required to file the EQR under section 220 of the Federal Power Act. OTHER: The product cannot be characterized by the other product type names. |
quantity |
number |
Quantity for the contract product identified. |
rate |
number |
The charge for the product per unit as stated in the contract. |
rate_description |
string |
Text description of rate. If the rate is currently available on the FERC website, a citation of the FERC Accession Number and the relevant FERC tariff including page number or section may be included instead of providing the entire rate algorithm. If the rate is not available on the FERC website, include the rate algorithm, if rate is calculated. If the algorithm would exceed the 300 character field limit, it may be provided in a descriptive summary (including bases and methods of calculations) with a detailed citation of the relevant FERC tariff including page number and section. |
rate_maximum |
number |
Maximum rate to be charged per the contract, if a range is specified. |
rate_minimum |
number |
Minimum rate to be charged per the contract, if a range is specified. |
rate_units |
string |
Measure stated in the contract for the product sold, as reported in the FERC EQR. |
seller_company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company. |
seller_company_name |
string |
The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the name provided as seller_company_name in the core_ferceqr__quarterly_identity table. There are a handful of cases in which this requirement is violated, so any joins between tables should rely on company_id_ferc, not the company names. |
term_name |
string |
Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST). |
units |
string |
Measure stated in the contract for the product sold. |
year_quarter |
string |
Year-quarter corresponding to record. Formatted like YYYYq{1-4}. |
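The increment_name and term_name descriptions above define their codes by duration thresholds. The following is a minimal Python sketch of those rules, which could be used to sanity-check reported codes against a duration you compute yourself (for example from contract dates). The function names are hypothetical and not part of PUDL, and treating "one year" as 365 days is an assumption; the descriptions above do not define it. The N/A codes are never produced, because they depend on what the contract specifies rather than on a duration.

```python
from datetime import timedelta

# ASSUMPTION: "one year" is approximated as 365 days.
ONE_YEAR = timedelta(days=365)


def classify_increment(duration: timedelta) -> str:
    """Map an increment length to an increment_name code (hypothetical helper)."""
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    if duration <= timedelta(minutes=5):
        return "5"  # > 0 and <= 5 minutes
    if duration <= timedelta(minutes=15):
        return "15"  # > 5 and <= 15 minutes
    if duration <= timedelta(hours=6):
        return "H"  # > 15 minutes and <= 6 hours
    if duration <= timedelta(hours=60):
        return "D"  # > 6 and <= 60 hours
    if duration <= timedelta(hours=168):
        return "W"  # > 60 and <= 168 hours
    if duration < ONE_YEAR:
        return "M"  # > 168 hours and < 1 year
    return "Y"  # >= 1 year


def classify_term(contract_length: timedelta) -> str:
    """Return 'LT' for contracts of one year or longer, else 'ST' (hypothetical helper)."""
    return "LT" if contract_length >= ONE_YEAR else "ST"
```

For example, a one-hour increment maps to "H" and a 30-day contract is short-term ("ST").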
core_ferceqr__quarterly_identity#
Noticed any irregularities with this table? File a report.
Quarterly time series of the individuals who filed the FERC EQR on behalf of each company.
- Most-recent data:
2026q1
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 920 – Electric Quarterly Report (EQR)
- Primary key:
This table has no primary key. The primary key ought to be [‘year_quarter’, ‘company_id_ferc’, ‘filer_unique_id’], where filer_unique_id is an employee-level ID. However, a handful of companies have erroneously reported the same filer_unique_id for multiple employees, resulting in duplicate records. In other cases, there appear to be multiple filings in a given quarter for the same company and filer, resulting in additional duplicates. Thus, there is no reliable natural primary key for the identity table.
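To see how often the intended key is violated, one option is to count how many rows share each (year_quarter, company_id_ferc, filer_unique_id) combination. The sketch below assumes rows are plain dicts (for example from pandas' `DataFrame.to_dict(orient="records")`); the function name and any sample values are hypothetical.

```python
from collections import Counter

# The key the table "ought" to have, per the primary-key note above.
CANDIDATE_KEY = ("year_quarter", "company_id_ferc", "filer_unique_id")


def duplicate_keys(records, key=CANDIDATE_KEY):
    """Return {key_tuple: row_count} for every key value shared by more than one row."""
    counts = Counter(tuple(rec[col] for col in key) for rec in records)
    return {k: n for k, n in counts.items() if n > 1}
```

With pandas, `df[df.duplicated(subset=list(CANDIDATE_KEY), keep=False)]` selects the same offending rows directly.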
Usage Warnings#
This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and frequently lacks IDs. See Notable Irregularities for details.
Access methods#
Note
This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.
import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/*.parquet';
library(arrow)
library(dplyr)
# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity")
# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()
Columns#
Field Name |
Type |
Description |
|---|---|---|
company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system. |
company_name |
string |
Name of the reporting company. |
contact_address |
string |
Street address for contact identified in contact_name. |
contact_city |
string |
City for contact identified in contact_name. |
contact_country_name |
string |
Country (USA, Canada, Mexico, or United Kingdom) for contact identified in contact_name. |
contact_email |
string |
Email for contact identified in contact_name. |
contact_name |
string |
(Seller) – The name of the contact for the company authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. (Agent) – Name of the person who prepared the filing. |
contact_phone |
string |
Phone number for contact identified in contact_name. |
contact_state |
string |
State for contact identified in contact_name. |
contact_title |
string |
Title for contact identified in contact_name. |
contact_zip |
string |
Zip code for contact identified in contact_name. |
filer_unique_id |
string |
(Seller) An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter. |
transactions_reported_to_index_price_publishers |
boolean |
Sellers should indicate whether they have reported their sales transactions to index price publisher(s). If they have, Sellers should specify the index publisher(s) in Field Number 73. |
year_quarter |
string |
Year-quarter corresponding to record. Formatted like YYYYq{1-4}. |
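year_quarter values are strings such as 2026q1. The following hedged sketch converts them into (year, quarter) pairs or quarter start dates, for example for plotting or for aligning with monthly data. The helper names are hypothetical.

```python
import re
from datetime import date

# Four-digit year, a literal "q", and a quarter from 1 to 4.
_YEAR_QUARTER = re.compile(r"(\d{4})q([1-4])")


def parse_year_quarter(value: str) -> tuple[int, int]:
    """Split a 'YYYYqN' string into (year, quarter)."""
    match = _YEAR_QUARTER.fullmatch(value)
    if match is None:
        raise ValueError(f"not a YYYYq{{1-4}} value: {value!r}")
    return int(match.group(1)), int(match.group(2))


def quarter_start(value: str) -> date:
    """Return the first calendar day of the quarter, e.g. '2013q3' -> 2013-07-01."""
    year, quarter = parse_year_quarter(value)
    return date(year, 3 * (quarter - 1) + 1, 1)
```

Because the year always has four digits and the quarter a single digit, plain string sorting of year_quarter is already chronological; parsing is only needed for date arithmetic.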
core_ferceqr__quarterly_index_pub#
Noticed any irregularities with this table? File a report.
Quarterly time series of electricity market price indices that individual EQR filers reported transactions to.
- Most-recent data:
2026q1
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 920 – Electric Quarterly Report (EQR)
- Primary key:
year_quarter, company_id_ferc, filer_unique_id
Usage Warnings#
This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and frequently lacks IDs. See Notable Irregularities for details.
Access methods#
Note
This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.
import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
"s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/*.parquet';
library(arrow)
library(dplyr)
# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub")
# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()
Columns#
Field Name |
Type |
Description |
|---|---|---|
company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system. |
filer_unique_id |
string |
(Seller) An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter. |
index_price_publisher_name |
string |
Name of index price publisher, which can be one of the following: AM - Argus Media, EIG - Energy Intelligence Group, Inc., IP - Intelligence Press, P - Platts, B - Bloomberg, PDX - Powerdex, SNL - SNL Energy. |
seller_company_name |
string |
The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the name provided as seller_company_name in the core_ferceqr__quarterly_identity table. There are a handful of cases in which this requirement is violated, so any joins between tables should rely on company_id_ferc, not the company names. |
transactions_reported |
string |
Description of the types of transactions reported to the index publisher identified in this record. |
year_quarter |
string |
Year-quarter corresponding to record. Formatted like YYYYq{1-4}. |
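The seller_company_name description recommends joining tables on company_id_ferc rather than on company names. As a minimal sketch, this table can be joined to core_ferceqr__quarterly_identity on the ID columns both tables share (year_quarter, company_id_ferc, filer_unique_id). Rows are assumed to be plain dicts, and the function name is hypothetical.

```python
from collections import defaultdict

# Join key built only from ID columns, never from free-text company names.
JOIN_KEY = ("year_quarter", "company_id_ferc", "filer_unique_id")


def join_index_pub_to_identity(index_pub_rows, identity_rows):
    """Inner-join index_pub rows to identity rows on JOIN_KEY.

    The identity table can contain duplicate keys (see its primary-key note),
    so one index_pub row may match several identity rows; all matches are kept.
    """
    identity_by_key = defaultdict(list)
    for row in identity_rows:
        identity_by_key[tuple(row[c] for c in JOIN_KEY)].append(row)

    joined = []
    for pub in index_pub_rows:
        for ident in identity_by_key.get(tuple(pub[c] for c in JOIN_KEY), []):
            # Keep both names so mismatches stay visible instead of breaking the join.
            joined.append({**pub, "identity_company_name": ident["company_name"]})
    return joined
```

In pandas, the equivalent is `index_pub.merge(identity, on=list(JOIN_KEY), how="inner")`.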
core_ferceqr__transactions#
Noticed any irregularities with this table? File a report.
Contains information about individual electricity market transactions that took place during a given reporting quarter. Reported by the seller.
- Most-recent data:
2026q1
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
FERC Form 920 – Electric Quarterly Report (EQR)
- Primary key:
year_quarter, seller_company_id_ferc, transaction_unique_id
Usage Warnings#
This table is experimental and/or a work in progress and may change in the future.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and frequently lacks IDs. See Notable Irregularities for details.
Access methods#
Note
This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.
Warning
This table will likely not fit in your computer’s memory.
Make sure to filter or aggregate to reduce the amount of data read in.
This is easier in polars and duckdb than in pandas,
so we only provide examples for these.
from datetime import date

import polars as pl

# Select some data relevant to tracking solar PPA prices
df = (
    # Limit query to data from a single quarterly Parquet file
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    # Select a subset of columns to reduce network and memory usage
    .select(
        [
            "seller_company_name",
            "product_name",
            "price",
            "standardized_price",
            "transaction_quantity",
            "standardized_quantity",
            "total_transmission_charge",
            "total_transaction_charge",
            "rate_units",
            "trade_date",
        ]
    )
    .filter(
        pl.col("seller_company_name").str.contains("(?i)solar")
        & pl.col("standardized_price").is_not_null()
        & (pl.col("product_name") == "ENERGY")
        & (pl.col("rate_units") == "$/MWH")
        # trade_date is a date column, so compare it to a date (as in the SQL
        # example) rather than to the integer 0
        & (pl.col("trade_date") >= date(1970, 1, 1))
    )
    .collect()
)
-- Select some data relevant to tracking solar PPA prices
SELECT
-- Select a subset of columns to reduce network and memory usage
seller_company_name,
product_name,
price,
standardized_price,
transaction_quantity,
standardized_quantity,
total_transmission_charge,
total_transaction_charge,
rate_units,
trade_date
-- Limit to 2013Q3 data by querying only a single Parquet file
FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet'
WHERE seller_company_name ILIKE '%solar%'
AND standardized_price IS NOT NULL
AND product_name = 'ENERGY'
AND rate_units = '$/MWH'
AND trade_date >= '1970-01-01'::DATE;
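The queries above read a single quarterly file. To cover a range of quarters without scanning the whole prefix, one option is to build the list of per-quarter URLs explicitly. This sketch assumes every quarter is stored as `<year>q<quarter>.parquet` directly under the table's prefix, which is inferred from the 2013q3 example above rather than documented; BASE_URL and the helper names are hypothetical.

```python
import re

# ASSUMPTION: one Parquet file per quarter, named like "2013q3.parquet".
BASE_URL = "s3://pudl.catalyst.coop/ferceqr"


def _quarter_index(value: str) -> int:
    """Convert 'YYYYqN' into a running quarter count so ranges are easy to step."""
    match = re.fullmatch(r"(\d{4})q([1-4])", value)
    if match is None:
        raise ValueError(f"bad year_quarter: {value!r}")
    return int(match.group(1)) * 4 + int(match.group(2)) - 1


def quarter_range(start: str, end: str) -> list[str]:
    """List 'YYYYqN' labels from start to end, inclusive."""
    first, last = _quarter_index(start), _quarter_index(end)
    return [f"{i // 4}q{i % 4 + 1}" for i in range(first, last + 1)]


def partition_urls(table: str, start: str, end: str) -> list[str]:
    """Build S3 URLs for the per-quarter Parquet files of `table`."""
    return [f"{BASE_URL}/{table}/{yq}.parquet" for yq in quarter_range(start, end)]
```

The resulting list should be usable as the source for `pl.scan_parquet(...)` (which accepts a list of paths) with the same `storage_options` as above, or for DuckDB's `read_parquet([...])`, so that filters and column selection still apply before any data is collected.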
Columns#
Field Name |
Type |
Description |
|---|---|---|
class_name |
string |
F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply. |
contract_service_agreement_id |
string |
Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system. |
customer_company_name |
string |
The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear with different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller. |
exchange_brokerage_service |
string |
If a broker service is used to consummate or effectuate a transaction, the term “Broker” shall be provided. If an exchange is used, the specific exchange that is used shall be selected from the Commission-provided list. Allowed values include BROKER, ICE, NODAL, and NYMEX. |
ferc_tariff_reference |
string |
The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference. |
increment_name |
string |
5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment. |
increment_peaking_name |
string |
FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract. |
point_of_delivery_balancing_authority |
string |
The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used. |
point_of_delivery_specific_location |
string |
The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used. |
price |
number |
Actual price charged for the product per unit. The price reported cannot be averaged or otherwise aggregated. |
product_name |
string |
Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__contracts. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). BOOKED OUT POWER: Energy or capacity contractually committed bilaterally for delivery but not actually delivered due to some offsetting or countervailing trade (Transaction only). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). OTHER: Product name not otherwise included. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). 
SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. UPLIFT: A make-whole payment by an RTO/ISO to a utility. |
rate_units |
string |
Measure stated in the contract for the product sold, as reported in the FERC EQR. |
seller_company_id_ferc |
string |
The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company. |
seller_company_name |
string |
The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the name provided as seller_company_name in the core_ferceqr__quarterly_identity table. There are a handful of cases in which this requirement is violated, so any joins between tables should rely on company_id_ferc, not the company names. |
seller_transaction_id |
string |
Unique reference number assigned by the Seller for each transaction. May only be unique in the context of the seller’s internal record keeping. This is an unrestricted text field. |
standardized_price |
number |
For product names energy, capacity, and booked out power only. Specify the price in $/MWh if the product is energy or booked out power and specify the price in $/MW-month if the product is capacity or booked out power. |
standardized_quantity |
number |
For product names energy, capacity, and booked out power only. Specify the quantity in MWh if the product is energy or booked out power and specify the quantity in MW-month if the product is capacity or booked out power. |
term_name |
string |
Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST). |
timezone |
string |
A code representing the so-called ‘time zone’ in which the sale was made. However, these codes do not actually correspond to time zones. Rather, they indicate an offset from UTC: the Daylight and Standard codes denote fixed offsets, while the Prevailing codes follow whichever offset is in effect locally (daylight saving or standard time). The codes are as follows: AD: Atlantic Daylight Time (UTC-3), AP: Atlantic Prevailing Time, AS: Atlantic Standard Time (UTC-4), CD: Central Daylight Time (UTC-5), CP: Central Prevailing Time, CS: Central Standard Time (UTC-6), ED: Eastern Daylight Time (UTC-4), EP: Eastern Prevailing Time, ES: Eastern Standard Time (UTC-5), MD: Mountain Daylight Time (UTC-6), MP: Mountain Prevailing Time, MS: Mountain Standard Time (UTC-7), PD: Pacific Daylight Time (UTC-7), PP: Pacific Prevailing Time, PS: Pacific Standard Time (UTC-8), UT: Coordinated Universal Time (UTC+0) |
total_transaction_charge |
number |
transaction_quantity * price + total_transmission_charge. |
total_transmission_charge |
number |
Payments received for transmission services when explicitly identified. |
trade_date |
date |
The date upon which the parties made the legally binding agreement on the price of a transaction. |
transaction_begin_date |
datetime |
First date and time the product is sold during the quarter. |
transaction_end_date |
datetime |
Last date and time the product is sold during the quarter. |
transaction_quantity |
number |
The quantity of the product in this transaction record. |
transaction_unique_id |
string |
An identifier beginning with the letter “T” and followed by a number (e.g., “T1”, “T2”) used to designate a record containing transaction information. One record must be included for each transaction in an EQR for a given quarter, and a new transaction record must be used every time a price changes in a sale. Note that these IDs are only unique within a single company_identifier and year_quarter. |
type_of_rate |
string |
FIXED: A fixed charge per unit of consumption. No variables are used to determine this rate. FORMULA: A calculation of a rate based upon a formula that does not contain an electric index component. ELECTRIC INDEX: A calculation of a rate based upon an index or a formula that contains an electric index component. An electric index includes an index published by an index publisher such as those required to be listed in Field Number 73 or a price published by an RTO/ISO (e.g., PJM West or Illinois Hub). RTO/ISO: If the price is the result of an RTO/ISO market or the sale is made to the RTO/ISO. |
year_quarter |
string |
Year-quarter corresponding to record. Formatted like YYYYq{1-4}. |
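The total_transaction_charge formula, the year_quarter format, and the fixed-offset timezone codes described in this table can be handled with a few small pandas helpers. This is a minimal sketch, not part of PUDL: the helper names are hypothetical, to_utc assumes transaction datetimes are naive local timestamps, and the Prevailing codes (AP, CP, EP, MP, PP) are deliberately left unmapped because their offset depends on the date.

```python
import pandas as pd

# Fixed UTC offsets (in hours) for the timezone codes listed in this table.
# The Prevailing codes (AP, CP, EP, MP, PP) are intentionally omitted: their
# offset depends on whether daylight saving time was in effect on that date.
FIXED_UTC_OFFSETS = {
    "AD": -3, "AS": -4,
    "ED": -4, "ES": -5,
    "CD": -5, "CS": -6,
    "MD": -6, "MS": -7,
    "PD": -7, "PS": -8,
    "UT": 0,
}


def parse_year_quarter(value: str) -> pd.Period:
    """Turn a year_quarter string such as '2023q2' into a quarterly Period."""
    year, quarter = value.lower().split("q")
    return pd.Period(year=int(year), quarter=int(quarter), freq="Q")


def expected_total_charge(df: pd.DataFrame) -> pd.Series:
    """Recompute transaction_quantity * price + total_transmission_charge."""
    return df["transaction_quantity"] * df["price"] + df["total_transmission_charge"]


def to_utc(local: pd.Timestamp, timezone_code: str) -> pd.Timestamp:
    """Convert a naive local timestamp to UTC using a fixed-offset code.

    Raises KeyError for the Prevailing codes, which have no fixed offset.
    """
    offset_hours = FIXED_UTC_OFFSETS[timezone_code]
    return (local - pd.Timedelta(hours=offset_hours)).tz_localize("UTC")
```

Comparing expected_total_charge(df) against total_transaction_charge, with a small numeric tolerance, is one way to flag rows whose reported total does not follow the documented formula.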
core_gridpathratoolkit__assn_generator_aggregation_group#
Association table providing connections between individual generator IDs and their corresponding aggregation group when creating aggregated profiles.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
GridPath Resource Adequacy Toolkit Data
- Primary key:
plant_id_eia, generator_id
Usage Warnings#
The original data is no longer being collected or reported in this way.
Additional Details#
This table defines which individual generator profiles are combined when creating aggregated capacity factors / generation profiles. Generator capacity is used to weight the contribution of each generator in the resulting aggregated profiles, and is made available in this table for convenience and legibility. The resulting aggregated profiles are stored in out_gridpathratoolkit__hourly_available_capacity_factor.
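The capacity-weighted aggregation described above can be sketched in pandas. This is an illustration, not PUDL's actual implementation: the helper name is hypothetical, and the per-generator profile input it consumes (columns plant_id_eia, generator_id, datetime_utc, capacity_factor) is an assumed shape rather than a documented PUDL table.

```python
import pandas as pd


def aggregate_profiles(assn: pd.DataFrame, profiles: pd.DataFrame) -> pd.DataFrame:
    """Capacity-weighted average of per-generator capacity factors by group.

    assn: rows shaped like this association table.
    profiles: hypothetical per-generator data with columns
        plant_id_eia, generator_id, datetime_utc, capacity_factor.
    """
    keys = ["plant_id_eia", "generator_id"]
    # Keep only generators flagged for inclusion in the aggregation.
    include = assn["include_generator"].fillna(False).astype(bool)
    included = assn.loc[include, keys + ["aggregation_group", "capacity_mw"]]
    merged = profiles.merge(included, on=keys, how="inner")
    # Weight each generator's capacity factor by its capacity.
    merged["weighted_cf"] = merged["capacity_factor"] * merged["capacity_mw"]
    grouped = merged.groupby(["aggregation_group", "datetime_utc"], as_index=False)[
        ["weighted_cf", "capacity_mw"]
    ].sum()
    # Dividing by total capacity turns the weighted sum into a weighted average.
    grouped["capacity_factor"] = grouped["weighted_cf"] / grouped["capacity_mw"]
    return grouped[["aggregation_group", "datetime_utc", "capacity_factor"]]
```

Generators with include_generator set to False are dropped before weighting, so they affect neither the numerator nor the total capacity in the denominator.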
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
aggregation_group |
string |
A label identifying a group of aggregated generator capacity factors. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
generator_id |
string |
Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string! |
include_generator |
boolean |
Every row in the aggregation table describes a single generator. Groups of rows with the same aggregation_group are combined using a capacity-weighted average to produce an aggregate generation profile. A few generators are excluded from that aggregation process; this column indicates whether a generator is included. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
core_nrelatb__yearly_projected_cost_performance#
Annual time series of projections of costs and performance for various technologies.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
NREL Annual Technology Baseline (ATB) for Electricity
- Primary key:
This table has no primary key. The fields technology_description_detail_1 and technology_description_detail_2 occasionally contain nulls, preventing this table from having a primary key.
Additional Details#
This table includes NREL’s projections for capital and operational costs as well as operational attributes for a variety of technologies for each of the NREL ATB scenarios. Each new year of ATB data includes projections for the next 25 years.
All of the NREL tables are related. If you want to understand the financial assumptions of a given scenario in this table, see core_nrelatb__yearly_projected_financial_cases. The other two core_nrelatb tables include attributes about the technologies in these projections.
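Because the NREL tables share key columns, the projections in this table can be joined to their financial assumptions. A minimal sketch, assuming both tables are already loaded as pandas DataFrames (e.g. via the access methods in each table's entry); the helper name is hypothetical, and the join keys are the primary key listed for core_nrelatb__yearly_projected_financial_cases.

```python
import pandas as pd

# Primary key of core_nrelatb__yearly_projected_financial_cases, per this dictionary.
FINANCIAL_CASE_KEYS = [
    "report_year",
    "model_case_nrelatb",
    "projection_year",
    "technology_description",
]


def attach_financial_cases(cost_perf: pd.DataFrame, fin_cases: pd.DataFrame) -> pd.DataFrame:
    """Left-join financial assumptions onto cost/performance projections."""
    return cost_perf.merge(
        fin_cases,
        on=FINANCIAL_CASE_KEYS,
        how="left",
        # Raises MergeError if the financial-case keys are not unique.
        validate="many_to_one",
    )
```

Several cost/performance rows (e.g. different scenario_atb values) can share one financial-case row, which is why the join is validated as many-to-one.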
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period divided by (nameplate capacity × the length of that time period), in consistent units (e.g., MWh / (MW × hours)). |
capex_construction_finance_factor |
number |
Portion of all-in capital cost associated with construction period financing. This factor is applied to an overnight capital cost to represent the financing costs incurred during the construction period. |
capex_grid_connection_per_kw |
number |
Overnight capital cost includes a nominal-distance spur line (<1 mi) for all technologies, and for offshore wind, it includes export cable and construction period transit costs for a 30-km distance from shore. Project-specific costs for lines based on distance to existing transmission are not included. This column only applies to offshore wind. |
capex_overnight_additional_per_kw |
number |
Capex for retrofits if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation. |
capex_overnight_per_kw |
number |
Capex if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation. |
capex_per_kw |
number |
Capital cost (USD/kW). Expenditures required to achieve commercial operation of the generation plant. |
cost_recovery_period_years |
integer |
The period over which the initial capital investment to build a plant is recovered. |
fuel_cost_per_mwh |
number |
Fuel costs in USD/MWh. NREL-derived values using heat rates. |
heat_rate_mmbtu_per_mwh |
number |
Fuel energy content consumed per unit of electricity generated (MMBtu/MWh). |
heat_rate_penalty |
number |
Heat rate penalty for retrofitting. This column is only populated for retrofit technologies. Values appear to range from 0.09 to 0.35. |
levelized_cost_of_energy_per_mwh |
number |
Levelized cost of energy (LCOE) is a summary metric that combines the primary technology cost and performance parameters: capital expenditures, operations expenditures, and capacity factor. |
model_case_nrelatb |
string |
NREL’s financial assumption cases. There are two cases which affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case also includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods |
model_tax_credit_case_nrelatb |
string |
NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods |
net_output_penalty |
number |
Penalty for retrofitting for net output. This column is only populated for retrofit technologies. Values appear to range from -0.25 to -0.08. |
opex_fixed_per_kw |
number |
Fixed operation and maintenance expenses. Annual expenditures to operate and maintain equipment that are not incurred on a per-unit-energy basis. |
opex_variable_per_mwh |
number |
Operation and maintenance costs incurred on a per-unit-energy basis. |
projection_year |
integer |
The year of the projected value. |
report_year |
integer |
Four-digit year in which the data was reported. |
scenario_atb |
string |
Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
technology_description_detail_1 |
string |
Technology details indicate resource levels and specific technology subcategories. |
technology_description_detail_2 |
string |
Technology details indicate resource levels and specific technology subcategories. |
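The levelized_cost_of_energy_per_mwh description names capital expenditures, operations expenditures, and capacity factor as its inputs. The sketch below combines the corresponding columns with fixed_charge_rate (from core_nrelatb__yearly_projected_financial_cases_by_scenario) using a common simplified LCOE formula, LCOE ≈ (FCR × CAPEX + fixed O&M) / (CF × 8760 / 1000) + variable O&M + fuel. Whether this matches NREL's exact calculation is an assumption, so treat it as an approximate sanity check. Units are also assumed: $/kW for capex, $/kW-year for fixed O&M, and $/MWh for variable O&M and fuel.

```python
HOURS_PER_YEAR = 8760


def simple_lcoe_per_mwh(
    capex_per_kw: float,
    fixed_charge_rate: float,
    opex_fixed_per_kw: float,
    opex_variable_per_mwh: float,
    fuel_cost_per_mwh: float,
    capacity_factor: float,
) -> float:
    """Simplified LCOE in $/MWh (assumed formula, see lead-in).

    Assumes capex_per_kw in $/kW, opex_fixed_per_kw in $/kW-year, and
    capacity_factor as a fraction between 0 and 1.
    """
    # Annualized fixed costs in $/kW-year.
    annual_fixed = fixed_charge_rate * capex_per_kw + opex_fixed_per_kw
    # MWh generated per kW of capacity per year (kWh/kW-year divided by 1000).
    mwh_per_kw_year = capacity_factor * HOURS_PER_YEAR / 1000
    return annual_fixed / mwh_per_kw_year + opex_variable_per_mwh + fuel_cost_per_mwh
```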
core_nrelatb__yearly_projected_financial_cases#
Annual time series of financial assumptions for each model case (model_case_nrelatb), and sub-type of technology (technology_description).
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
NREL Annual Technology Baseline (ATB) for Electricity
- Primary key:
report_year, model_case_nrelatb, projection_year, technology_description
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
inflation_rate |
number |
Rate of inflation. All dollar values are given in 2021 USD, using the Consumer Price Index for All Urban Consumers for dollar year conversions where the source year dollars do not match 2021. |
interest_rate_calculated_real |
number |
Calculated real interest rate. |
interest_rate_during_construction_nominal |
number |
Also referred to as construction finance cost. Portion of all-in capital cost associated with construction period financing. It is a function of construction duration, capital fraction during construction, and interest during construction. |
interest_rate_nominal |
number |
Nominal interest rate. |
model_case_nrelatb |
string |
NREL’s financial assumption cases. There are two cases which affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case also includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods |
projection_year |
integer |
The year of the projected value. |
rate_of_return_on_equity_calculated_real |
number |
Calculated real rate of return on equity. |
rate_of_return_on_equity_nominal |
number |
Nominal rate of return on equity. |
report_year |
integer |
Four-digit year in which the data was reported. |
tax_rate_federal_state |
number |
Combined federal and state tax rate. The R&D Only model_case_nrelatb holds tax and inflation rates constant at assumed long-term values: a 21 percent federal tax rate, a 6 percent state tax rate (though actual state tax rates vary), and a 2.5 percent inflation rate; it excludes the effects of tax credits. The Market + Policies model_case_nrelatb applies federal tax credits and expires them consistent with existing law and guidelines. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
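Two relationships implied by the rate columns in this table can be sketched directly. real_from_nominal applies the Fisher relation, and combined_tax_rate assumes state taxes are deductible from federal taxable income. Both are standard conventions assumed here, not confirmed as the exact method behind the calculated real columns or tax_rate_federal_state, and the helper names are hypothetical.

```python
def real_from_nominal(nominal_rate: float, inflation_rate: float) -> float:
    """Fisher relation: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal_rate) / (1 + inflation_rate) - 1


def combined_tax_rate(federal_rate: float, state_rate: float) -> float:
    """Combined rate, assuming state taxes are deductible from federal taxable income."""
    return federal_rate + state_rate * (1 - federal_rate)
```

With the R&D Only case's 21 percent federal and 6 percent state rates, combined_tax_rate(0.21, 0.06) returns 0.2574.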
core_nrelatb__yearly_projected_financial_cases_by_scenario#
Annual time series of additional financial assumptions for NREL ATB projections that also vary by technology innovation scenario (scenario_atb), tax credit case (model_tax_credit_case_nrelatb), and cost recovery period (cost_recovery_period_years).
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
NREL Annual Technology Baseline (ATB) for Electricity
- Primary key:
This table has no primary key. A small number of records have nulls in the cost_recovery_period_years column. Based on NREL’s documentation, this seems to indicate that those records apply to any relevant cost_recovery_period_years. If those records were non-null, the primary key of this table would be: [‘report_year’, ‘model_case_nrelatb’, ‘model_tax_credit_case_nrelatb’, ‘projection_year’, ‘technology_description’, ‘scenario_atb’, ‘cost_recovery_period_years’]
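One way to respect the null-period convention noted above when selecting a single cost recovery period is to keep both the matching rows and the rows with a null period. A minimal sketch with a hypothetical helper name; it relies on the hedged reading that a null cost_recovery_period_years applies to any period.

```python
import pandas as pd


def rows_for_recovery_period(df: pd.DataFrame, years: int) -> pd.DataFrame:
    """Keep rows for a given cost recovery period, plus rows with a null period.

    Assumes a null cost_recovery_period_years means the row applies to any
    relevant cost recovery period.
    """
    period = df["cost_recovery_period_years"]
    # Comparisons against nullable integers yield <NA>; treat those as False.
    matches = (period == years).fillna(False)
    return df[matches | period.isna()]
```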
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
capital_recovery_factor |
number |
Ratio of a constant annuity to the present value of receiving that annuity for a given length of time. |
cost_recovery_period_years |
integer |
The period over which the initial capital investment to build a plant is recovered. |
debt_fraction |
number |
Fraction of capital financed with debt; the remainder (1 - debt fraction) is assumed to be financed with equity. Also referred to as the leverage ratio. |
fixed_charge_rate |
number |
Amount of revenue per dollar of investment that must be collected annually from customers to pay the carrying charges on that investment. |
model_case_nrelatb |
string |
NREL’s financial assumption cases. There are two cases which affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case also includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods |
model_tax_credit_case_nrelatb |
string |
NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods |
projection_year |
integer |
The year of the projected value. |
report_year |
integer |
Four-digit year in which the data was reported. |
scenario_atb |
string |
Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
wacc_nominal |
number |
Nominal weighted average cost of capital - average expected rate that is paid to finance assets. |
wacc_real |
number |
Real weighted average cost of capital - average expected rate that is paid to finance assets. |
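The capital_recovery_factor and WACC columns follow standard finance formulas. The sketch below uses the textbook definitions: the CRF of an annuity at rate r over n years, r(1 + r)^n / ((1 + r)^n - 1), and an after-tax WACC in which debt interest is tax-deductible. Which rate NREL feeds into the CRF (for example wacc_real together with cost_recovery_period_years) is an assumption here, and the helper names are hypothetical.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """CRF = r(1+r)^n / ((1+r)^n - 1): level annual payment per dollar of present value."""
    if rate == 0:
        return 1 / years  # limit of the formula as r -> 0
    growth = (1 + rate) ** years
    return rate * growth / (growth - 1)


def after_tax_wacc(
    debt_fraction: float,
    interest_rate: float,
    return_on_equity: float,
    tax_rate: float,
) -> float:
    """Textbook after-tax WACC; interest on debt is assumed tax-deductible."""
    return debt_fraction * interest_rate * (1 - tax_rate) + (1 - debt_fraction) * return_on_equity
```

For example, a 5 percent rate over 20 years gives a CRF of about 0.0802, meaning roughly 8 cents per year per dollar invested.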
core_nrelatb__yearly_technology_status#
Annual time series of technology statuses, indicating whether technologies in the ATB scenarios are mature, and whether they are the default technologies.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
NREL Annual Technology Baseline (ATB) for Electricity
- Primary key:
This table has no primary key.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
is_default |
boolean |
Indicator of whether the technology is default. |
is_technology_mature |
boolean |
Indicator of whether the technology is mature. Technologies are defined as mature if a representative plant is operating or under construction in the United States in the Base Year. |
report_year |
integer |
Four-digit year in which the data was reported. |
technology_description |
string |
High level description of the technology used by the generator to produce electricity. |
technology_description_detail_1 |
string |
Technology details indicate resource levels and specific technology subcategories. |
technology_description_detail_2 |
string |
Technology details indicate resource levels and specific technology subcategories. |
core_phmsagas__yearly_distribution_operators#
Annual time series of distribution operator information.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part A)
- Primary key:
report_id, report_date, operator_id_phmsa
Additional Details#
This table contains operator-level information, including office and headquarters locations.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additional_information |
string |
Any additional information which will assist in clarifying or classifying the reported data. |
headquarters_city |
string |
City where an operator’s headquarters are located. |
headquarters_county |
string |
County where an operator’s headquarters are located. |
headquarters_state |
string |
State where an operator’s headquarters are located. |
headquarters_street_address |
string |
Street address for an operator’s headquarters. |
headquarters_zip |
string |
ZIP code where an operator’s headquarters are located. |
office_city |
string |
City where an operator’s office is located. |
office_county |
string |
County where an operator’s office is located. |
office_state |
string |
State where an operator’s office is located. |
office_street_address |
string |
Street address of an operator’s office. |
office_zip |
string |
ZIP code where an operator’s office is located. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
operator_name_phmsa |
string |
PHMSA operator name. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
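Because an operator_id_phmsa of zero stands for an unknown operator, it may be worth converting it to a proper missing value before joining on or counting operators. A minimal pandas sketch with a hypothetical helper name:

```python
import pandas as pd


def mask_unknown_operators(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where operator_id_phmsa == 0 (unknown) becomes <NA>."""
    out = df.copy()
    # Nullable integer dtype so the column can hold missing values.
    ids = out["operator_id_phmsa"].astype("Int64")
    out["operator_id_phmsa"] = ids.mask((ids == 0).fillna(False))
    return out
```

operator_id_phmsa is part of this table's primary key, so masking it can produce null or duplicate keys; the sketch therefore returns a copy and leaves the input untouched.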
core_pudl__assn_eia_pudl_plants#
Association table providing connections between EIA plant IDs and manually assigned PUDL plant IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
plant_id_eia
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_eia |
string |
Plant name. |
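plant_id_pudl can be attached to any EIA table keyed by plant_id_eia with a left join. A minimal sketch with a hypothetical helper name; validate="many_to_one" relies on plant_id_eia being this table's primary key. Since plant_id_pudl may not be constant over time, re-running the join against the current association table is safer than storing the result.

```python
import pandas as pd


def attach_pudl_plant_ids(eia_records: pd.DataFrame, assn: pd.DataFrame) -> pd.DataFrame:
    """Add plant_id_pudl to a DataFrame that has a plant_id_eia column."""
    return eia_records.merge(
        assn[["plant_id_eia", "plant_id_pudl"]],
        on="plant_id_eia",
        how="left",
        # plant_id_eia is unique in the association table, so each EIA row
        # matches at most one PUDL plant ID.
        validate="many_to_one",
    )
```

EIA plant IDs missing from the association table come back with a null plant_id_pudl rather than being dropped.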
core_pudl__assn_eia_pudl_utilities#
Association table providing connections between EIA utility IDs and manually assigned PUDL utility IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_eia
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_eia |
string |
The name of the utility. |
core_pudl__assn_ferc1_dbf_pudl_utilities#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 DBF utility IDs originally reported as respondent_id.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_ferc1_dbf
Usage Warnings#
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_dbf |
integer |
FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year. |
core_pudl__assn_ferc1_pudl_plants#
Noticed any irregularities with this table? File a report.
Association table providing connections between FERC 1 plants and their manually assigned PUDL plant IDs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_ferc1, plant_name_ferc1
Usage Warnings#
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Additional Details#
FERC does not assign IDs to plants, so each FERC 1 plant is identified by a utility_id_ferc1 and a plant_name_ferc1.
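Because the key is the (utility_id_ferc1, plant_name_ferc1) pair, joins against this table have to match on both columns, and the match on the free-form name is exact. The sketch below uses made-up values, and `capacity_mw` is a hypothetical column; it shows how a plant name spelled differently from the association table simply fails to match.

```python
import pandas as pd

# Stand-in for core_pudl__assn_ferc1_pudl_plants; the values are made up.
assn = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 2],
        "plant_name_ferc1": ["big river", "lakeside", "big river"],
        "plant_id_pudl": [501, 502, 610],
    }
)

# Hypothetical FERC 1 plant records; "Lakeside " is spelled differently on purpose.
plants = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 2],
        "plant_name_ferc1": ["big river", "Lakeside ", "big river"],
        "capacity_mw": [300.0, 120.0, 80.0],
    }
)

# Both columns are needed: the same plant name can appear under different utilities.
merged = plants.merge(
    assn,
    on=["utility_id_ferc1", "plant_name_ferc1"],
    how="left",
    validate="many_to_one",
)
print(merged[["utility_id_ferc1", "plant_name_ferc1", "plant_id_pudl"]])
```

The second record gets no plant_id_pudl, which is the free-form-string caveat from the column description in practice.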
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_ferc1 |
string |
Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant. |
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
core_pudl__assn_ferc1_pudl_utilities#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL utility IDs and PUDL-assigned FERC1 utility IDs.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_ferc1
Usage Warnings#
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Additional Details#
This table maps two manually assigned utility IDs: a PUDL ID and a FERC1 ID. The PUDL IDs link EIA and FERC1 utilities. The PUDL-assigned FERC1 IDs link records reported under the older DBF respondent IDs and the newer XBRL entity IDs, via core_pudl__assn_ferc1_dbf_pudl_utilities and core_pudl__assn_ferc1_xbrl_pudl_utilities respectively. This table is generated from a CSV file stored in the PUDL repository: src/package_data/glue/utility_id_pudl.csv
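The chain described above can be sketched as two successive joins: native ID to utility_id_ferc1, then utility_id_ferc1 to utility_id_pudl. All values below are made up, including the XBRL entity IDs, and `attach_utility_id_pudl` is a hypothetical helper, not part of PUDL. DBF-era data would first need its respondent_id column renamed to utility_id_ferc1_dbf.

```python
import pandas as pd

# Stand-ins for the three association tables; all values are made up.
dbf_assn = pd.DataFrame({"utility_id_ferc1_dbf": [10, 11], "utility_id_ferc1": [1, 2]})
xbrl_assn = pd.DataFrame(
    {"utility_id_ferc1_xbrl": ["C000001", "C000002"], "utility_id_ferc1": [1, 3]}
)
ferc1_pudl = pd.DataFrame(
    {"utility_id_ferc1": [1, 2, 3], "utility_id_pudl": [900, 901, 902]}
)


def attach_utility_id_pudl(records, native_col, native_assn, ferc1_pudl):
    """Map a native FERC1 ID column to utility_id_ferc1, then to utility_id_pudl."""
    with_ferc1 = records.merge(
        native_assn, on=native_col, how="left", validate="many_to_one"
    )
    return with_ferc1.merge(
        ferc1_pudl, on="utility_id_ferc1", how="left", validate="many_to_one"
    )


# One DBF-era record and one XBRL-era record for the same (made-up) utility.
old = attach_utility_id_pudl(
    pd.DataFrame({"utility_id_ferc1_dbf": [10]}), "utility_id_ferc1_dbf", dbf_assn, ferc1_pudl
)
new = attach_utility_id_pudl(
    pd.DataFrame({"utility_id_ferc1_xbrl": ["C000001"]}), "utility_id_ferc1_xbrl", xbrl_assn, ferc1_pudl
)
print(old["utility_id_pudl"].iloc[0], new["utility_id_pudl"].iloc[0])
```

Both records resolve to the same utility_id_pudl because they share utility_id_ferc1.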
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_ferc1 |
string |
Name of the responding utility, as it is reported in FERC Form 1. For human readability only. |
core_pudl__assn_ferc1_xbrl_pudl_utilities#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 XBRL utility IDs originally reported as entity_id.
- Most-recent data:
2024
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_ferc1_xbrl
Usage Warnings#
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
utility_id_ferc1 |
integer |
PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year. |
utility_id_ferc1_xbrl |
string |
FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year. |
core_pudl__assn_ferc714_csv_pudl_respondents#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 CSV respondent IDs originally reported as respondent_id.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
respondent_id_ferc714, respondent_id_ferc714_csv
Usage Warnings#
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID links the respondent IDs natively reported in the original CSV and XBRL data sources. |
respondent_id_ferc714_csv |
integer |
FERC Form 714 respondent ID from CSV reported data, published for the years 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
core_pudl__assn_ferc714_pudl_respondents#
Noticed any irregularities with this table? File a report.
Entity table containing static information about PUDL-derived FERC 714 respondent IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
respondent_id_ferc714
Usage Warnings#
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Additional Details#
These IDs connect the older CSV data, which uses different respondent IDs, to the newer XBRL entity IDs. They are manually assigned when new FERC 714 data is integrated, and any newly found respondents are added to the list with a new ID. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/respondent_id_ferc714.xlsx
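To line up each respondent's CSV-era and XBRL-era IDs, one option is an outer join of the CSV and XBRL association tables on respondent_id_ferc714. The values below are made up. Because both association tables use a composite primary key, a PUDL ID could map to more than one native ID; in that case this join would pair every CSV ID with every XBRL ID for that respondent.

```python
import pandas as pd

# Stand-ins for the CSV and XBRL association tables; the values are made up.
csv_assn = pd.DataFrame(
    {"respondent_id_ferc714": [1, 2], "respondent_id_ferc714_csv": [101, 102]}
)
xbrl_assn = pd.DataFrame(
    {"respondent_id_ferc714": [1, 3], "respondent_id_ferc714_xbrl": ["C000101", "C000103"]}
)

# An outer join keeps respondents that only appear in one era; the ID from
# the other era is left missing for them.
crosswalk = csv_assn.merge(xbrl_assn, on="respondent_id_ferc714", how="outer")
print(crosswalk)
```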
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID links the respondent IDs natively reported in the original CSV and XBRL data sources. |
core_pudl__assn_ferc714_xbrl_pudl_respondents#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 XBRL respondent IDs originally reported as entity_id.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
respondent_id_ferc714, respondent_id_ferc714_xbrl
Usage Warnings#
Contains information from multiple raw inputs.
FERC data is notoriously difficult to extract cleanly: it often contains free-form strings and unlabeled total rows, and it often lacks IDs. See Notable Irregularities for details.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
respondent_id_ferc714 |
integer |
PUDL-assigned ID identifying a respondent to FERC Form 714. This ID links the respondent IDs natively reported in the original CSV and XBRL data sources. |
respondent_id_ferc714_xbrl |
string |
FERC Form 714 respondent ID from XBRL reported data, published for 2021 to the present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms. |
core_pudl__assn_utilities_plants#
Noticed any irregularities with this table? File a report.
Association table providing connections between PUDL utility IDs and PUDL plant IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_pudl, plant_id_pudl
Additional Details#
This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
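Since the primary key is the (utility_id_pudl, plant_id_pudl) pair, this is a many-to-many association: a utility can be linked to several plants and a plant to several utilities. The sketch below uses made-up IDs to show both directions of the lookup.

```python
import pandas as pd

# Stand-in for core_pudl__assn_utilities_plants; the IDs are made up.
assn = pd.DataFrame(
    {
        "utility_id_pudl": [1, 1, 2, 3],
        "plant_id_pudl": [10, 11, 10, 12],
    }
)

# Plants associated with each utility, as sorted lists.
plants_per_utility = assn.groupby("utility_id_pudl")["plant_id_pudl"].apply(sorted).to_dict()

# Number of distinct utilities associated with each plant (plant 10 has two here).
utilities_per_plant = assn.groupby("plant_id_pudl")["utility_id_pudl"].nunique()
print(plants_per_utility)
print(utilities_per_plant)
```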
core_pudl__codes_data_maturities#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for maturity levels of data records. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
code
Usage Warnings#
Contains information from multiple raw inputs.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
core_pudl__codes_datasources#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for the data sources that PUDL integrates.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
datasource
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
datasource |
string |
Code identifying a dataset available within PUDL. |
doi |
string |
Unique digital object identifier of the Zenodo archive. |
partitions |
string |
The data partitions used to generate this instance of the database. |
pudl_version |
string |
The version of PUDL used to generate this database. |
core_pudl__codes_imputation_reasons#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for the reasons a value might be flagged for imputation.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
code
Additional Details#
Any time a column is imputed, there should be a corresponding column that contains a code for every imputed value in that column.
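The pairing of an imputed column with a code column can be sketched in plain Python. Everything here is hypothetical: the column names, the codes, and their descriptions are made up and are not PUDL's actual values, and the sketch assumes that a missing code (None) means the value was reported rather than imputed.

```python
from collections import Counter

# Hypothetical rows of core_pudl__codes_imputation_reasons (code -> description).
imputation_reasons = {
    "missing_value": "The reported value was missing.",
    "negative_value": "The reported value was negative.",
}

# A hypothetical imputed column and its companion column of imputation codes.
demand_mwh = [100.0, 105.0, 98.0, 102.0]
demand_mwh_imputation_code = [None, "missing_value", None, "negative_value"]

# Pair each imputed value with the human-readable reason it was imputed.
imputed = [
    (value, imputation_reasons[code])
    for value, code in zip(demand_mwh, demand_mwh_imputation_code)
    if code is not None
]

# Count how many values were imputed for each reason.
counts = Counter(code for code in demand_mwh_imputation_code if code is not None)
print(imputed)
print(counts)
```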
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
core_pudl__codes_subdivisions#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for states, provinces, and other sub-national political subdivisions.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
country_code, subdivision_code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
country_code |
string |
Three-letter ISO-3166 country code (e.g. USA or CAN). |
country_name |
string |
Full country name (e.g. United States of America). |
division_code_us_census |
string |
Three-letter US Census division code as it appears in the bulk electricity data published by the EIA. Note that EIA splits the Pacific division into distinct contiguous (CA, OR, WA) and non-contiguous (AK, HI) states. For reference see this US Census region and division map: https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf |
division_name_us_census |
string |
Longer human readable name describing the US Census division. |
is_epacems_state |
boolean |
Indicates whether the associated state reports data within the EPA’s Continuous Emissions Monitoring System. |
region_name_us_census |
string |
Human-readable name of a US Census region. |
state_id_fips |
string |
Two-digit state FIPS code. |
subdivision_code |
string |
Two-letter ISO-3166 political subdivision code (e.g. US state or Canadian province abbreviations like CA or AB). |
subdivision_name |
string |
Full name of political subdivision (e.g. US state or Canadian province names like California or Alberta). |
subdivision_type |
string |
ISO-3166 political subdivision type (e.g. state, province, outlying_area). |
timezone_approx |
string |
IANA timezone name of the timezone which encompasses the largest portion of the population in the associated geographic area. |
core_pudl__entity_plants_pudl#
Noticed any irregularities with this table? File a report.
Entity table containing static information about PUDL-assigned plant IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
plant_id_pudl
Additional Details#
These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly identified plants are added to the list with a new ID. Each ID maps to a power plant which is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
plant_id_pudl |
integer |
A manually assigned PUDL plant ID. May not be constant over time. |
plant_name_pudl |
string |
Plant name, chosen arbitrarily from the several possible plant names available in the plant matching process. Included for human readability only. |
core_pudl__entity_utilities_pudl#
Noticed any irregularities with this table? File a report.
Entity table containing static information about PUDL-assigned utility IDs.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
The Public Utility Data Liberation (PUDL) Project
- Primary key:
utility_id_pudl
Additional Details#
These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly found utilities are added to the list with a new ID. Each ID maps to an entity that owns or operates power plants and is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
utility_id_pudl |
integer |
A manually assigned PUDL utility ID. May not be stable over time. |
utility_name_pudl |
string |
Utility name, chosen arbitrarily from the several possible utility names available in the utility matching process. Included for human readability only. |
core_rus12__entity_borrowers#
Noticed any irregularities with this table? File a report.
Entity table containing static information about active RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply
- Primary key:
borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This table contains the canonical values for borrowers, meaning values that are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. To compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value regardless of whether it meets this 70% threshold, so that every borrower has a name. We chose this approach because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.
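The selection rule described above can be sketched as a small function. This is an illustration, not PUDL's actual implementation: `canonical_value` is a hypothetical name, the sketch assumes the 70% share is computed over non-null reported instances, and ties are broken by first occurrence, which PUDL may handle differently.

```python
from collections import Counter


def canonical_value(values, threshold=0.7, require_threshold=True):
    """Return the most frequently reported non-null value, or None.

    If require_threshold is True, the winning value must account for at least
    `threshold` of the non-null instances. Otherwise the most common value is
    returned regardless, as described above for borrower_name_rus.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    # most_common(1) breaks ties by first occurrence in `reported`.
    value, count = Counter(reported).most_common(1)[0]
    if require_threshold and count / len(reported) < threshold:
        return None
    return value


states = ["AK", "AK", "AK", "AL"]  # 3 of 4 (75%) agree
names = ["Example Coop", "Example Cooperative", "Example Coop", "Example & Co"]  # 2 of 4 agree
print(canonical_value(states))                          # AK
print(canonical_value(names))                           # None (50% is below 70%)
print(canonical_value(names, require_threshold=False))  # Example Coop
```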
This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report to RUS Form 12. Borrowers that reported to RUS have records in the other data tables; however, a small number of these borrowers did not report, for various reasons, and are not represented in any of the other tables.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
state |
string |
Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands. |
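The documented borrower_id_rus structure (two-character state abbreviation followed by four digits) can be checked with a regular expression. This is a minimal sketch: it assumes the letters are uppercase ASCII, it does not verify that the letters are a real state abbreviation, and the example IDs are made up.

```python
import re

# Two uppercase letters followed by exactly four ASCII digits, e.g. "AK0001".
BORROWER_ID_RUS = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id: str) -> bool:
    """Return True if the whole string matches the documented ID structure."""
    return BORROWER_ID_RUS.fullmatch(borrower_id) is not None


for candidate in ["AK0001", "AK001", "A10001", "ak0001"]:
    print(candidate, is_valid_borrower_id(candidate))
```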
core_rus12__monthly_demand_and_energy_at_delivery_points#
Noticed any irregularities with this table? File a report.
Monthly time series of demand and energy delivered to distribution utilities.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)
- Primary key:
report_date, borrower_id_rus, delivery_recipient
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
delivered_demand_mw |
number |
The amount of demand delivered in MW. |
delivered_energy_mwh |
number |
The amount of energy delivered in MWh. |
delivery_recipient |
string |
The recipient of the delivered energy or demand. |
report_date |
date |
Date reported. |
core_rus12__monthly_demand_and_energy_at_power_sources#
Noticed any irregularities with this table? File a report.
Monthly time series of demand and energy at point of generation.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)
- Primary key:
report_date, borrower_id_rus
Additional Details#
Some peak demand dates have an hour component and others don’t, because peak_demand_hour is reported inconsistently in the raw data. If no hour is reported, we record the hour as 0, so only the date shows in the data.
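Because a missing hour is stored as hour 0, a midnight timestamp in peak_demand_date is ambiguous: it may be a genuine midnight peak or an unreported hour. The sketch below uses made-up timestamps and flags those rows; `hour_possibly_missing` and `peak_day` are hypothetical helper columns.

```python
import pandas as pd

# Hypothetical peak_demand_date values (made up).
peaks = pd.DataFrame(
    {
        "peak_demand_date": pd.to_datetime(
            ["2021-01-14 18:00", "2021-02-03 00:00", "2021-03-09 00:00"]
        )
    }
)

# Hour 0 may mean "no hour reported" or a real peak at midnight; this column
# alone cannot tell the two apart, so flag them for caution.
peaks["hour_possibly_missing"] = peaks["peak_demand_date"].dt.hour.eq(0)

# The calendar day of the peak is reliable either way.
peaks["peak_day"] = peaks["peak_demand_date"].dt.normalize()
print(peaks)
```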
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
energy_output_mwh |
number |
The amount of energy output in MWh. |
is_peak_coincident |
boolean |
Whether peak_demand_mw is a coincident peak (True) or a non-coincident peak (False). |
peak_demand_date |
datetime |
The date of the peak demand. |
peak_demand_mw |
number |
Peak demand in a given timeframe, in MW. |
report_date |
date |
Date reported. |
core_rus12__yearly_balance_sheet_assets#
Noticed any irregularities with this table? File a report.
Annual time series of assets and other debits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
- Primary key:
report_date, borrower_id_rus, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported to the core_rus12__yearly_balance_sheet_assets table. |
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
core_rus12__yearly_balance_sheet_liabilities#
Noticed any irregularities with this table? File a report.
Annual time series of liabilities and other credits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)
- Primary key:
report_date, borrower_id_rus, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
liability_type |
string |
Type of liability being reported to the core_rus12__yearly_balance_sheet_liabilities table. |
report_date |
date |
Date reported. |
core_rus12__yearly_depreciation_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in accumulated depreciation.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)
- Primary key:
report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
accruals |
number |
Value of additions into an asset class (a.k.a. accruals). |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
composite_depreciation_rate |
number |
The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100. |
depreciation_and_amortization_group |
string |
High-level category of depreciation and amortization items. |
depreciation_and_amortization_item |
string |
Category of depreciation and amortization items. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements_less_net_salvage |
number |
Cost of retirements minus any net salvage value. |
core_rus12__yearly_depreciation_misc#
Noticed any irregularities with this table? File a report.
Annual time series of accumulated depreciation.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)
- Primary key:
report_date, borrower_id_rus, depreciation_and_amortization_item
Additional Details#
This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
depreciation_and_amortization_item |
string |
Category of depreciation and amortization items. |
ending_balance |
number |
Account balance at end of year. |
report_date |
date |
Date reported. |
core_rus12__yearly_external_financial_risk_ratio#
Noticed any irregularities with this table? File a report.
Annual time series of the ratio of investments and loan guarantee balances to total utility plant assets.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
external_financial_risk_ratio |
number |
(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how financially exposed a utility is to outside entities relative to its own assets. |
report_date |
date |
Date reported. |
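The external_financial_risk_ratio definition can be written as a small function, with the numerator grouped before dividing. This document does not say whether PUDL stores the ratio as a fraction or a percentage, so this sketch returns a fraction. The function name and the input numbers are illustrative only.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """(total investments + loan guarantee balances) / total utility plant assets."""
    if total_utility_plant == 0:
        raise ValueError("total utility plant assets must be non-zero")
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Illustrative numbers only, not taken from RUS data.
ratio = external_financial_risk_ratio(2_000_000, 500_000, 50_000_000)
print(ratio)  # 0.05
```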
core_rus12__yearly_investments#
Noticed any irregularities with this table? File a report.
Annual time series of investments, loan guarantees and loans.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F, Sub-section I)
- Primary key:
This table has no primary key. It lists all investments or loans in each year, and a borrower can have multiple records with the same investment_description.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported in this Part correspond to Balance Sheet items in Part A, Section B, which are in table core_rus12__yearly_balance_sheet_assets.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
excluded_investments |
number |
Excluded investment. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
included_investments |
number |
Included investment. |
income_or_loss |
number |
Income or loss from investment. |
investment_description |
string |
Description of investment. This is a free-form text field and thus contains a wide variety of values. |
investment_type_code |
integer |
Investment type code. |
report_date |
date |
Date reported. |
core_rus12__yearly_lines_stations_labor_materials_cost#
Noticed any irregularities with this table? File a report.
Annual time series of labor and material cost for lines and stations operated by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)
- Primary key:
report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
cost |
number |
Cost value. |
labor_or_material |
string |
Indicates whether the cost reported is for labor or material. |
lines_or_stations |
string |
Indicates whether the cost reported pertains to lines or stations. |
operation_or_maintenance |
string |
Indicates whether the expenditure is for operation or maintenance. |
report_date |
date |
Date reported. |
core_rus12__yearly_loans#
Noticed any irregularities with this table? File a report.
Annual time series of loans provided by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)
- Primary key:
This table has no primary key because some borrowers report multiple loan values from the same entity in a given year.
Additional Details#
This table also includes loan guarantees, where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
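Loans and loan guarantees share one table, so you may want to separate them with is_loan_guarantee before summing loan_balance. That keeps contingent obligations apart from money the borrower actually lent. The recipients and amounts below are made up.

```python
import pandas as pd

# Made-up sample rows; recipients and amounts are hypothetical.
loans = pd.DataFrame(
    {
        "borrower_id_rus": ["AK0001", "AK0001", "AK0001"],
        "loan_recipient": ["Member Co-op A", "Member Co-op B", "Local Water District"],
        "is_loan_guarantee": [False, False, True],
        "loan_balance": [1_000.0, 500.0, 2_000.0],
    }
)

# Total outstanding balance for direct loans (False) vs. guarantees (True).
balance_by_kind = loans.groupby("is_loan_guarantee")["loan_balance"].sum()
print(balance_by_kind.to_dict())
```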
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
is_loan_guarantee |
boolean |
Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults. |
loan_balance |
number |
The amount of money still owed on a loan at the end of the reporting year. |
loan_maturity_date |
date |
The date on which a loan is scheduled to be fully paid. |
loan_original_amount |
number |
The original amount of a loan. |
loan_recipient |
string |
The organization that received a loan. |
report_date |
date |
Date reported. |
core_rus12__yearly_long_term_debt#
Noticed any irregularities with this table? File a report.
Annual time series of long-term debt and debt service requirements for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)
- Primary key:
This table has no primary key because some borrowers report multiple debt values from the same entity in a given year.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
debt_description |
string |
Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions. |
debt_ending_balance |
number |
The amount of principal still owed on the debt at the end of the report year. |
debt_interest |
number |
The interest expense on the debt for the report year. |
debt_principal |
number |
The principal paid on the debt during the report year. |
debt_total |
number |
The total amount of debt. |
report_date |
date |
Date reported. |
core_rus12__yearly_meeting_and_board#
Noticed any irregularities with this table? File a report.
Annual time series of annual meeting and board information for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
board_members_num |
integer |
The total number of board members. |
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
does_manager_have_written_contract |
boolean |
Whether or not the RUS borrower’s manager has a written contract. |
fees_and_expenses_for_board_members |
integer |
The total amount of fees and expenses for board members. |
last_annual_meeting_date |
datetime |
The date of the last annual meeting. |
members_num |
integer |
The total number of members. |
members_present_at_meeting_num |
integer |
The number of members present at the last annual meeting. |
members_voting_by_proxy_or_mail_num |
integer |
The number of members voting by mail or by proxy. |
report_date |
date |
Date reported. |
was_quorum_present |
boolean |
Whether or not quorum was met. |
core_rus12__yearly_non_utility_plant_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in non-utility plant.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)
- Primary key:
report_date, borrower_id_rus, non_utility_plant_item
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
non_utility_plant_item |
string |
Category describing non-utility plant asset items. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
core_rus12__yearly_plant_costs#
Noticed any irregularities with this table? File a report.
Annual time series of costs of net energy generated by plant.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)
- Primary key:
This table has no primary key because one plant (Walter Scott) has duplicate records every year. Based on other RUS-12 tables, these duplicates are likely the borrowers' ownership portion of the plant and the total plant, but this table gives no clear indication of which is which.
Otherwise, the primary key of this table would be: [‘report_date’, ‘borrower_id_rus’, ‘plant_name_rus’, ‘cost_group’, ‘cost_type’].
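To find the rows that break the would-be primary key, you can look for duplicates on the key columns listed above. This sketch uses made-up rows that mimic the duplicated plant. The borrower ID, cost_group and cost_type values are hypothetical.

```python
import pandas as pd

# Would-be primary key from the text above.
KEY = ["report_date", "borrower_id_rus", "plant_name_rus", "cost_group", "cost_type"]

# Made-up sample rows; IDs, categories and costs are hypothetical.
costs = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["IA0001"] * 3,
        "plant_name_rus": ["Walter Scott", "Walter Scott", "Other Plant"],
        "cost_group": ["fuel", "fuel", "fuel"],
        "cost_type": ["coal", "coal", "coal"],
        "cost": [40.0, 100.0, 75.0],
    }
)

# keep=False marks every member of a duplicated key group, not just the repeats.
dupes = costs[costs.duplicated(subset=KEY, keep=False)]
print(dupes)
```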
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns only apply to some cost_type values (and even some plant_type values), so they are expected to contain many nulls.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
cost |
number |
Cost value. |
cost_group |
string |
High-level category of cost type. |
cost_per_mmbtu |
number |
Unit cost of energy production, in cost per MMBtu. |
cost_per_mwh |
number |
Unit cost of energy production, in cost per MWh. |
cost_type |
string |
Detailed category of cost type. |
is_total |
boolean |
Whether or not this record represents a total. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
core_rus12__yearly_plant_factors_and_maximum_demand#
Noticed any irregularities with this table? File a report.
Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)
- Primary key:
This table has no primary key. The primary key would otherwise be report_date, borrower_id_rus, plant_name_rus, and plant_type, but the plant_type column was not reported before 2009, which causes many duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill the plant_type column where different plant_types are reported for the same plant in later years.
Additional Details#
The plant_type field was only included in the data starting in 2009. We backfill records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with a null plant_type because there is no reliable way to determine which plant_type belongs to which record back in time. Plants whose names are reported differently before and after 2009 are also not backfilled.
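The backfill rule above can be read more than one way. This sketch uses one reading, which is an assumption: for each borrower_id_rus and plant_name_rus pair, if exactly one distinct non-null plant_type is ever reported, the null plant_type values for that pair are filled with it. It is not PUDL's actual implementation, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; plant names and types are hypothetical.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2008-01-01", "2010-01-01", "2008-01-01", "2010-01-01", "2010-01-01"]
        ),
        "borrower_id_rus": ["AA0001"] * 5,
        "plant_name_rus": ["Plant A", "Plant A", "Plant B", "Plant B", "Plant B"],
        "plant_type": [None, "steam", None, "steam", "internal_combustion"],
    }
)

keys = ["borrower_id_rus", "plant_name_rus"]

# Number of distinct non-null plant_type values reported per borrower/plant.
n_types = df.groupby(keys)["plant_type"].transform("nunique")
# First non-null plant_type per borrower/plant (the only one when n_types == 1).
only_type = df.groupby(keys)["plant_type"].transform("first")

# Fill nulls only where the type is unambiguous; ambiguous plants stay null.
fill_mask = df["plant_type"].isna() & (n_types == 1)
df.loc[fill_mask, "plant_type"] = only_type[fill_mask]
print(df)
```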
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period), with the time period in consistent units (hours, years, etc.). |
capacity_factor_running |
number |
Fraction of potential generation over the time period a plant was in operation. Energy generated over time period / (nameplate capacity * time period), with the time period in consistent units (hours, years, etc.). |
load_factor |
number |
Fraction of consumption versus demand reported for a plant over a given timeframe. Energy consumed over time period / (peak demand * time period), with the time period in consistent units (hours, years, etc.). |
peak_gross_demand_mw |
number |
The highest average power output recorded over any single 15-minute interval during the reporting period. |
peak_gross_demand_nameplate_mw |
number |
The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
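The capacity and load factor definitions above correspond to these two small functions. They assume energy in MWh, capacity or demand in MW, and time in hours. Our reading of capacity_factor_running is that it uses the same formula with the plant's operating hours in place of the full period. The numbers are illustrative, not RUS data.

```python
def capacity_factor(energy_mwh: float, capacity_mw: float, hours: float) -> float:
    """Energy generated / (nameplate capacity * hours in the period)."""
    return energy_mwh / (capacity_mw * hours)


def load_factor(energy_mwh: float, peak_demand_mw: float, hours: float) -> float:
    """Energy consumed / (peak demand * hours in the period)."""
    return energy_mwh / (peak_demand_mw * hours)


# Illustrative numbers: a 100 MW plant over a non-leap year (8,760 hours).
cf = capacity_factor(438_000, 100, 8_760)
lf = load_factor(438_000, 80, 8_760)
# Hypothetical 6,000 operating hours for the "running" variant.
running = capacity_factor(438_000, 100, 6_000)
print(cf, lf, running)
```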
core_rus12__yearly_plant_labor#
Noticed any irregularities with this table? File a report.
Annual time series of labor and payroll information for plants owned by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)
- Primary key:
This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
Note the lack of plant_type pre-2009 leading to a lack of reliable primary keys.
Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to data from this table, but you must join on the borrower_id_rus column as well as report_date and plant_name_rus. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:
you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
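The join described above can be sketched with pandas. Two made-up borrowers share one plant. Joining on report_date and plant_name_rus alone pairs each labor row with every owner's operations row. Adding borrower_id_rus keeps the match one-to-one. The capacity_mw column on the by-plant side is an assumption for illustration.

```python
import pandas as pd

# Made-up sample rows; only the join keys matter here.
labor = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01", "2021-01-01"]),
        "borrower_id_rus": ["AA0001", "BB0002"],
        "plant_name_rus": ["Shared Plant", "Shared Plant"],
        "employees_full_time_num": [40, 40],
    }
)
ops_by_plant = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01", "2021-01-01"]),
        "borrower_id_rus": ["AA0001", "BB0002"],
        "plant_name_rus": ["Shared Plant", "Shared Plant"],
        "capacity_mw": [500.0, 500.0],  # assumed column, for illustration
    }
)

# Without borrower_id_rus, each labor row matches both owners' rows (2 x 2 = 4).
naive = labor.merge(ops_by_plant, on=["report_date", "plant_name_rus"])

# With borrower_id_rus, validate="one_to_one" raises if the keys are not unique.
merged = labor.merge(
    ops_by_plant,
    on=["report_date", "borrower_id_rus", "plant_name_rus"],
    how="left",
    validate="one_to_one",
)
print(len(naive), len(merged))
```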
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
employee_hours_worked_total |
number |
Total number of hours worked by employees. |
employees_full_time_num |
integer |
Number of employees hired full-time for normal operations of the system. |
employees_part_time_num |
integer |
Number of employees regularly employed on a part-time basis. Excludes employees hired for short periods of time to complete special jobs. |
payroll_maintenance |
number |
The amount of payroll spent on plant maintenance. |
payroll_operations |
number |
The amount of payroll spent on plant operations. |
payroll_other_accounts |
number |
The amount of plant payroll spent on accounts other than maintenance and operations. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
core_rus12__yearly_plant_operations_by_borrower#
Noticed any irregularities with this table? File a report.
Annual time series of the borrower's portion of plant operational data, including fuel consumption and operational hours.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
- Primary key:
This table has no primary key because a handful of plants have duplicate records. Otherwise, the primary key of this table would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].
Additional Details#
The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). For each plant type, the original form reports plant operations data in multiple ways: some records correspond to the portion of a plant that a borrower owns, and others to the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.
There are two boolean columns that delineate which records are associated with the borrowers' share versus the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both PUDL tables.
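The pre-2009 routing rule described above can be sketched with pandas. Every record goes to the by-borrower table. Only records with is_full_ownership_portion set to TRUE also go to the by-plant table. The rows are made up, and this is an illustration of the stated rule, not PUDL's transform code.

```python
import pandas as pd

# Made-up pre-2009 rows (is_partly_owned_by_borrower was not yet reported).
pre_2009 = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2005-01-01"] * 3),
        "borrower_id_rus": ["AA0001", "AA0001", "BB0002"],
        "plant_name_rus": ["Plant A", "Plant B", "Plant B"],
        "is_full_ownership_portion": [True, False, False],
    }
)

# Every pre-2009 record is kept in the by-borrower table.
by_borrower = pre_2009.copy()
# Only full-ownership-portion records are also routed to the by-plant table.
by_plant = pre_2009.loc[pre_2009["is_full_ownership_portion"]].copy()
print(len(by_borrower), len(by_plant))
```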
We also performed two cleaning steps that alter the original data slightly:
- Dropped one record with an unexpected ownership label and duplicate data: for one year, a Wisdom steam plant record is labeled as both fully owned by the borrower and partly owned, an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Fortunately, this record is identical to the other Wisdom steam plant record for that year, which has the expected ownership labels.
- Replaced two string values for unit_id_rus: two 2018 unit_id_rus values contain strings, WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that only the numeric parts of these strings, 12 and 10 respectively, are needed. This lets the unit_id_rus column use an integer type.
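A minimal sketch of the unit_id_rus fix, assuming (as in the two 2018 cases above) that the usable number sits at the end of the string. `clean_unit_id` is a hypothetical helper rather than PUDL's cleaning code, and it raises an error instead of guessing when no trailing number is present.

```python
import re


def clean_unit_id(value):
    """Reduce a unit ID such as 'WSL GT 12' to its trailing integer (12).

    Integers and purely numeric strings pass through as int; a value with
    no trailing number raises ValueError rather than being guessed.
    """
    if isinstance(value, int):
        return value
    # Capture the digits at the end of the string, ignoring trailing spaces.
    match = re.search(r"(\d+)\s*$", str(value))
    if match is None:
        raise ValueError(f"no numeric unit ID in {value!r}")
    return int(match.group(1))
```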
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
fuel_consumption_coal_lbs |
number |
Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam. |
fuel_consumption_gas_cubic_feet |
number |
Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant types combined_cycle and steam. |
fuel_consumption_oil_gallons |
number |
Annual gallons of oil consumed for fuel. This field is only reported for plant types combined_cycle and steam. |
fuel_consumption_other |
number |
Annual other fuel consumed. Neither the units nor the type of fuel are documented. This field is only reported for plant types combined_cycle and steam. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
is_full_ownership_portion |
boolean |
Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower. |
is_partly_owned_by_borrower |
boolean |
Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009. |
operating_hours_in_service |
number |
Number of operating hours in service. |
operating_hours_on_standby |
number |
Number of operating hours on standby. |
operating_hours_out_of_service_scheduled |
number |
Number of operating hours out of service which were scheduled. |
operating_hours_out_of_service_unscheduled |
number |
Number of operating hours out of service which were unscheduled. |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
times_started |
number |
Number of times the plant was started. This field is only reported for plant types steam and nuclear. |
unit_id_rus |
integer |
RUS-assigned unit identification code. |
core_rus12__yearly_plant_operations_by_plant#
Noticed any irregularities with this table? File a report.
Annual time series of whole plant operational data including fuel consumption and operational hours.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)
- Primary key:
This table has no primary key because a handful of plants have duplicate records. Otherwise, the primary key of this table would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].
Additional Details#
The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form reports plant operations data for each plant type in two ways: records for the portion of a plant that a borrower owns, and records for the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.
Two boolean columns delineate which records are associated with the borrowers’ share versus the whole plant; they are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion belong in the by-plant table, while all records belong in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.
We also performed two cleaning steps that alter the original data slightly:
- Dropped one record with an unexpected ownership label and duplicate data: for one year, a Wisdom steam plant record is labeled as both fully owned by the borrower and partly owned, an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Fortunately, this record is identical to the other Wisdom steam plant record for that year, which has the expected ownership labels.
- Replaced two string values for unit_id_rus: two 2018 unit_id_rus values contain strings, WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that only the numeric parts of these strings, 12 and 10 respectively, are needed. This lets the unit_id_rus column use an integer type.
Plant-level data can be matched to the core_rus12__yearly_plant_labor table and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand table, which report Section B and C data for all plants. The RUS instructions note that:
you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
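Given that warning, it can help to find plants whose whole-plant records were submitted by more than one borrower before aggregating. This pandas sketch groups on report_date, plant_name_rus and plant_type, which is an assumption: plant names are free text, so differently spelled names for the same plant will slip through. `plants_with_multiple_borrowers` is a hypothetical helper and the sample rows are invented.

```python
import pandas as pd


def plants_with_multiple_borrowers(df: pd.DataFrame) -> pd.DataFrame:
    """Return plant-level rows for plants reported by more than one borrower."""
    keys = ["report_date", "plant_name_rus", "plant_type"]
    # Count distinct borrowers per plant-year and broadcast back to each row.
    n_borrowers = df.groupby(keys)["borrower_id_rus"].transform("nunique")
    return df.loc[n_borrowers > 1]


# Hypothetical sample data: Plant A has whole-plant records from two borrowers.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-12-31"] * 3),
        "borrower_id_rus": ["XX0001", "XX0002", "XX0003"],
        "plant_name_rus": ["Plant A", "Plant A", "Plant B"],
        "plant_type": ["steam"] * 3,
        "gross_generation_mwh": [500.0, 480.0, 100.0],
    }
)
shared = plants_with_multiple_borrowers(df)
```

As the RUS note says, the two Plant A records need not agree, so an analysis might keep one of them or compare them rather than summing both.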
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
fuel_consumption_coal_lbs |
number |
Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam. |
fuel_consumption_gas_cubic_feet |
number |
Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant types combined_cycle and steam. |
fuel_consumption_oil_gallons |
number |
Annual gallons of oil consumed for fuel. This field is only reported for plant types combined_cycle and steam. |
fuel_consumption_other |
number |
Annual other fuel consumed. Neither the units nor the type of fuel are documented. This field is only reported for plant types combined_cycle and steam. |
gross_generation_mwh |
number |
Gross electricity generation for the specified period in megawatt-hours (MWh). |
is_partly_owned_by_borrower |
boolean |
Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009. |
operating_hours_in_service |
number |
Number of operating hours in service. |
operating_hours_on_standby |
number |
Number of operating hours on standby. |
operating_hours_out_of_service_scheduled |
number |
Number of operating hours out of service which were scheduled. |
operating_hours_out_of_service_unscheduled |
number |
Number of operating hours out of service which were unscheduled. |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
times_started |
number |
Number of times the plant was started. This field is only reported for plant types steam and nuclear. |
unit_id_rus |
integer |
RUS-assigned unit identification code. |
core_rus12__yearly_renewable_plants#
Noticed any irregularities with this table? File a report.
Annual time series of renewable energy plant generation information for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)
- Primary key:
report_date, borrower_id_rus, plant_name_rus, prime_mover_id
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
capacity_factor |
number |
Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of that time period), with consistent units (e.g. MWh / (MW * hours)). |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
comments |
string |
General comments field. |
employees_num |
integer |
Number of employees. |
invested |
number |
The amount of money invested. |
net_generation_mwh |
number |
Net electricity generation for the specified period in megawatt-hours (MWh). |
opex_per_mwh |
number |
Total production expenses (USD per MWh generated). |
ownership_pct |
number |
Percentage of the plant owned by the respondent. |
plant_name_rus |
string |
Name of the plant as reported to RUS. |
power_cost_per_mwh |
number |
The cost of power per MWh. |
primary_renewable_fuel_type |
string |
Primary renewable fuel type used by the plant. |
primary_renewable_fuel_type_id |
integer |
Unique numeric identifier for each renewable fuel type. |
prime_mover_id |
integer |
Unique numeric identifier for each prime mover type used by RUS borrowers. |
prime_mover_type |
string |
Type of prime mover (e.g. Hydro, Internal Combustion). |
renewable_fuel_pct |
number |
Percentage of renewable fuel used. |
report_date |
date |
Date reported. |
rus_funding |
number |
Amount of funding received from the Rural Utilities Service (RUS). |
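The capacity_factor definition above reduces to energy divided by (capacity * hours). The small sketch below can be used to sanity-check reported values, assuming energy in MWh, capacity in MW and an 8,760-hour year by default (8,784 in a leap year). Whether RUS borrowers computed capacity_factor from net or gross generation is not documented here, so treat any comparison against net_generation_mwh as approximate; `capacity_factor` is a hypothetical helper.

```python
def capacity_factor(energy_mwh: float, capacity_mw: float, hours: float = 8760.0) -> float:
    """Return energy_mwh / (capacity_mw * hours).

    Defaults to an 8,760-hour (non-leap) year; pass hours=8784.0 for a leap
    year, or the actual number of hours for a shorter period.
    """
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return energy_mwh / (capacity_mw * hours)


# A 10 MW plant producing 26,280 MWh over a non-leap year has a 0.3 capacity factor.
cf = capacity_factor(26_280, 10)
```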
core_rus12__yearly_sources_and_distribution#
Noticed any irregularities with this table? File a report.
Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
- Primary key:
report_date, borrower_id_rus, source_of_energy
Additional Details#
See the core_rus12__yearly_sources_and_distribution_by_plant_type table for a breakdown of plant-type-specific cost, capacity, plant_num, and net_energy_received_mwh values. Note also that several source_of_energy values don’t have a corresponding cost value.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
cost |
number |
Cost value. |
net_energy_received_mwh |
number |
The net amount of energy received into the system, in MWh. |
report_date |
date |
Date reported. |
source_of_energy |
string |
The source of energy (not plant type). |
core_rus12__yearly_sources_and_distribution_by_plant_type#
Noticed any irregularities with this table? File a report.
Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)
- Primary key:
report_date, borrower_id_rus, plant_type
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
capacity_mw |
number |
Total installed (nameplate) capacity, in megawatts. |
cost |
number |
Cost value. |
net_energy_received_mwh |
number |
The net amount of energy received into the system, in MWh. |
plant_num |
integer |
Number of plants. |
plant_type |
string |
Type of plant. |
report_date |
date |
Date reported. |
core_rus12__yearly_statement_of_operations#
Noticed any irregularities with this table? File a report.
Annual time series of opex and cost of electric service for RUS borrowers by time period.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)
- Primary key:
report_date, borrower_id_rus, opex_group, opex_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
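Because subtotal rows are flagged by is_total, a common pattern is to drop them before summing, and to keep opex_group in the grouping so that different sections of the statement are not added together. This is a minimal pandas sketch, assuming is_total is set on every subtotal row; `sum_detail_rows` is a hypothetical helper, and the opex_group and opex_type values in the sample data are invented placeholders.

```python
import pandas as pd


def sum_detail_rows(df: pd.DataFrame, value_col: str = "opex_ytd") -> pd.Series:
    """Sum value_col per report_date, borrower and opex_group, skipping total rows."""
    # Assumption: a missing is_total flag means the row is not a total.
    is_total = df["is_total"].fillna(False).astype(bool)
    detail = df.loc[~is_total]
    return detail.groupby(["report_date", "borrower_id_rus", "opex_group"])[value_col].sum()


# Hypothetical sample: two detail rows plus their subtotal.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-12-31"] * 3),
        "borrower_id_rus": ["XX0001"] * 3,
        "opex_group": ["group_a"] * 3,
        "opex_type": ["item_1", "item_2", "total"],
        "opex_ytd": [10.0, 20.0, 30.0],
        "is_total": [False, False, True],
    }
)
totals = sum_detail_rows(df)
```

Summing opex_ytd without the filter would count the 30.0 subtotal on top of its own components.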
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
opex_group |
string |
High level section from the statement of operations table. Most of these types have subcomponents broken out in the |
opex_report_month |
number |
Amount of operational expense, cost or income during the report month. |
opex_type |
string |
Type of item from the statement of operations. |
opex_ytd |
number |
The year-to-date amount of operational expense, cost or income. |
opex_ytd_budget |
number |
The year-to-date budget for amount of operational expense, cost or income. |
report_date |
date |
Date reported. |
core_rus12__yearly_utility_plant_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in utility plant.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)
- Primary key:
report_date, borrower_id_rus, utility_plant_group, utility_plant_item
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:
- intangible_plant: has only one utility_plant_item, which is a total.
- production_plant: has several sub-component utility_plant_item values and a standard total.
- transmission_plant: has several sub-component utility_plant_item values and a standard total.
- distribution_plant: has several sub-component utility_plant_item values and a standard total.
- electric_plant_in_service: has two sub-components and a total that includes those sub-components plus the totals of the above categories.
- utility_plant_in_service: has several sub-components; its total is the sum of those sub-components plus the total of electric_plant_in_service.
- total_utility_plant: has one sub-component; its total is the sum of that sub-component and the utility_plant_in_service total.
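The simpler levels of this nesting can be checked directly: for production_plant, transmission_plant and distribution_plant, the total row should equal the sum of the group's other items. The pandas sketch below performs that comparison. It assumes is_total marks each group's total row; `check_group_totals` is a hypothetical helper and the sample numbers are invented. Groups whose totals fold in other groups' totals (electric_plant_in_service and above) are expected to fail this simple check, and intangible_plant, which has no sub-components, shows a missing sum.

```python
import pandas as pd


def check_group_totals(
    df: pd.DataFrame, value_col: str = "ending_balance", tol: float = 1e-6
) -> pd.DataFrame:
    """Compare each group's reported total row with the sum of its other items."""
    keys = ["report_date", "borrower_id_rus", "utility_plant_group"]
    is_total = df["is_total"].fillna(False).astype(bool)
    reported = df.loc[is_total].groupby(keys)[value_col].sum().rename("reported_total")
    summed = df.loc[~is_total].groupby(keys)[value_col].sum().rename("sum_of_items")
    # Align both series on the group keys; a group with no items gets NaN.
    out = pd.concat([reported, summed], axis=1)
    # NaN differences compare as False, so item-less groups never "match".
    out["matches"] = (out["reported_total"] - out["sum_of_items"]).abs() <= tol
    return out


# Hypothetical sample: production_plant adds up, distribution_plant does not.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-12-31"] * 6),
        "borrower_id_rus": ["XX0001"] * 6,
        "utility_plant_group": ["production_plant"] * 3 + ["distribution_plant"] * 3,
        "utility_plant_item": ["item_1", "item_2", "total"] * 2,
        "ending_balance": [5.0, 7.0, 12.0, 3.0, 4.0, 10.0],
        "is_total": [False, False, True] * 2,
    }
)
check = check_group_totals(df)
```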
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
utility_plant_group |
string |
High-level category of utility plant asset type. |
utility_plant_item |
string |
Sub-category of utility_plant_group describing utility plant asset item. |
core_rus7__entity_borrowers#
Noticed any irregularities with this table? File a report.
Entity table containing static information about active RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution
- Primary key:
borrower_id_rus
Usage Warnings#
Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
Additional Details#
This table contains the canonical values set for each borrower, which are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value whether or not it meets this 70% threshold, so that every borrower has a name. We chose this approach because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.
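The threshold rule described above can be sketched in plain Python. This is not PUDL's implementation: `choose_canonical` is a hypothetical helper, missing values (None) are assumed not to count as reported instances, and ties are broken by first appearance, which PUDL may handle differently.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def choose_canonical(values: Iterable[Hashable], threshold: Optional[float] = 0.7):
    """Return the most frequently reported value, or None.

    With a threshold, the winning value must make up at least that share of
    the non-missing reports. threshold=None always returns the most common
    value, mirroring the rule described above for borrower_name_rus.
    """
    # Assumption: None means "not reported" and is excluded from the count.
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    # most_common orders equal counts by first appearance.
    value, count = Counter(reported).most_common(1)[0]
    if threshold is not None and count / len(reported) < threshold:
        return None
    return value
```

For example, a name reported as "A", "A", "B" has a 67% share: it would be rejected under the 70% default but still chosen with `threshold=None`.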
This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report on RUS Form 7. Borrowers that reported to RUS have records in the other data tables; however, a small number of these Borrowers did not report for various reasons, and they are not represented in any of the other tables.
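Since some eligible borrowers never reported, an anti-join shows which borrowers in this table have no records in another table. This is a pandas sketch, assuming both frames share a borrower_id_rus column; `borrowers_without_records` is a hypothetical helper, and the IDs and names in the example are made up.

```python
import pandas as pd


def borrowers_without_records(entity: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Return rows of `entity` whose borrower_id_rus never appears in `other`."""
    merged = entity.merge(
        other[["borrower_id_rus"]].drop_duplicates(),
        on="borrower_id_rus",
        how="left",
        indicator=True,  # adds a _merge column: "both" or "left_only"
    )
    return merged.loc[merged["_merge"] == "left_only", entity.columns]


# Hypothetical sample: XX0002 is listed as a borrower but never reported.
entity = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0002", "XX0003"],
        "borrower_name_rus": ["Coop One", "Coop Two", "Coop Three"],
    }
)
other = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0001", "XX0003"],
        "report_date": pd.to_datetime(["2020-12-31", "2021-12-31", "2021-12-31"]),
    }
)
missing = borrowers_without_records(entity, other)
```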
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
borrower_name_rus |
string |
The name of the RUS (Rural Utilities Service) borrower. |
state |
string |
Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands. |
core_rus7__yearly_balance_sheet_assets#
Noticed any irregularities with this table? File a report.
Annual time series of assets and other debts from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
- Primary key:
report_date, borrower_id_rus, asset_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
asset_type |
string |
Type of asset being reported to the core_rus7__yearly_balance_sheet_assets table. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
core_rus7__yearly_balance_sheet_liabilities#
Noticed any irregularities with this table? File a report.
Annual time series of liabilities and other credits from the balance sheet.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)
- Primary key:
report_date, borrower_id_rus, liability_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
liability_type |
string |
Type of liability being reported to the core_rus7__yearly_balance_sheet_liabilities table. |
report_date |
date |
Date reported. |
core_rus7__yearly_customer_energy_efficiency_and_conservation_loans#
Noticed any irregularities with this table? File a report.
Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
- Primary key:
report_date, borrower_id_rus, loan_status
Additional Details#
Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats and attic ventilation fans.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
actual_pct |
number |
Observed percentage. |
anticipated_pct |
number |
Expected percentage. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
loan_status |
string |
The repayment status of a loan. |
report_date |
date |
Date reported. |
ytd_dollars |
number |
Balance for the current year, in U.S. dollars. |
core_rus7__yearly_distribution_services#
Noticed any irregularities with this table? File a report.
Annual time series of distribution services.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
- Primary key:
report_date, borrower_id_rus, service_status
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
service_status |
string |
Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals. |
services |
number |
Number of services in system at end of year. |
core_rus7__yearly_employee_statistics#
Noticed any irregularities with this table? File a report.
Annual time series of statistics about employment and payroll.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
employee_hours_worked_over_time |
integer |
The number of overtime hours worked by employees. |
employee_hours_worked_regular_time |
integer |
The number of regular (non-overtime) hours worked by employees. |
employees_fte_num |
integer |
The number of full time employees. |
payroll_capitalized |
integer |
The amount of payroll spent that was capitalized. |
payroll_expensed |
integer |
The amount of payroll spent that was expensed. |
payroll_other |
integer |
The amount of payroll spent that was funded by other means (neither capitalized nor expensed). |
report_date |
date |
Date reported. |
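The payroll columns split spending three ways: capitalized, expensed, and other (neither capitalized nor expensed). The sketch below assumes these three parts are mutually exclusive and together make up total payroll, then derives a capitalized share and an overtime share of hours worked. All values are hypothetical.

```python
import pandas as pd

# Hypothetical values for one borrower-year; column names follow
# core_rus7__yearly_employee_statistics.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["TX0010"],
        "employee_hours_worked_regular_time": [200_000],
        "employee_hours_worked_over_time": [20_000],
        "payroll_capitalized": [1_500_000],
        "payroll_expensed": [3_000_000],
        "payroll_other": [500_000],
    }
)

# Assumption: the three payroll columns are non-overlapping parts of total payroll.
payroll_cols = ["payroll_capitalized", "payroll_expensed", "payroll_other"]
df["payroll_total"] = df[payroll_cols].sum(axis=1)
df["capitalized_share"] = df["payroll_capitalized"] / df["payroll_total"]

# Overtime hours as a share of all hours worked.
hours = df["employee_hours_worked_regular_time"] + df["employee_hours_worked_over_time"]
df["overtime_share"] = df["employee_hours_worked_over_time"] / hours
```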
core_rus7__yearly_energy_efficiency#
Noticed any irregularities with this table? File a report.
Annual time series of investments in and impacts of energy efficiency programs.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)
- Primary key:
report_date, borrower_id_rus, customer_class, observation_period
Usage Warnings#
The savings_mmbtu column likely contains some values reported in incorrect units.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
customer_class |
string |
High level categorization of customer type. |
customers_num |
number |
Number of customers. |
invested |
number |
The amount of money invested. |
observation_period |
string |
The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period. |
report_date |
date |
Date reported. |
savings_mmbtu |
number |
The estimated amount of energy savings from energy efficiency programs. Warning: some values are much larger than expected and have not yet been cleaned; this is likely a unit reporting error. |
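Since savings_mmbtu has not been cleaned for likely unit errors, you may want to flag implausibly large values before using it. The sketch below marks values more than 100 times the column median. The 100x factor is an arbitrary, hypothetical threshold (not a PUDL convention), and the rows are made up, with one value inflated to mimic a unit error.

```python
import pandas as pd

# Hypothetical rows; the last savings_mmbtu value mimics a unit error.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["GA0001", "GA0002", "GA0003", "GA0004"],
        "savings_mmbtu": [1_200.0, 950.0, 1_100.0, 1_050_000.0],
    }
)

# Hypothetical threshold: flag anything more than 100x the column median.
SUSPECT_FACTOR = 100
median = df["savings_mmbtu"].median()
df["savings_suspect"] = df["savings_mmbtu"] > SUSPECT_FACTOR * median
```

A median-based cutoff is used here because, unlike the mean, it is barely moved by the very outliers being hunted; in practice you would likely compute it within comparable groups (for example by customer_class).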
core_rus7__yearly_energy_purchased#
Noticed any irregularities with this table? File a report.
Annual time series of energy purchased by RUS borrowers.
- Most-recent data:
2020
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)
- Primary key:
This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).
Additional Details#
A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an asterisk (*), including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
average_energy_cost_dollars_per_mwh |
number |
The average cost of energy per MWh. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
comments |
string |
General comments field. |
fuel_cost_adjustment |
number |
The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost. |
fuel_type_code_rus |
integer |
Unique numeric identifier for RUS fuel types. |
is_supplier_eia_respondent |
boolean |
Whether the utility supplying energy to a RUS borrower is an EIA respondent. |
purchased_energy_cost_total |
number |
The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges. |
purchased_mwh |
number |
The total electricity purchased. |
report_date |
date |
Date reported. |
supplier_code_rus |
string |
Unique numeric identifier for the utility supplying energy to a RUS borrower. |
utility_name_eia |
string |
The name of the utility. |
wheeling_and_other_charges |
number |
The cost of wheeling and other charges or credits related to fuel. Included in the total cost. |
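The cost columns nest: purchased_energy_cost_total includes both fuel_cost_adjustment and wheeling_and_other_charges. The sketch below separates records from unspecified entities (utility names starting with "*"), backs out a base power cost, and recomputes an average cost per MWh. It assumes the total is exactly the sum of base charges, fuel cost adjustment, and wheeling/other charges; the rows and supplier names are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like core_rus7__yearly_energy_purchased.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["KS0001", "KS0001", "KS0001"],
        "utility_name_eia": ["Example Supplier", "*Miscellaneous", "*Adjustments"],
        "purchased_mwh": [10_000.0, 500.0, 0.0],
        "purchased_energy_cost_total": [700_000.0, 40_000.0, -5_000.0],
        "fuel_cost_adjustment": [50_000.0, 0.0, 0.0],
        "wheeling_and_other_charges": [20_000.0, 0.0, -5_000.0],
    }
)

# Records from unspecified entities have utility names starting with "*".
is_unspecified = df["utility_name_eia"].str.startswith("*")
named = df[~is_unspecified].copy()

# Assumption: total = base power charges + fuel cost adjustment + wheeling/other.
named["base_power_cost"] = (
    named["purchased_energy_cost_total"]
    - named["fuel_cost_adjustment"]
    - named["wheeling_and_other_charges"]
)

# Average all-in cost per MWh purchased.
named["cost_per_mwh"] = named["purchased_energy_cost_total"] / named["purchased_mwh"]
```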
core_rus7__yearly_external_financial_risk_ratio#
Noticed any irregularities with this table? File a report.
Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
external_financial_risk_ratio |
number |
(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets. |
report_date |
date |
Date reported. |
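The ratio divides the sum of investments and loan guarantee balances by total utility plant, so the addition must happen before the division. A minimal worked sketch with hypothetical dollar figures (the function name is illustrative, not part of PUDL):

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """Return (total investments + loan guarantee balances) / total utility plant."""
    if total_utility_plant == 0:
        raise ValueError("total_utility_plant must be non-zero")
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Hypothetical figures in dollars: (2.0M + 0.5M) / 50M
ratio = external_financial_risk_ratio(2_000_000, 500_000, 50_000_000)
print(ratio)  # 0.05
```

Whether the reported column is a fraction like this or a percentage is not stated here; compare a few rows against the source form before relying on the scale.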
core_rus7__yearly_investments#
Noticed any irregularities with this table? File a report.
Annual time series of investments, loan guarantees and loans.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P - Section I)
- Primary key:
This table has no primary key. It lists all investments and loans in each year, and borrowers can have multiple records with the same investment_description.
Additional Details#
Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
excluded_investments |
number |
Excluded investment. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
included_investments |
number |
Included investment. |
income_or_loss |
number |
Income or loss from investment. |
investment_description |
string |
Description of investment. This is a free-form text field and thus contains a wide variety of values. |
investment_type_code |
integer |
Investment type code. |
report_date |
date |
Date reported. |
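Because this table has no primary key, several rows can share a borrower, year, and investment_description. A minimal sketch of handling that: confirm the duplicates exist, then aggregate to one row per borrower, year, and investment_type_code. The rows and descriptions are hypothetical.

```python
import pandas as pd

# Hypothetical rows: two investments share the same description.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-12-31"] * 3),
        "borrower_id_rus": ["MN0001"] * 3,
        "investment_type_code": [1, 1, 2],
        "investment_description": ["Hypothetical investment A"] * 2
        + ["Hypothetical investment B"],
        "included_investments": [100_000.0, 50_000.0, 10_000.0],
        "excluded_investments": [0.0, 25_000.0, 5_000.0],
    }
)

# Repeated keys are expected here, so do not treat them as errors.
has_duplicate_keys = df.duplicated(
    subset=["report_date", "borrower_id_rus", "investment_description"]
).any()

# Sum within each borrower-year-type instead of expecting one row per key.
summary = df.groupby(
    ["report_date", "borrower_id_rus", "investment_type_code"], as_index=False
)[["included_investments", "excluded_investments"]].sum()
```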
core_rus7__yearly_loans#
Noticed any irregularities with this table? File a report.
Annual time series of loans provided by RUS borrowers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)
- Primary key:
This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.
Additional Details#
This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date, given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason, we have set the date to null.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
for_rural_development |
boolean |
Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area. |
is_loan_guarantee |
boolean |
Whether the record is a third-party loan that the reporting utility (referred to as the borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults. |
loan_balance |
number |
The amount of money still owed on a loan at the end of the reporting year. |
loan_maturity_date |
date |
The date on which a loan is scheduled to be fully paid. |
loan_original_amount |
number |
The original amount of a loan. |
loan_recipient |
string |
The organization that received a loan. |
report_date |
date |
Date reported. |
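Since this table mixes loans with loan guarantees (where the borrower is liable only if the original borrower defaults), it usually makes sense to total them separately. The sketch below splits on is_loan_guarantee and counts null maturity dates, such as the one nulled for ND0051. All rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like core_rus7__yearly_loans.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["ND0001", "ND0001", "ND0001"],
        "loan_recipient": ["Recipient A", "Recipient A", "Recipient B"],
        "is_loan_guarantee": [False, False, True],
        "loan_original_amount": [100_000.0, 40_000.0, 250_000.0],
        "loan_balance": [60_000.0, 40_000.0, 200_000.0],
        "loan_maturity_date": pd.to_datetime(["2030-06-30", None, "2035-01-01"]),
    }
)

# Keep direct loans and guarantees apart: a guarantee is a contingent liability.
direct = df[~df["is_loan_guarantee"]]
guarantees = df[df["is_loan_guarantee"]]

direct_balance = direct["loan_balance"].sum()
guaranteed_balance = guarantees["loan_balance"].sum()

# Maturity dates may be null (e.g. implausible values set to null during cleaning).
missing_maturity = df["loan_maturity_date"].isna().sum()
```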
core_rus7__yearly_long_term_debt#
Noticed any irregularities with this table? File a report.
Annual time series of long term debt and debt service requirements.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)
- Primary key:
This table has no native primary key. It is a list of all debts in each year, and borrowers can have multiple records with the same debt_description.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
debt_description |
string |
Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions. |
debt_ending_balance |
number |
The amount of principal still owed on the debt at the end of the report year. |
debt_interest |
number |
The interest expense on the debt for the report year. |
debt_principal |
number |
The principal paid on the debt during the report year. |
debt_total |
number |
The total amount of debt. |
report_date |
date |
Date reported. |
core_rus7__yearly_long_term_leases#
Noticed any irregularities with this table? File a report.
Annual time series of long term leases by property type.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)
- Primary key:
This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
lending_organization |
string |
The organization that provided a lease or loan. |
property_type |
string |
The type of property leased. |
rental_cost_ytd |
number |
Year-to-date rental cost for leased property. |
report_date |
date |
Date reported. |
core_rus7__yearly_materials_and_supplies#
Noticed any irregularities with this table? File a report.
Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)
- Primary key:
report_date, borrower_id_rus, electric_or_other_materials
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
electric_or_other_materials |
string |
Whether the cost is for electric materials or other materials. |
materials_adjustment |
number |
An adjustment value for the cost of materials and supplies. |
materials_ending_balance |
number |
The balance at the end of the report year for materials and supplies. |
materials_purchased |
number |
The cost of materials and supplies purchased. |
materials_salvaged |
number |
The cost of materials and supplies salvaged. |
materials_sold |
number |
The cost of materials and supplies sold. |
materials_used |
number |
The cost of materials and supplies used. |
report_date |
date |
Date reported. |
core_rus7__yearly_meeting_and_board#
Noticed any irregularities with this table? File a report.
Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
board_members_num |
integer |
The total number of board members. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
does_manager_have_written_contract |
boolean |
Whether or not the RUS borrower’s manager has a written contract. |
fees_and_expenses_for_board_members |
integer |
The total amount of fees and expenses for board members. |
last_annual_meeting_date |
datetime |
The date of the last annual meeting. |
members_num |
integer |
The total number of members. |
members_present_at_meeting_num |
integer |
The number of members present at the last annual meeting. |
members_voting_by_proxy_or_mail_num |
integer |
The number of members voting by mail or by proxy. |
report_date |
date |
Date reported. |
was_quorum_present |
boolean |
Whether or not quorum was met. |
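The member-count columns can be combined into a simple participation rate for the annual meeting. The sketch below adds in-person and proxy/mail voters and divides by total membership. It assumes no member is counted in both groups, which this table does not confirm, and uses hypothetical values.

```python
import pandas as pd

# Hypothetical values; column names follow core_rus7__yearly_meeting_and_board.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["IA0001", "IA0002"],
        "members_num": [10_000, 4_000],
        "members_present_at_meeting_num": [500, 400],
        "members_voting_by_proxy_or_mail_num": [1_500, 0],
        "was_quorum_present": [True, True],
    }
)

# Assumption: present and proxy/mail voters are disjoint groups.
participants = (
    df["members_present_at_meeting_num"] + df["members_voting_by_proxy_or_mail_num"]
)
df["participation_share"] = participants / df["members_num"]
```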
core_rus7__yearly_owed_by_customers#
Noticed any irregularities with this table? File a report.
Annual time series of debt owed by customers.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)
- Primary key:
report_date, borrower_id_rus
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
amount_due_over_60_days |
number |
Reported dollar amount due from consumers for electric service for more than 60 days. Includes both connected and disconnected customers. |
amount_written_off_ytd |
number |
Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts. |
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
report_date |
date |
Date reported. |
core_rus7__yearly_patronage_capital#
Noticed any irregularities with this table? File a report.
Annual time series of patronage capital distributed and received.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)
- Primary key:
report_date, borrower_id_rus, patronage_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
patronage_cumulative |
number |
Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null. |
patronage_report_year |
number |
Amount of patronage distributed or received within report year. |
patronage_type |
string |
Type of patronage capital distributed or received. |
report_date |
date |
Date reported. |
core_rus7__yearly_power_requirements#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - revenue and generation summary.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus
Additional Details#
This table includes total electric sales and revenue, which also appear in core_rus7__yearly_power_requirements_electric_sales as records with a customer_class of total. Because this table holds all the other power requirements that are not broken out by customer class, we include these electric totals here as well.
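Since the electric sales totals appear in both tables, you can cross-check them with a merge. The sketch below assumes the total rows use the literal customer_class value "total" and uses hypothetical rows for one borrower-year; the non-total class labels are also illustrative. When summing the electric sales table yourself, drop these total rows first.

```python
import pandas as pd

# Hypothetical rows from core_rus7__yearly_power_requirements.
pr = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-12-31"]),
        "borrower_id_rus": ["OK0001"],
        "electric_sales_mwh": [250_000.0],
        "electric_sales_revenue": [27_500_000],
    }
)

# Hypothetical rows from core_rus7__yearly_power_requirements_electric_sales.
sales = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-12-31"] * 3),
        "borrower_id_rus": ["OK0001"] * 3,
        "customer_class": ["residential", "commercial", "total"],
        "sales_mwh": [150_000.0, 100_000.0, 250_000.0],
        "revenue": [18_000_000.0, 9_500_000.0, 27_500_000.0],
    }
)

# Assumption: total rows are labeled with customer_class == "total".
totals = sales[sales["customer_class"] == "total"]
check = pr.merge(totals, on=["report_date", "borrower_id_rus"], how="left")
check["mwh_diff"] = check["electric_sales_mwh"] - check["sales_mwh"]
check["revenue_diff"] = check["electric_sales_revenue"] - check["revenue"]
```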
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
electric_sales_mwh |
number |
Total MWh of electricity sold. Sum of lines 1b through 9b on the original form. |
electric_sales_revenue |
integer |
Total revenue received from sales of electric energy. Sum of lines 1c through 9c on the original form. |
generated_mwh |
number |
The total electricity generated. |
interchange_mwh |
number |
The net interchange of electricity. The net amount of electricity exchanged in purchases and sales. |
is_peak_coincident |
boolean |
Whether peak_mw is a coincident (True) or non-coincident (False) peak. |
other_electric_revenue |
integer |
Electric revenue other than electric_sales_revenue. |
own_use_mwh |
number |
The electricity in MWh used for the borrower’s own internal use. |
peak_mw |
number |
The peak system demand in MW (the sum of all MW). |
purchased_mwh |
number |
The total electricity purchased. |
purchases_and_generation_cost |
integer |
The cost of purchases and generation of electricity. |
report_date |
date |
Date reported. |
transmission_revenue |
integer |
Transmission revenue. |
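A common use of this table is to compare average revenue per MWh sold with average supply cost per MWh. The sketch below divides purchases_and_generation_cost by purchased plus generated MWh; it ignores interchange, own use, and losses, so treat it as a rough approximation. Values are hypothetical.

```python
import pandas as pd

# Hypothetical values for one borrower-year in core_rus7__yearly_power_requirements.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["CO0001"],
        "electric_sales_mwh": [400_000.0],
        "electric_sales_revenue": [44_000_000],
        "purchased_mwh": [380_000.0],
        "generated_mwh": [20_000.0],
        "purchases_and_generation_cost": [28_000_000],
    }
)

# Average revenue per MWh sold.
df["revenue_per_mwh"] = df["electric_sales_revenue"] / df["electric_sales_mwh"]

# Rough supply cost per MWh: cost spread over purchased plus generated energy only.
supply_mwh = df["purchased_mwh"] + df["generated_mwh"]
df["supply_cost_per_mwh"] = df["purchases_and_generation_cost"] / supply_mwh
```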
core_rus7__yearly_power_requirements_electric_customers#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - number of customers served by customer type.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus, customer_class, observation_period
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
customer_class |
string |
High level categorization of customer type. |
customers_num |
number |
Number of customers. |
observation_period |
string |
The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period. |
report_date |
date |
Date reported. |
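Because observation_period is part of the primary key, each borrower and customer class can have several customer counts per year (for example 'december' and 'avg'). Summing across periods would double count, so the sketch below picks one period first. The rows and class labels are hypothetical; if your data also contains a total customer class, drop it before summing as well.

```python
import pandas as pd

# Hypothetical rows: a December count and a period average per customer class.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["VA0001"] * 4,
        "customer_class": ["residential", "residential", "commercial", "commercial"],
        "observation_period": ["december", "avg", "december", "avg"],
        "customers_num": [10_250.0, 10_100.0, 820.0, 800.0],
    }
)

# Select a single observation period before aggregating across classes.
avg_customers = (
    df[df["observation_period"] == "avg"]
    .groupby("borrower_id_rus")["customers_num"]
    .sum()
)
```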
core_rus7__yearly_power_requirements_electric_sales#
Noticed any irregularities with this table? File a report.
Annual time series of power requirements - revenue and energy sold by customer type.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)
- Primary key:
report_date, borrower_id_rus, customer_class
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
customer_class |
string |
High level categorization of customer type. |
report_date |
date |
Date reported. |
revenue |
number |
Amount of revenue. |
sales_mwh |
number |
Quantity of electricity sold in MWh. |
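To break sales down by customer class, exclude the total rows first so that class shares sum to one. The sketch below assumes total rows carry customer_class "total" (as described for core_rus7__yearly_power_requirements) and computes each class's share of MWh sold and its average revenue per MWh. Rows and class labels are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like core_rus7__yearly_power_requirements_electric_sales.
sales = pd.DataFrame(
    {
        "borrower_id_rus": ["WI0001"] * 3,
        "customer_class": ["residential", "commercial", "total"],
        "sales_mwh": [150_000.0, 50_000.0, 200_000.0],
        "revenue": [18_000_000.0, 5_000_000.0, 23_000_000.0],
    }
)

# Assumption: total rows use customer_class == "total"; drop them.
by_class = sales[sales["customer_class"] != "total"].copy()

# Share of each borrower's MWh sold, by class.
borrower_mwh = by_class.groupby("borrower_id_rus")["sales_mwh"].transform("sum")
by_class["sales_share"] = by_class["sales_mwh"] / borrower_mwh

# Average revenue per MWh for each class.
by_class["revenue_per_mwh"] = by_class["revenue"] / by_class["sales_mwh"]
```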
core_rus7__yearly_service_interruptions#
Noticed any irregularities with this table? File a report.
Annual time series of service interruptions by cause.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)
- Primary key:
report_date, borrower_id_rus, service_interruption_cause, observation_period
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
observation_period |
string |
The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period. |
report_date |
date |
Date reported. |
saidi_minutes |
number |
Cumulative duration (minutes) of interruption for the average customer during the report year. |
service_interruption_cause |
string |
Source of service interruption. |
core_rus7__yearly_statement_of_operations#
Noticed any irregularities with this table? File a report.
Annual time series of opex and cost of electric service for RUS borrowers by time period.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)
- Primary key:
report_date, borrower_id_rus, opex_group, opex_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs consist of a two-letter state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
opex_group |
string |
High-level section from the statement of operations table. Most of these types have subcomponents broken out in the |
opex_report_month |
number |
Amount of operational expense, cost or income during the report month. |
opex_type |
string |
Type of item from the statement of operations. |
opex_ytd |
number |
The year-to-date amount of operational expense, cost or income. |
opex_ytd_budget |
number |
The year-to-date budget for amount of operational expense, cost or income. |
report_date |
date |
Date reported. |
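Because opex_ytd and opex_ytd_budget sit side by side, a year-to-date budget variance takes only one line of arithmetic. The sketch below is a hedged illustration on invented rows. The opex_group and opex_type labels are placeholders rather than the values PUDL actually uses, and total rows are excluded in line with this table's subtotal warning.

```python
import pandas as pd

# Hypothetical rows shaped like core_rus7__yearly_statement_of_operations.
df = pd.DataFrame({
    "borrower_id_rus": ["AK0001"] * 3,
    "opex_group": ["operating_expenses"] * 3,
    "opex_type": ["distribution_operation", "distribution_maintenance", "total"],
    "is_total": [False, False, True],
    "opex_ytd": [120.0, 80.0, 200.0],
    "opex_ytd_budget": [100.0, 90.0, 190.0],
})

# Work on line items only, then compare actuals with budget.
items = df[~df["is_total"]].copy()
# Positive values mean year-to-date spending ran over budget.
items["ytd_budget_variance"] = items["opex_ytd"] - items["opex_ytd_budget"]
```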
core_rus7__yearly_transmission_and_distribution_mileage#
Noticed any irregularities with this table? File a report.
Annual time series of miles of transmission and distribution infrastructure.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)
- Primary key:
report_date, borrower_id_rus, line_type
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’
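One way to act on the subtotal warning is to recompute each total from its component rows and compare the result with the reported total. This is a sketch on hypothetical rows. The line_type labels, including the use of a single "total" row per borrower and year, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like core_rus7__yearly_transmission_and_distribution_mileage.
df = pd.DataFrame({
    "report_date": pd.to_datetime(["2021-12-31"] * 3),
    "borrower_id_rus": ["AK0001"] * 3,
    "line_type": ["transmission", "distribution_overhead", "total"],
    "is_total": [False, False, True],
    "miles": [12.5, 340.0, 352.5],
})

keys = ["report_date", "borrower_id_rus"]
# Sum the component rows and pull the reported total, both indexed by keys.
summed = df[~df["is_total"]].groupby(keys)["miles"].sum().rename("summed_miles")
reported = df[df["is_total"]].set_index(keys)["miles"].rename("reported_total")
check = pd.concat([summed, reported], axis=1)
# Allow for tiny floating-point differences.
check["matches"] = (check["summed_miles"] - check["reported_total"]).abs() < 1e-6
```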
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs consist of a two-letter state abbreviation followed by four digits. |
is_total |
boolean |
Whether or not this record represents a total. |
line_type |
string |
The type of line mileage reported (e.g., transmission, overhead distribution). |
miles |
number |
Line length at the end of the reported period, in miles. |
report_date |
date |
Date reported. |
core_rus7__yearly_utility_plant_changes#
Noticed any irregularities with this table? File a report.
Annual time series of changes in utility plant.
- Most-recent data:
2021
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)
- Primary key:
report_date, borrower_id_rus, utility_plant_group, utility_plant_item
Usage Warnings#
Some columns contain subtotals; use caution when choosing columns to aggregate.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
additions |
number |
Cost of acquisition of items classified within the account. |
adjustments_and_transfers |
number |
Amount of adjustments and transfers within a class of assets. |
borrower_id_rus |
string |
Unique identifier of an RUS (Rural Utilities Service) borrower. These IDs consist of a two-letter state abbreviation followed by four digits. |
ending_balance |
number |
Account balance at end of year. |
is_total |
boolean |
Whether or not this record represents a total. |
report_date |
date |
Date reported. |
retirements |
number |
Cost of disposal of items classified within the account. |
utility_plant_group |
string |
High-level category of utility plant asset type. |
utility_plant_item |
string |
Sub-category of utility_plant_group describing utility plant asset item. |
core_rus__codes_fuel_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for fuel types in RUS.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
RUS – Mix of multiple RUS forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
core_rus__codes_investment_types#
Noticed any irregularities with this table? File a report.
Code table containing descriptions of categorical codes for investment types.
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
RUS – Mix of multiple RUS forms
- Primary key:
code
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
code |
string |
Originally reported short code. |
description |
string |
Long human-readable description of the meaning of a code/label. |
core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities#
Noticed any irregularities with this table? File a report.
Association table providing connections between subsidiaries listed in Exhibit 21 and EIA utilities.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
subsidiary_company_id_sec10k
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
An Exhibit 21 subsidiary is considered matched to an EIA utility if their names are identical. Only subsidiaries that do not file an SEC Form 10-K themselves are included in this table. SEC 10-K filers have much more information available and are matched to EIA utilities using probabilistic record linkage in core_sec10k__assn_sec10k_filers_and_eia_utilities.
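If "identical" is read literally, the matching rule amounts to an exact inner join on the name columns. The sketch below uses invented subsidiaries and utilities, and assumes utility_name_eia as the EIA-side name column. PUDL's own pipeline may clean or normalize names before comparing them. This sketch does no normalization, so case differences prevent a match.

```python
import pandas as pd

# Hypothetical toy tables; names and IDs are invented for illustration.
subsidiaries = pd.DataFrame({
    "subsidiary_company_id_sec10k": ["s1", "s2", "s3"],
    "subsidiary_company_name": ["Acme Power Co", "Acme Gas LLC", "ACME POWER CO"],
})
eia_utilities = pd.DataFrame({
    "utility_id_eia": [101, 202],
    "utility_name_eia": ["Acme Power Co", "Beta Electric"],
})

# An inner join on the raw strings keeps only exact, case-sensitive matches.
matches = subsidiaries.merge(
    eia_utilities,
    left_on="subsidiary_company_name",
    right_on="utility_name_eia",
    how="inner",
)[["subsidiary_company_id_sec10k", "utility_id_eia"]]
```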
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
subsidiary_company_id_sec10k |
string |
PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_sec10k__assn_exhibit_21_subsidiaries_and_filers#
Noticed any irregularities with this table? File a report.
Association table providing connections between subsidiaries listed in Exhibit 21 and their SEC central index key, if the subsidiary also files Form 10-K.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
subsidiary_company_id_sec10k
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
Exhibit 21 subsidiaries and SEC 10-K filers are considered matched if they have identical names and the same location of incorporation.
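The following hedged pandas sketch applies that rule as a left join on name plus location. It passes indicator=True to show which subsidiaries found no filer. subsidiary_company_location holds full state or country names, while incorporation_state is a two-letter code, so the sketch maps one onto the other with a tiny hypothetical lookup table. All names and keys are invented.

```python
import pandas as pd

# Hypothetical lookup from full state names to two-letter codes (tiny subset).
STATE_CODES = {"texas": "TX", "ohio": "OH"}

subsidiaries = pd.DataFrame({
    "subsidiary_company_id_sec10k": ["s1", "s2"],
    "subsidiary_company_name": ["Acme Power Co", "Acme Power Co"],
    "subsidiary_company_location": ["Texas", "Ohio"],
})
filers = pd.DataFrame({
    "central_index_key": ["0000000001"],
    "company_name": ["Acme Power Co"],
    "incorporation_state": ["TX"],
})

# Put both sides on the same location vocabulary before joining.
subsidiaries["location_code"] = (
    subsidiaries["subsidiary_company_location"].str.lower().map(STATE_CODES)
)
linked = subsidiaries.merge(
    filers,
    left_on=["subsidiary_company_name", "location_code"],
    right_on=["company_name", "incorporation_state"],
    how="left",
    indicator=True,  # "_merge" is "left_only" where no filer matched
)
```

Same name with a different location (s2) stays unmatched, which is the point of requiring both keys.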
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
subsidiary_company_id_sec10k |
string |
PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses. |
core_sec10k__assn_sec10k_filers_and_eia_utilities#
Noticed any irregularities with this table? File a report.
Association table providing connections between SEC 10-K filers and EIA utilities.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
central_index_key, utility_id_eia
Usage Warnings#
This data has been matched probabilistically using a machine learning model; it is incomplete in places and contains some errors. It should not be treated as ground truth data.
PUDL does not currently update its copy of this data.
Additional Details#
SEC central index keys are matched to EIA
utility IDs using probabilistic record linkage based on associated company information
like company name, business and mailing addresses, and state of incorporation.
The match between central_index_key and utility_id_eia is one-to-one and is not
allowed to change over time. In cases where there were multiple candidate matches,
the match with the highest probability is selected.
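The one-to-one, highest-probability rule can be sketched as a greedy pass over candidate pairs sorted by probability. The match_probability column and the candidate rows are hypothetical, because the published table contains only the final pairs. PUDL's actual selection and tie-breaking may differ from this greedy approach.

```python
import pandas as pd

# Hypothetical candidate pairs with made-up match probabilities.
candidates = pd.DataFrame({
    "central_index_key": ["A", "A", "B", "B"],
    "utility_id_eia": [1, 2, 1, 3],
    "match_probability": [0.95, 0.60, 0.90, 0.70],
})


def select_one_to_one(cands: pd.DataFrame) -> pd.DataFrame:
    """Greedily keep the most probable pairs, using each ID at most once."""
    used_ciks, used_utils, kept = set(), set(), []
    ordered = cands.sort_values("match_probability", ascending=False)
    for row in ordered.itertuples(index=False):
        if row.central_index_key in used_ciks or row.utility_id_eia in used_utils:
            continue  # one side of this pair is already matched
        used_ciks.add(row.central_index_key)
        used_utils.add(row.utility_id_eia)
        kept.append(row)
    return pd.DataFrame(kept, columns=cands.columns)


matches = select_one_to_one(candidates)
```

Here B cannot take utility 1, because A claimed it with a higher probability, so B falls back to utility 3.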
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
utility_id_eia |
integer |
The EIA Utility Identification number. |
core_sec10k__changelog_company_name#
Noticed any irregularities with this table? File a report.
Changelog table tracking changes in the names each SEC 10-K filer has used.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
This table has no primary key.
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
This table is extracted from the same SEC 10-K filing header information as core_sec10k__quarterly_company_information. Each filing reports the full history of name changes associated with a company up to the date of that filing. Because individual companies may appear in multiple filings in the same year, and the same historical name changes will be reported in multiple years, the raw input data contains many duplicate entries, which are deduplicated to create this table. The original name change data only contains the former name and the date of the change.
Roughly 2% of all records describe multiple name changes happening on the same date
(they are duplicates on the basis of central_index_key and name_change_date).
This may be due to company name reporting inconsistencies or reporting errors in which
the old and new company names have been swapped.
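The deduplication and the same-date check described above can be sketched with drop_duplicates and duplicated. The rows are invented and only the column names come from this table. The real deduplication logic in PUDL may differ.

```python
import pandas as pd

# Hypothetical raw name-change records repeated across several filings.
raw = pd.DataFrame({
    "central_index_key": ["0001", "0001", "0001", "0002", "0002"],
    "company_name_old": ["Old A", "Old A", "Older A", "B One", "B Two"],
    "name_change_date": pd.to_datetime(
        ["2005-01-01", "2005-01-01", "1999-06-30", "2010-03-15", "2010-03-15"]
    ),
})

# The same historical change is reported by many filings; keep one copy.
changelog = raw.drop_duplicates().reset_index(drop=True)

# Flag suspicious cases: different name changes on the same date for one CIK.
same_day = changelog.duplicated(
    subset=["central_index_key", "name_change_date"], keep=False
)
suspect = changelog[same_day]
```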
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
company_name |
string |
Name of the reporting company. |
company_name_old |
string |
Name of company prior to name change. |
name_change_date |
date |
Date of last name change of the company. |
core_sec10k__quarterly_company_information#
Noticed any irregularities with this table? File a report.
Quarterly time series of company and filing information extracted from SEC 10-K filing headers.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k, central_index_key
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
While the SEC 10-K filings are submitted by a single company, they often contain
references to many other related companies. Information about these companies and the
filing itself are contained in text headers at the beginning of the filing. This table
contains data extracted from those headers. Each record in this table represents a
single observation of a company in a single filing. The filer_count indicates which
referenced company within a filing header the record corresponds to.
Because the same company may be referenced in a number of different filings submitted in
the same reporting period or even on the same day, this table contains apparently
duplicative records about many companies, which may be distinguished only by the filename
associated with the filing they appeared in and their filer count. Note that
references to a particular company may not be perfectly consistent across all filings in
which they appear. The various company names, addresses, and other information
associated with the company’s unique and permanent central_index_key are later used
as inputs into the probabilistic record linkage process.
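This small sketch gathers the name variants observed for each central_index_key. It also checks that filename_sec10k plus central_index_key is unique, as the primary key requires. The rows, including the filer_count values, are hypothetical toy data.

```python
import pandas as pd

# Hypothetical header observations: one company referenced in several filings.
headers = pd.DataFrame({
    "filename_sec10k": ["f1", "f1", "f2", "f3"],
    "filer_count": [1, 2, 1, 1],
    "central_index_key": ["0001", "0002", "0001", "0001"],
    "company_name": ["Acme Power Co", "Acme Holdings", "ACME POWER CO", "Acme Power Co"],
})

# The primary key (filename_sec10k, central_index_key) should never repeat.
assert not headers.duplicated(["filename_sec10k", "central_index_key"]).any()

# Distinct names reported for each company across all filings it appears in.
name_variants = headers.groupby("central_index_key")["company_name"].unique()
```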
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
business_city |
string |
City where the company’s place of business is located. |
business_postal_code |
string |
Non-US postal code of the company’s place of business. |
business_state |
string |
State where the company’s place of business is located. |
business_street_address |
string |
Street address of the company’s place of business. |
business_street_address_2 |
string |
Second line of the street address of the company’s place of business. |
business_zip_code |
string |
Zip code of the company’s place of business. |
business_zip_code_4 |
string |
Zip code suffix of the company’s place of business. |
central_index_key |
string |
Identifier of the company in SEC database. |
company_name |
string |
Name of the reporting company. |
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
filer_count |
integer |
A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to. |
filing_number_sec |
string |
Filing number used internally by the SEC to track the filing. |
film_number |
string |
Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room. |
fiscal_year_end |
string |
The end date of an SEC filing company’s fiscal year, in MMDD format. |
incorporation_state |
string |
Two letter state code where company is incorporated. |
industry_id_sic |
string |
Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions. |
industry_name_sic |
string |
Text description of the Standard Industrial Classification (SIC) code. |
mail_city |
string |
City of the company’s mailing address. |
mail_postal_code |
string |
Non-US postal code of the company’s mailing address. |
mail_state |
string |
State of the company’s mailing address. |
mail_street_address |
string |
Street portion of the company’s mailing address. |
mail_street_address_2 |
string |
Second line of the street portion of the company’s mailing address. |
mail_zip_code |
string |
Zip code of the company’s mailing address. |
mail_zip_code_4 |
string |
Zip code suffix of the company’s mailing address. |
phone_number |
string |
Phone number for utility contact 1. |
sec10k_type |
string |
Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb. |
sec_act |
string |
Securities act under which the form was filed, e.g. the 1934 Act. |
taxpayer_id_irs |
string |
Taxpayer ID of the company with the IRS. |
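Two small helpers follow directly from the column descriptions above. The first rebuilds the EDGAR URL from filename_sec10k, and the second splits fiscal_year_end from its MMDD form. The helper names and the example filename are hypothetical. Only the URL prefix, the .txt extension, and the MMDD format come from the descriptions.

```python
EDGAR_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def filing_url(filename_sec10k: str) -> str:
    """Prepend the EDGAR prefix and append .txt, per the column description."""
    return f"{EDGAR_PREFIX}{filename_sec10k}.txt"


def parse_fiscal_year_end(mmdd: str) -> tuple[int, int]:
    """Split an MMDD string such as '1231' into (month, day)."""
    if len(mmdd) != 4 or not mmdd.isdigit():
        raise ValueError(f"expected MMDD, got {mmdd!r}")
    return int(mmdd[:2]), int(mmdd[2:])


# Example with a made-up filename.
url = filing_url("123/0000123-00-000001")
```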
core_sec10k__quarterly_exhibit_21_company_ownership#
Noticed any irregularities with this table? File a report.
Quarterly time series of subsidiary company ownership data from the SEC 10-K Exhibit 21 attachments.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k, subsidiary_company_id_sec10k
Usage Warnings#
This data has been extracted probabilistically using a machine learning model; it is incomplete in places and contains some errors. It should not be treated as ground truth data.
PUDL does not currently update its copy of this data.
Additional Details#
Exhibit 21 is an unstructured text or PDF attachment to the main SEC 10-K filing that is used to describe the subsidiaries owned by the filing company. It may or may not provide the percentage of the subsidiary that is owned by the filing company, or the location of the subsidiary.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
fraction_owned |
number |
Proportion of the subsidiary owned by the filing company, if reported in Exhibit 21. |
subsidiary_company_id_sec10k |
string |
PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses. |
subsidiary_company_location |
string |
Location of subsidiary company. This is the full US state name or country name and occasionally a two digit code that was not mapped to a full name during cleaning. |
subsidiary_company_name |
string |
Name of subsidiary company. |
core_sec10k__quarterly_filings#
Noticed any irregularities with this table? File a report.
Quarterly time series of metadata describing SEC 10-K filings.
- Most-recent data:
2023
- Processing:
Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.
- Source:
U.S. Securities and Exchange Commission (SEC) Form 10-K
- Primary key:
filename_sec10k
Usage Warnings#
PUDL does not currently update its copy of this data.
Additional Details#
Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename. Because roughly 1% of filings are the same company submitting the same form more than once on the same day, the filename is the only available natural primary key.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
central_index_key |
string |
Identifier of the company in SEC database. |
company_name |
string |
Name of the reporting company. |
exhibit_21_version |
string |
Version of exhibit 21 submitted (if applicable). |
filename_sec10k |
string |
Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension. |
filing_date |
date |
Date on which the filing was submitted. |
report_date |
date |
Date reported. |
sec10k_type |
string |
Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb. |
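Every amended variant ends in "/a". One hedged way to group filings by base form and flag amendments is to split on that suffix. This assumes sec10k_type values are lowercase strings like those listed above. The helper name is hypothetical.

```python
def split_sec10k_type(sec10k_type: str) -> tuple[str, bool]:
    """Return (base form, is_amendment) for values such as '10-k405/a'."""
    base, sep, suffix = sec10k_type.lower().partition("/")
    return base, sep == "/" and suffix == "a"


# Usage on a few of the documented form types.
parsed = [split_sec10k_type(t) for t in ["10-k", "10-k/a", "10-kt", "10-ksb/a"]]
```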
_core_eia860__cooling_equipment#
Noticed any irregularities with this table? File a report.
Information about cooling equipment at generation facilities.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 6D)
- Primary key:
plant_id_eia, utility_id_eia, cooling_id_eia, report_date
Usage Warnings#
This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.
Additional Details#
A single plant can have several collections of cooling equipment, so each line in this table represents one cooling system at a given plant. You can determine how to link each set of cooling equipment to specific boilers in a plant via core_eia860__assn_boiler_cooling.
EIA-860 respondents are required to submit information about cooling equipment for plants over 100 MW, including steam, combined cycle, nuclear, or solar thermal generators.
This table has been cleaned, but not tidied/normalized with other EIA 923 or 860 data. The same variables present in this table may show up in other tables and/or in other years. Once this table has been harvested, it will be converted into a core table.
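This is a sketch of linking cooling systems to boilers through core_eia860__assn_boiler_cooling. The association table's column names used here (report_date, plant_id_eia, boiler_id, cooling_id_eia) are assumptions that this page does not confirm. The rows and cooling type values are invented. This table's primary key also includes utility_id_eia, so a cooling system reported under more than one utility could appear more than once after the join.

```python
import pandas as pd

# Hypothetical cooling rows; values are placeholders.
cooling = pd.DataFrame({
    "report_date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "plant_id_eia": [1001, 1001],
    "utility_id_eia": [55, 55],
    "cooling_id_eia": ["C1", "C2"],
    "cooling_type_1": ["type_a", "type_b"],
})
# Hypothetical association rows; column names are assumed, not confirmed.
assn_boiler_cooling = pd.DataFrame({
    "report_date": pd.to_datetime(["2024-01-01"] * 3),
    "plant_id_eia": [1001] * 3,
    "boiler_id": ["B1", "B2", "B3"],
    "cooling_id_eia": ["C1", "C1", "C2"],
})

keys = ["report_date", "plant_id_eia", "cooling_id_eia"]
# One row per boiler, carrying the attributes of the cooling system it uses.
boilers_with_cooling = assn_boiler_cooling.merge(cooling, on=keys, how="left")
```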
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
chlorine_equipment_cost |
number |
Actual installed cost for the existing chlorine discharge control system or the anticipated cost to bring the chlorine discharge control system into commercial operation |
chlorine_equipment_operating_date |
date |
Actual or projected in-service date for chlorine discharge control structures and equipment |
cooling_equipment_total_cost |
number |
Actual installed cost for the existing system or the anticipated cost to bring the total system into commercial operation |
cooling_id_eia |
string |
EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant) |
cooling_status_code |
string |
Operating status of cooling system |
cooling_system_operating_date |
date |
The actual or projected in-service date of this cooling system. |
cooling_type_1 |
string |
Type of cooling system |
cooling_type_2 |
string |
Type of cooling system |
cooling_type_3 |
string |
Type of cooling system |
cooling_type_4 |
string |
Type of cooling system |
cooling_water_discharge |
string |
Name of river, lake, or water source that cooling water is discharged into |
cooling_water_source |
string |
Name of river, lake, or water source that provides cooling water |
county |
string |
County name. |
intake_distance_shore_feet |
number |
Maximum distance from shore to intake |
intake_distance_surface_feet |
number |
Average distance below water surface to intake |
intake_rate_100pct_gallons_per_minute |
number |
Design cooling water flow rate at 100 percent load at the intake. |
outlet_distance_shore_feet |
number |
Maximum distance from shore to outlet |
outlet_distance_surface_feet |
number |
Average distance below water surface to outlet |
percent_dry_cooling |
number |
Percent of cooling load served by dry cooling components |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
plant_name_eia |
string |
Plant name. |
plant_summer_capacity_mw |
number |
The plant summer capacity associated with the operating generators at the plant |
pond_cost |
number |
Actual installed cost for the existing cooling ponds or the anticipated cost to bring the cooling ponds into commercial operation |
pond_operating_date |
date |
Cooling ponds actual or projected in-service date |
pond_surface_area_acres |
number |
Total surface area of cooling pond |
pond_volume_acre_feet |
number |
Total volume of water in cooling pond |
power_requirement_mw |
number |
Maximum power requirement for cooling towers at 100 percent load |
report_date |
date |
Date reported. |
state |
string |
Two letter US state abbreviation. |
steam_plant_type_code |
integer |
Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details. |
tower_cost |
number |
Actual installed cost for the existing cooling towers or the anticipated cost to bring the cooling towers into commercial operation |
tower_operating_date |
date |
Cooling towers actual or projected in-service date |
tower_type_1 |
string |
Types of cooling towers at this plant |
tower_type_2 |
string |
Types of cooling towers at this plant |
tower_type_3 |
string |
Types of cooling towers at this plant |
tower_type_4 |
string |
Types of cooling towers at this plant |
tower_water_rate_100pct_gallons_per_minute |
number |
Maximum design rate of water flow at 100 percent load for the cooling towers |
utility_id_eia |
integer |
The EIA Utility Identification number. |
utility_name_eia |
string |
The name of the utility. |
water_source |
string |
Name of water source associated with the plant. |
water_source_code |
string |
Type of cooling water source |
water_type_code |
string |
Type of cooling water |
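The numbered cooling_type_1 through cooling_type_4 columns (and likewise tower_type_1 through tower_type_4) hold a variable-length list in wide form. If you need one row per reported type, a minimal pandas sketch is to melt them into long form. The rows and type values below are made-up placeholders, not real EIA codes, and the choice of identifying columns is an assumption. In practice you would read this table from its Parquet file first.

```python
import pandas as pd

# Made-up example rows with placeholder type values, not real EIA codes.
wide = pd.DataFrame(
    {
        "plant_id_eia": [3, 7],
        "cooling_id_eia": ["PLANT", "1"],
        "report_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
        "cooling_type_1": ["type_a", "type_c"],
        "cooling_type_2": ["type_b", None],
        "cooling_type_3": [None, None],
        "cooling_type_4": [None, None],
    }
)

id_cols = ["plant_id_eia", "cooling_id_eia", "report_date"]
type_cols = [f"cooling_type_{i}" for i in range(1, 5)]

# Reshape to one row per (cooling system, report date, reported type),
# dropping the empty slots that the wide layout pads with nulls.
long = (
    wide.melt(
        id_vars=id_cols,
        value_vars=type_cols,
        var_name="type_slot",
        value_name="cooling_type",
    )
    .dropna(subset=["cooling_type"])
    .sort_values(id_cols + ["type_slot"])
    .reset_index(drop=True)
)
print(long[["plant_id_eia", "type_slot", "cooling_type"]])
```

The type_slot column keeps the original position (1 to 4) in case the ordering carries meaning you want to preserve.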
_core_eia860__fgd_equipment
Information about flue gas desulfurization equipment at generation facilities.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 860 – Annual Electric Generator Report (Schedule 6E)
- Primary key:
plant_id_eia, so2_control_id_eia, report_date
Usage Warnings
This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| byproduct_recovery | boolean | Whether saleable by-product is recovered by the unit. |
| county | string | County name. |
| county_id_fips | string | County ID from the Federal Information Processing Standard Publication 6-4. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| fgd_manufacturer | string | Name of flue gas desulfurization equipment manufacturer. |
| fgd_manufacturer_code | string | Code corresponding to name of flue gas desulfurization equipment manufacturer. |
| fgd_operating_date | date | The actual or projected in-service date of this flue gas desulfurization system. |
| fgd_operational_status_code | string | Operating status code for flue gas desulfurization equipment. |
| fgd_other_cost | number | Other actual installed costs of a flue gas desulfurization unit, or the anticipated other costs of installing a flue gas desulfurization unit. |
| fgd_structure_cost | number | Actual installed costs for the existing systems or the anticipated costs of structures and equipment to bring a planned flue gas desulfurization system into commercial operation. |
| fgd_trains_100pct | number | Total number of flue gas desulfurization unit scrubber trains operated at 100 percent load. |
| fgd_trains_total | number | Total number of flue gas desulfurization unit scrubber trains. |
| flue_gas_bypass_fgd | boolean | Indicates whether flue gas can bypass the flue gas desulfurization unit. |
| flue_gas_entering_fgd_pct_of_total | number | Share of all flue gas that enters the flue gas desulfurization unit. |
| flue_gas_exit_rate_cubic_feet_per_minute | number | Actual flue gas exit rate, in cubic feet per minute. |
| flue_gas_exit_temperature_fahrenheit | number | Flue gas exit temperature, in degrees Fahrenheit. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| plant_summer_capacity_mw | number | The plant summer capacity associated with the operating generators at the plant. |
| pond_landfill_requirements_acre_foot_per_year | number | Annual pond and landfill requirements for flue gas desulfurization equipment, in acre-feet per year. |
| report_date | date | Date reported. |
| sludge_disposal_cost | number | Actual installed costs for the existing sludge transport and disposal systems or the anticipated costs of sludge transport and disposal systems to bring a planned system into commercial operation. |
| sludge_pond | boolean | Indicates whether there is a sludge pond associated with this unit. |
| sludge_pond_lined | boolean | Indicates whether the sludge pond is lined. |
| so2_control_id_eia | string | Sulfur dioxide control identification number. This ID is not a unique identifier. |
| so2_emission_rate_lbs_per_hour | number | Sulfur dioxide emission rate when operating at 100 percent load, in pounds per hour. |
| so2_equipment_type_1 | string | Type of sulfur dioxide control equipment. |
| so2_equipment_type_2 | string | Type of sulfur dioxide control equipment. |
| so2_equipment_type_3 | string | Type of sulfur dioxide control equipment. |
| so2_equipment_type_4 | string | Type of sulfur dioxide control equipment. |
| so2_removal_efficiency_design | number | Design removal efficiency for sulfur dioxide when operating at 100 percent load. Reported to the nearest 0.1 percent by weight of gases removed from the flue gas. |
| sorbent_type_1 | string | Type of sorbent used by this sulfur dioxide control equipment. |
| sorbent_type_2 | string | Type of sorbent used by this sulfur dioxide control equipment. |
| sorbent_type_3 | string | Type of sorbent used by this sulfur dioxide control equipment. |
| sorbent_type_4 | string | Type of sorbent used by this sulfur dioxide control equipment. |
| specifications_of_coal_ash | number | Design fuel specifications for ash when burning coal or petroleum coke (nearest 0.1 percent by weight). |
| specifications_of_coal_sulfur | number | Design fuel specifications for sulfur when burning coal or petroleum coke (nearest 0.1 percent by weight). |
| state | string | Two-letter US state abbreviation. |
| state_id_fips | string | Two-digit state FIPS code. |
| steam_plant_type_code | integer | Code that describes types of steam plants from EIA 860. See the steam_plant_types_eia table for more details. |
| total_fgd_equipment_cost | number | Total actual installed costs for the existing flue gas desulfurization unit or the anticipated costs to bring a planned flue gas desulfurization unit into commercial operation. |
| utility_id_eia | integer | The EIA Utility Identification number. |
| utility_name_eia | string | The name of the utility. |
| water_source | string | Name of water source associated with the plant. |
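Because so2_control_id_eia is not unique on its own, only the full primary key (plant_id_eia, so2_control_id_eia, report_date) is expected to identify a row. A quick way to confirm that after loading the table is to look for null or duplicated key values. The following is a minimal pandas sketch; primary_key_violations is a hypothetical helper name and the rows are made up.

```python
import pandas as pd

PRIMARY_KEY = ["plant_id_eia", "so2_control_id_eia", "report_date"]


def primary_key_violations(df: pd.DataFrame, key: list) -> pd.DataFrame:
    """Return rows whose key contains a null or is shared with another row."""
    has_null = df[key].isna().any(axis=1)
    is_duplicate = df.duplicated(subset=key, keep=False)
    return df[has_null | is_duplicate]


# Made-up rows: so2_control_id_eia "1" appears at two plants, which is
# allowed because the ID is only expected to be unique within the full key.
fgd = pd.DataFrame(
    {
        "plant_id_eia": [10, 20, 20],
        "so2_control_id_eia": ["1", "1", "2"],
        "report_date": pd.to_datetime(["2024-01-01"] * 3),
    }
)

print(len(primary_key_violations(fgd, PRIMARY_KEY)))  # 0 for these rows
```

Using keep=False returns every member of a duplicated group, not just the later copies, which makes the offending records easier to inspect.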
_core_eia923__monthly_cooling_system_information
Monthly time series of cooling system information at generation facilities, mainly water volumes and temperatures.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 8D)
- Primary key:
plant_id_eia, report_date, cooling_id_eia
Usage Warnings
Some years use a slightly different data definition.
Additional Details
In 2008 and 2009, EIA reported only annual averages; in later years, all data are monthly.
Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.
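Since 2008 and 2009 records hold annual averages rather than monthly observations, it is safer to keep them apart from the later monthly records before building a monthly time series. Here is a minimal pandas sketch with made-up rows. ANNUAL_ONLY_YEARS is a hypothetical constant taken from the years named in the note on annual averages.

```python
import pandas as pd

# Years that, per this table's notes, carry only annual averages.
ANNUAL_ONLY_YEARS = {2008, 2009}

# Made-up rows for one cooling system; real data would come from Parquet.
cooling = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "cooling_id_eia": ["A", "A", "A"],
        "report_date": pd.to_datetime(["2009-01-01", "2010-01-01", "2010-02-01"]),
    }
)

is_annual_only = cooling["report_date"].dt.year.isin(ANNUAL_ONLY_YEARS)
annual_records = cooling[is_annual_only]    # annual averages only
monthly_records = cooling[~is_annual_only]  # monthly observations
```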
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| annual_average_consumption_rate_gallons_per_minute | number | Annual average consumption rate of cooling water. |
| annual_average_discharge_rate_gallons_per_minute | number | Annual average discharge rate of cooling water. |
| annual_average_withdrawal_rate_gallons_per_minute | number | Annual average withdrawal rate of cooling water. |
| annual_maximum_intake_summer_temperature_fahrenheit | number | Maximum cooling water temperature at intake in summer. |
| annual_maximum_intake_winter_temperature_fahrenheit | number | Maximum cooling water temperature at intake in winter. |
| annual_maximum_outlet_summer_temperature_fahrenheit | number | Maximum cooling water temperature at outlet in summer. |
| annual_maximum_outlet_winter_temperature_fahrenheit | number | Maximum cooling water temperature at outlet in winter. |
| annual_total_chlorine_lbs | number | Amount of elemental chlorine added to cooling water annually. May instead be the amount of chlorine-containing compound if Schedule 9 is filled out. |
| cooling_id_eia | string | EIA identification code for the cooling system. If multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant. |
| cooling_status_code | string | Operating status of cooling system. |
| cooling_type | string | Type of cooling system. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| flow_rate_method | string | Calculation method for flow rates (actual or method of estimation). |
| monthly_average_consumption_rate_gallons_per_minute | number | Monthly average consumption rate of cooling water. |
| monthly_average_discharge_rate_gallons_per_minute | number | Monthly average discharge rate of cooling water. |
| monthly_average_discharge_temperature_fahrenheit | number | Average cooling water temperature at discharge point. |
| monthly_average_diversion_rate_gallons_per_minute | number | Monthly average diversion rate of cooling water. |
| monthly_average_intake_temperature_fahrenheit | number | Average cooling water temperature at intake point. |
| monthly_average_withdrawal_rate_gallons_per_minute | number | Monthly average withdrawal rate of cooling water. |
| monthly_maximum_discharge_temperature_fahrenheit | number | Maximum cooling water temperature at discharge. |
| monthly_maximum_intake_temperature_fahrenheit | number | Maximum cooling water temperature at intake. |
| monthly_total_chlorine_lbs | number | Amount of elemental chlorine added to cooling water monthly. May instead be the amount of chlorine-containing compound if Schedule 9 is filled out. |
| monthly_total_consumption_volume_gallons | number | Monthly volume of water consumed at consumption point (accurate to 0.1 million gallons). |
| monthly_total_cooling_hours_in_service | integer | Total hours the system operated during the month. |
| monthly_total_discharge_volume_gallons | number | Monthly volume of water discharged at discharge point (accurate to 0.1 million gallons). |
| monthly_total_diversion_volume_gallons | number | Monthly volume of water diverted at diversion point (accurate to 0.1 million gallons). |
| monthly_total_withdrawal_volume_gallons | number | Monthly volume of water withdrawn at withdrawal point (accurate to 0.1 million gallons). |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| report_date | date | Date reported. |
| temperature_method | string | Method for measurement of temperatures. |
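The table reports both monthly average rates (gallons per minute) and monthly total volumes (gallons). If the average rate is taken over every minute of the calendar month, the two should roughly agree as volume ≈ rate × minutes in the month. The dictionary does not say which averaging period EIA uses, so treat that relationship as an assumption. The sketch below is a sanity check, not a validation rule. The values are made up, and implied_volume_gallons is a hypothetical helper.

```python
import pandas as pd


def implied_volume_gallons(rate_gpm: pd.Series, report_date: pd.Series) -> pd.Series:
    """Volume implied by a monthly average rate, ASSUMING the average is
    taken over every minute of the calendar month."""
    minutes_in_month = report_date.dt.days_in_month * 24 * 60
    return rate_gpm * minutes_in_month


# Made-up values for illustration only.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2023-02-01", "2023-03-01"]),
        "monthly_average_withdrawal_rate_gallons_per_minute": [1000.0, 1000.0],
        "monthly_total_withdrawal_volume_gallons": [40_300_000.0, 44_600_000.0],
    }
)

implied = implied_volume_gallons(
    df["monthly_average_withdrawal_rate_gallons_per_minute"], df["report_date"]
)
# Relative gap between reported and implied volume; a large gap means the
# averaging assumption does not hold for that record, or the data has a problem.
df["relative_gap"] = (df["monthly_total_withdrawal_volume_gallons"] - implied) / implied
print(df[["report_date", "relative_gap"]])
```

Volumes are reported only to the nearest 0.1 million gallons, so small gaps are expected even when the assumption holds.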
_core_eia923__yearly_byproduct_disposition
Annual time series of by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 8A)
- Primary key:
plant_id_eia, report_date, byproduct_description
Usage Warnings
This table is experimental and/or a work in progress and may change in the future.
Additional Details
All by-products other than steam sales are reported in thousand tons and rounded to whole numbers. Steam sales are reported in MMBtu.
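Because steam is reported in MMBtu and every other by-product in tons, quantities should only be aggregated within a single unit of measure. Here is a minimal pandas sketch, assuming byproduct_units distinguishes the two. The strings "mmbtu", "tons", and the by-product descriptions below are hypothetical placeholders, so check the unique values in the real data first.

```python
import pandas as pd

# Made-up records; the unit and description strings are placeholders.
byproducts = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "byproduct_description": ["fly ash", "bottom ash", "steam"],
        "byproduct_units": ["tons", "tons", "mmbtu"],
        "sold_units": [1200.0, 300.0, 50000.0],
    }
)

# Never add MMBtu to tons: group by the unit so each sum stays homogeneous.
sold_by_unit = (
    byproducts.groupby(["plant_id_eia", "report_date", "byproduct_units"])["sold_units"]
    .sum()
    .reset_index()
)
print(sold_by_unit)
```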
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| byproduct_description | string | Description of combustion by-product. |
| byproduct_units | string | Reported unit of measure for combustion by-product: MMBTU for steam, tons for all other by-products. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| disposal_landfill_units | number | By-products disposed of in landfills, to the nearest hundred tons or in MMBTU for steam sales. |
| disposal_offsite_units | number | By-products disposed of offsite, to the nearest hundred tons or in MMBTU for steam sales. |
| disposal_ponds_units | number | By-products disposed of in ponds, to the nearest hundred tons or in MMBTU for steam sales. |
| no_byproducts_to_report | string | Whether any combustion by-products were produced by a plant. ‘Y’ indicates no by-products to report. The ‘Y’ and ‘N’ values do not consistently match the by-products actually reported, so this column is messy and requires standardization. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| report_date | date | Date reported. |
| sold_units | number | By-products sold, to the nearest hundred tons or in MMBTU for steam sales. |
| stored_offsite_units | number | By-products stored offsite, to the nearest hundred tons or in MMBTU for steam sales. |
| stored_onsite_units | number | By-products stored onsite, to the nearest hundred tons or in MMBTU for steam sales. |
| total_disposal_units | number | Total by-product disposal, to the nearest hundred tons or in MMBTU for steam sales. |
| used_offsite_units | number | By-products used offsite, to the nearest hundred tons or in MMBTU for steam sales. |
| used_onsite_units | number | By-products used onsite, to the nearest hundred tons or in MMBTU for steam sales. |
_core_eia923__yearly_byproduct_expenses_and_revenues
Annual time series of financial information related to combustion by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater and that produced combustion by-products during the reporting year.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 8B)
- Primary key:
plant_id_eia, report_date
Usage Warnings
This table is experimental and/or a work in progress and may change in the future.
Additional Details
Cost data must be entered for all entries on Schedule 8A. Financial information includes operational and maintenance expenditures (opex), capital expenditures (capex), and by-products sales revenue.
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| capex_air_abatement | number | Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate airborne pollutants. |
| capex_other_abatement | number | Other amortizable expenses and purchases of new structures and/or equipment when such purchases are not allocated to a particular unit or item. |
| capex_solid_waste | number | Cost of structures or equipment purchased to collect and dispose of objectionable solids or contained liquids. |
| capex_water_abatement | number | Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate waterborne pollutants, including chlorine, phosphates, acids, bases, hydrocarbons, sewage, and other pollutants. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| opex_bottom_ash_collection | number | Costs of materials and labor associated with the collection of bottom ash from all sources. |
| opex_bottom_ash_disposal | number | Costs of materials and labor associated with the disposal of bottom ash from all sources. |
| opex_bottom_ash_other | number | Other costs associated with the collection and disposal of bottom ash. |
| opex_fgd_byproduct_collection | number | Costs of materials and labor associated with the collection of sulfur by-product (flue gas desulfurization). |
| opex_fgd_byproduct_disposal | number | Costs of materials and labor associated with the disposal of sulfur by-product (flue gas desulfurization). |
| opex_fgd_byproduct_other | number | Other costs associated with the collection and disposal of sulfur by-product (flue gas desulfurization). |
| opex_fly_ash_collection | number | Costs of materials and labor associated with the collection of fly ash from all sources. |
| opex_fly_ash_disposal | number | Costs of materials and labor associated with the disposal of fly ash from all sources. |
| opex_fly_ash_other | number | Other costs associated with the collection and disposal of fly ash. |
| opex_other_abatement_collection | number | Abatement costs of by-product collection that are not allocated to a particular expenditure, e.g., costs of operating an environmental protection office. |
| opex_other_abatement_disposal | number | Abatement costs of by-product disposal that are not allocated to a particular expenditure. |
| opex_other_abatement_other | number | Other abatement costs that are not allocated to a particular expenditure. |
| opex_total_collection_abatement | number | Sum of abatement costs associated with by-product collection. |
| opex_total_disposal_abatement | number | Sum of abatement costs associated with by-product disposal. |
| opex_total_other_abatement | number | Sum of other abatement costs associated with the collection and disposal of by-products. |
| opex_water_abatement_collection | number | Costs associated with the collection/abatement of water pollution, e.g., operation and maintenance of pumps, pipes, and settling ponds. |
| opex_water_abatement_disposal | number | Costs associated with the disposal of water pollutants. |
| opex_water_abatement_other | number | Other abatement costs associated with water pollutants. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| report_date | date | Date reported. |
| revenues_bottom_ash | number | Revenue from the sale of bottom ash by-product. |
| revenues_fgd_byproducts | number | Revenue from the sale of flue gas desulfurization by-product. |
| revenues_fly_ash | number | Revenue from the sale of fly ash by-product. |
| revenues_fly_bottom_ash_intermingled | number | Revenue from the sale of intermingled fly and bottom ash by-product. |
| revenues_other_byproducts | number | Revenue from the sale of other by-products. |
| revenues_total_byproduct | number | Total revenue from the sale of by-products. |
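If revenues_total_byproduct is meant to be the sum of the individual revenue columns (an assumption; the dictionary does not state it explicitly), a simple consistency check can flag rows where the reported total disagrees with its parts. Here is a minimal pandas sketch with made-up values. total_mismatch and its one-dollar tolerance are hypothetical choices.

```python
import pandas as pd

REVENUE_PARTS = [
    "revenues_bottom_ash",
    "revenues_fgd_byproducts",
    "revenues_fly_ash",
    "revenues_fly_bottom_ash_intermingled",
    "revenues_other_byproducts",
]


def total_mismatch(df: pd.DataFrame, parts: list, total: str, tolerance: float = 1.0) -> pd.Series:
    """Flag rows where the reported total differs from the sum of its parts.

    Rows with a missing total compare as False and are not flagged.
    """
    parts_sum = df[parts].sum(axis=1, min_count=1)
    return (parts_sum - df[total]).abs() > tolerance


# Made-up values: the second row's total does not match its components.
df = pd.DataFrame(
    {
        "revenues_bottom_ash": [100.0, 0.0],
        "revenues_fgd_byproducts": [50.0, 0.0],
        "revenues_fly_ash": [200.0, 10.0],
        "revenues_fly_bottom_ash_intermingled": [0.0, 0.0],
        "revenues_other_byproducts": [0.0, 0.0],
        "revenues_total_byproduct": [350.0, 25.0],
    }
)

print(total_mismatch(df, REVENUE_PARTS, "revenues_total_byproduct").tolist())
# [False, True]
```

The same helper could be pointed at the *_collection opex columns and opex_total_collection_abatement, under the same unverified assumption that the total is a plain sum.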
_core_eia923__yearly_emissions_control
Annual time series of actual emission rates and removal efficiencies for air emissions, reported by thermoelectric or combined-cycle power plants with a total steam turbine capacity of at least 10 megawatts.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 8C)
- Primary key:
This table has no primary key. This table is not yet normalized, and contains information about several different types of emissions control equipment, each of which has its own ID, and not all of which are present at every plant. As a result there is currently no unique, non-null primary key. Conceptually, the primary key should consist of plant_id_eia, report_date, mercury_control_id_eia, nox_control_id_eia, so2_control_id_eia, and particulate_control_id_eia. In practice many records have NULL values for all of the control equipment IDs.
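Because this table has no enforced primary key, it can help to measure how far the data is from the conceptual key before joining on it. Here is a minimal pandas sketch with made-up rows and a hypothetical summarize_key helper. Note that pandas' duplicated treats missing values as equal to each other, so records with identical NULL patterns count as sharing a key.

```python
import pandas as pd

CONCEPTUAL_KEY = [
    "plant_id_eia",
    "report_date",
    "mercury_control_id_eia",
    "nox_control_id_eia",
    "so2_control_id_eia",
    "particulate_control_id_eia",
]
CONTROL_IDS = CONCEPTUAL_KEY[2:]


def summarize_key(df: pd.DataFrame) -> dict:
    """Count rows lacking every control ID and rows that share a key."""
    all_ids_null = df[CONTROL_IDS].isna().all(axis=1)
    # duplicated() treats NaN/None as equal, so NULL-only keys collide.
    shares_key = df.duplicated(subset=CONCEPTUAL_KEY, keep=False)
    return {
        "rows": len(df),
        "rows_with_no_control_ids": int(all_ids_null.sum()),
        "rows_sharing_a_key": int(shares_key.sum()),
    }


# Made-up rows: two records at the same plant with no control IDs at all.
controls = pd.DataFrame(
    {
        "plant_id_eia": [5, 5, 5],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "mercury_control_id_eia": [None, None, "HG1"],
        "nox_control_id_eia": [None, None, None],
        "so2_control_id_eia": [None, None, "S1"],
        "particulate_control_id_eia": [None, None, None],
    }
)

summary = summarize_key(controls)
print(summary)
```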
Usage Warnings
This table is experimental and/or a work in progress and may change in the future.
Additional Details
In the raw data, the so2_test_date and particulate_test_date columns contained a wide variety of non-standard date formats. They have been standardized to ISO-8601 (YYYY-MM-DD) where possible. Missing days and months have been filled in with 01. A handful of manual spot-fixes have been applied to particularly strange formats. Two-digit years have been assumed to fall between 1950 and the last year observed in the report_date column.
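The date rules described here can be restated as code: fill a missing month or day with 01, and place a two-digit year in whichever century puts it between 1950 and the last report year. This is a minimal illustrative reimplementation of those rules as written, not PUDL's actual cleaning code. Both helper names are hypothetical, and 2024 stands in for the last observed report year.

```python
from datetime import date
from typing import Optional


def expand_two_digit_year(yy: int, last_report_year: int, earliest: int = 1950) -> Optional[int]:
    """Map a two-digit year into [earliest, last_report_year].

    Returns None unless exactly one of 19yy and 20yy falls in that window.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    candidates = [
        century + yy
        for century in (1900, 2000)
        if earliest <= century + yy <= last_report_year
    ]
    return candidates[0] if len(candidates) == 1 else None


def to_iso_date(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Build a YYYY-MM-DD string, filling a missing month or day with 01."""
    return date(year, month or 1, day or 1).isoformat()


# 2024 stands in for the last year observed in report_date.
print(expand_two_digit_year(97, 2024))  # 1997
print(expand_two_digit_year(5, 2024))   # 2005
print(to_iso_date(1997, 6))             # 1997-06-01
```

Under these rules, a two-digit year such as 30 fits neither 1930 nor 2030 when the last report year is 2024, so the sketch returns None rather than guessing.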
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet"
)
```
Columns
| Field Name | Type | Description |
|---|---|---|
| acid_gas_removal_efficiency | number | Removal efficiency for acid gas emissions. Ranges from 0 to 1. |
| annual_nox_emission_rate_lb_per_mmbtu | number | Actual controlled (or uncontrolled) nitrogen oxides emission rate, in pounds per MMBtu. Based on CEMS data where possible. |
| data_maturity | string | Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
| environmental_equipment_name | string | Type of equipment or strategy for the control of air emissions. |
| fgd_electricity_consumption_mwh | number | Electric power consumed by the flue gas desulfurization unit, in MWh. |
| fgd_sorbent_consumption_tons | number | Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons. |
| hours_in_service | integer | Total hours the emissions control was in service during the reporting year, rounded to the nearest hour. |
| mercury_control_id_eia | string | Mercury control identification number. This ID is not a unique identifier. |
| mercury_emission_rate_lb_per_trillion_btu | number | Actual controlled (or uncontrolled) mercury emission rate, in pounds per trillion Btu. Based on CEMS data where possible. |
| mercury_removal_efficiency | number | Removal efficiency for mercury emissions. Ranges from 0 to 1. |
| nox_control_id_eia | string | Nitrogen oxides control identification number. This ID is not a unique identifier. |
| operational_status | string | The operating status of the asset. For generators, this is based on which EIA 860 tab the generator was listed in. |
| ozone_season_nox_emission_rate_lb_per_mmbtu | number | Actual controlled (or uncontrolled) nitrogen oxides emission rate during the ozone season (May to September), in pounds per MMBtu. |
| particulate_control_id_eia | string | Particulate matter control identification number. This ID is not a unique identifier. |
| particulate_emission_rate_lb_per_mmbtu | number | Average annual emission removal rate for particulate matter. |
| particulate_removal_efficiency_annual | number | Particulate removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1. |
| particulate_removal_efficiency_tested | number | The tested efficiency for the removal of particulate matter at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1. |
| particulate_test_date | date | Date of the latest efficiency test for the removal of particulate matter. |
| plant_id_eia | integer | The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
| report_date | date | Date reported. |
| so2_control_id_eia | string | Sulfur dioxide control identification number. This ID is not a unique identifier. |
| so2_removal_efficiency_annual | number | Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1. |
| so2_removal_efficiency_tested | number | The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1. |
| so2_test_date | date | Date of the most recent test for sulfur dioxide removal efficiency. |
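The annual operating factor used in the particulate_removal_efficiency_annual and so2_removal_efficiency_annual descriptions can be written out directly: annual fuel consumption (MMBtu) divided by the product of the design firing rate (MMBtu per hour) and hours of operation per year. Here is a minimal sketch of that arithmetic. The dictionary does not say how the removal efficiency is derived from this factor, so the sketch stops at the factor itself. It also does not say whether "hours of operation per year" means hours actually operated or all hours in the year, so that value is left as a parameter. The function name and example numbers are hypothetical.

```python
def annual_operating_factor(
    annual_fuel_mmbtu: float,
    design_firing_rate_mmbtu_per_hour: float,
    hours_per_year: float,
) -> float:
    """Annual fuel use divided by (design firing rate * hours of operation)."""
    capacity_mmbtu = design_firing_rate_mmbtu_per_hour * hours_per_year
    if capacity_mmbtu <= 0:
        raise ValueError("firing rate and hours of operation must be positive")
    return annual_fuel_mmbtu / capacity_mmbtu


# Made-up boiler: 1,000 MMBtu/h design firing rate, 8,000 hours of operation,
# 4,000,000 MMBtu of fuel burned in the year.
print(annual_operating_factor(4_000_000, 1_000, 8_000))  # 0.5
```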
_core_eia923__yearly_fgd_operation_maintenance
Annual time series of flue gas desulfurization (FGD) operation and maintenance expenses at generation facilities.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
EIA Form 923 – Power Plant Operations Report (Schedule 8C)
- Primary key:
plant_id_eia, report_date, so2_control_id_eia
Usage Warnings
Some years use a slightly different data definition.
Additional Details
From 2008 through 2011, this table also reported operational characteristics that are now reported in the EIA 923 Air Emissions Control Info data.
Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.
Access methods
```python
import pandas as pd

df = pd.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
    dtype_backend="pyarrow",
)
```
```python
import polars as pl

df = pl.read_parquet(
    "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
```
```sql
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet';
```
```r
library(arrow)
df <- read_parquet(
  "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet"
)
```
Columns
Field Name |
Type |
Description |
|---|---|---|
data_maturity |
string |
Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well-defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity. |
fgd_control_flag |
boolean |
Indicates whether or not a plant has a flue gas desulfurization control unit. |
fgd_electricity_consumption_mwh |
number |
Electric power consumed by the flue gas desulfurization unit (in MWh). |
fgd_hours_in_service |
integer |
Number of hours the flue gas desulfurization equipment was in operation during the year. |
fgd_operational_status_code |
string |
Operating status code for flue gas desulfurization equipment. |
fgd_sorbent_consumption_tons |
number |
Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons. |
opex_fgd_feed_materials_chemical |
integer |
Annual operation and maintenance expenditures for feed materials and chemicals for flue gas desulfurization equipment, excluding electricity. |
opex_fgd_labor_supervision |
integer |
Annual operation and maintenance expenditures for labor and supervision of flue gas desulfurization equipment, excluding electricity. |
opex_fgd_land_acquisition |
integer |
Annual operation and maintenance expenditures for land acquisition for flue gas desulfurization equipment, excluding electricity. |
opex_fgd_maintenance_material_other |
integer |
Annual operation and maintenance expenditures for maintenance, materials and all other costs of flue gas desulfurization equipment, excluding electricity. |
opex_fgd_total_cost |
integer |
Annual total cost of operation and maintenance expenditures on flue gas desulfurization equipment, excluding electricity. |
opex_fgd_waste_disposal |
integer |
Annual operation and maintenance expenditures for waste disposal, excluding electricity. |
plant_id_eia |
integer |
The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration. |
report_date |
date |
Date reported. |
so2_control_id_eia |
string |
Sulfur dioxide control identification number. This ID is not a unique identifier. |
so2_removal_efficiency_annual |
number |
Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1. |
so2_removal_efficiency_tested |
number |
The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency fields should be blank. Ranges from 0 to 1. |
so2_test_date |
date |
Date of most recent test for sulfur dioxide removal efficiency. |
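The annual operating factor behind so2_removal_efficiency_annual can be written as a small helper. This is a minimal sketch: the function and argument names are hypothetical, and the boiler figures in the example are invented for illustration rather than taken from EIA data.

```python
def annual_operating_factor(
    annual_fuel_consumption_mmbtu: float,
    design_firing_rate_mmbtu_per_hour: float,
    hours_of_operation: float,
) -> float:
    """Annual fuel consumption divided by (design firing rate x hours of operation)."""
    denominator = design_firing_rate_mmbtu_per_hour * hours_of_operation
    if denominator <= 0:
        # A zero or negative design rate or operating time makes the factor undefined.
        raise ValueError("design firing rate and hours of operation must be positive")
    return annual_fuel_consumption_mmbtu / denominator


# Hypothetical boiler: 4,000 MMBTU/h design rate, 8,000 h of operation, 25.6 million MMBTU burned.
factor = annual_operating_factor(25_600_000, 4_000, 8_000)
print(factor)  # 0.8
```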
_core_phmsagas__yearly_distribution_by_install_decade#
Noticed any irregularities with this table? File a report.
Annual time series of miles of mains and the number of services in operation at the end of the year by install decade.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 4)
- Primary key:
report_id, report_date, operator_id_phmsa, operating_state, install_decade
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
Additional Details#
Records with an install_decade of total_decade are totals; beware of aggregating these values.
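A minimal pandas sketch of excluding the total_decade rows before summing, so the totals are not double counted. The decade labels other than total_decade and all values are hypothetical; only the column names come from this table.

```python
import pandas as pd

# Toy rows shaped like this table; decade labels and values are invented.
df = pd.DataFrame(
    {
        "report_id": [1, 1, 1],
        "install_decade": ["1970_1979", "1980_1989", "total_decade"],
        "mains_miles": [10.0, 5.0, 15.0],
    }
)

# Drop the pre-computed total rows before aggregating.
by_decade = df[df["install_decade"] != "total_decade"]
miles_per_report = by_decade.groupby("report_id")["mains_miles"].sum()
print(miles_per_report.loc[1])  # 15.0
```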
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
commodity |
string |
The type of gas delivered by the distribution pipeline. |
install_decade |
string |
The decade the distribution pipeline was installed. |
mains_miles |
number |
The miles of mains distribution pipeline. |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
services |
number |
Number of services in system at end of year. |
_core_phmsagas__yearly_distribution_by_material#
Noticed any irregularities with this table? File a report.
Annual time series of miles of mains and the number of services in operation at the end of the year by material for each gas distribution operator.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 1 - General)
- Primary key:
This table has no primary key. We expect the primary key to be report_id, operator_id_phmsa, operating_state, and material; however, operating_state contains nulls in several years of reporting.
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up into two categories before 1984).
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
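Because operating_state can be null or hold several states before 2004, one way to work at the state level is to keep only 2004-and-later rows with a non-null state and then confirm the expected key is unique. This is a sketch on invented rows; the multi-state string format "XX, YY" and the state codes are hypothetical placeholders.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "report_id": [10, 11, 12],
        "report_date": pd.to_datetime(["2003-01-01", "2004-01-01", "2004-01-01"]),
        "operator_id_phmsa": [100, 100, 200],
        "operating_state": ["XX, YY", "XX", None],
        "material": ["steel", "steel", "plastic"],
        "mains_miles": [1.0, 2.0, 3.0],
    }
)

candidate_key = ["report_id", "operator_id_phmsa", "operating_state", "material"]

# Rows that cannot take part in the expected key because operating_state is null.
null_state = df["operating_state"].isna()

# From 2004 on each report covers a single state, so restrict state-level work to those years.
state_level = df[(df["report_date"].dt.year >= 2004) & ~null_state]

# Confirm the candidate key is unique in the filtered subset.
assert not state_level.duplicated(subset=candidate_key).any()
print(len(state_level))  # 1
```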
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
commodity |
string |
The type of gas delivered by the distribution pipeline. |
mains_miles |
number |
The miles of mains distribution pipeline. |
material |
string |
The material of the gas distribution pipe. The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up into two categories before 1984). |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
services |
number |
Number of services in system at end of year. |
_core_phmsagas__yearly_distribution_by_material_and_size#
Noticed any irregularities with this table? File a report.
Annual time series of miles of mains and the number of services in operation at the end of the year by material and size of pipe.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 3)
- Primary key:
This table has no primary key. We expect the primary key to be report_id, operator_id_phmsa, operating_state, main_size, and material; however, operating_state contains nulls in several years of reporting.
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
The size ranges in main_size have changed slightly over the years (ex: before 1984 they reported 0.5_in_or_less, whereas afterward they reported 1_in_or_less).
The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up into two categories before 1984).
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
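To see which main_size labels do not carry across the 1984 change, the labels used before and after the break can be compared as sets. A sketch on invented rows: the label "1_to_2_in" is hypothetical, and treating 1984 itself as part of the later period is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["1983-01-01", "1983-01-01", "1985-01-01", "1985-01-01"]),
        "main_size": ["0.5_in_or_less", "1_to_2_in", "1_in_or_less", "1_to_2_in"],
        "mains_miles": [1.0, 2.0, 3.0, 4.0],
    }
)

year = df["report_date"].dt.year
pre_1984 = set(df.loc[year < 1984, "main_size"])
from_1984 = set(df.loc[year >= 1984, "main_size"])

# Labels that appear in only one period cannot be compared directly across the break.
era_specific = pre_1984 ^ from_1984
print(sorted(era_specific))  # ['0.5_in_or_less', '1_in_or_less']
```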
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
commodity |
string |
The type of gas delivered by the distribution pipeline. |
main_other_material_detail |
string |
A free-form text field containing notes about the other material type. This column should only contain values in it for rows with other as the material type listed. |
main_size |
string |
Size range of mains. The size ranges have changed slightly over the years (ex: before 1984 they reported 0.5_in_or_less whereas after they reported 1_in_or_less). |
mains_miles |
number |
The miles of mains distribution pipeline. |
material |
string |
The material of the gas distribution pipe. The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up into two categories before 1984). |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
services |
number |
Number of services in system at end of year. |
_core_phmsagas__yearly_distribution_excavation_damages#
Noticed any irregularities with this table? File a report.
Annual time series of excavation damages from various sources.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part D - Excavation Damage)
- Primary key:
report_id, damage_type, damage_sub_type
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
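The primary key listed for this table (report_id, damage_type, damage_sub_type) can be checked after loading by looking for null key columns and duplicated key combinations. This is a sketch: check_primary_key is a hypothetical helper, and the damage categories in the toy data are invented.

```python
import pandas as pd

PRIMARY_KEY = ["report_id", "damage_type", "damage_sub_type"]


def check_primary_key(df: pd.DataFrame, key: list[str]) -> pd.DataFrame:
    """Return rows that violate the key: any null key column or a repeated key."""
    has_null = df[key].isna().any(axis=1)
    is_dup = df.duplicated(subset=key, keep=False)  # flag every copy of a repeated key
    return df[has_null | is_dup]


damages = pd.DataFrame(
    {
        "report_id": [1, 1, 1, 2],
        "damage_type": ["type_a", "type_a", "type_b", "type_a"],
        "damage_sub_type": ["sub_1", "sub_1", None, "sub_1"],
        "damages": [3, 3, 2, 5],
    }
)

bad = check_primary_key(damages, PRIMARY_KEY)
print(len(bad))  # 3
```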
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
commodity |
string |
The type of gas delivered by the distribution pipeline. |
damage_sub_type |
string |
A sub-category of damage_type describing the cause of excavation damage. |
damage_type |
string |
A high level category of excavation damage causes. |
damages |
number |
Number of instances of excavation damage. |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
_core_phmsagas__yearly_distribution_filings#
Noticed any irregularities with this table? File a report.
Annual time series of filings (aka submissions) from gas distribution system operators.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report
- Primary key:
report_id, report_date, operator_id_phmsa
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Additional Details#
This table contains information about the filer and the filing type, including who filed the report and whether it was an original filing or a correction.
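One way to use the filing columns is to isolate Supplemental filings and measure how long after the initial filing each correction arrived. A sketch on invented rows; it assumes report_filing_type holds the literal strings "Initial" and "Supplemental", as the column description suggests, and that corrections populate filing_correction_date.

```python
import pandas as pd

filings = pd.DataFrame(
    {
        "report_id": [1, 2, 3],
        "report_filing_type": ["Initial", "Supplemental", "Supplemental"],
        "initial_filing_date": pd.to_datetime(["2020-03-01", "2020-03-10", "2020-03-15"]),
        "filing_correction_date": pd.to_datetime([None, "2020-06-08", "2020-04-14"]),
    }
)

supplemental = filings[filings["report_filing_type"] == "Supplemental"]

# Days between the original submission and the correction.
lag_days = (supplemental["filing_correction_date"] - supplemental["initial_filing_date"]).dt.days
print(lag_days.tolist())  # [90, 30]
```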
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
data_date |
date |
When the data source was last updated. |
filing_correction_date |
date |
Date when a correction filing was submitted. |
filing_date |
date |
Date on which the filing was submitted. |
form_revision_id |
string |
PHMSA form revision identifier. |
initial_filing_date |
date |
Initial date when filing was originally submitted. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
preparer_email |
string |
Email address of representative who filed report. |
preparer_fax |
string |
Fax number of representative who filed report. |
preparer_name |
string |
Name of representative who filed report. |
preparer_phone |
string |
Phone number of representative who filed report. |
preparer_title |
string |
Title of representative who filed report. |
report_date |
date |
Date reported. |
report_filing_type |
string |
Type of report submitted, either Initial or Supplemental. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
_core_phmsagas__yearly_distribution_leaks#
Noticed any irregularities with this table? File a report.
Annual time series of total and hazardous leaks eliminated or repaired during the report year.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part C)
- Primary key:
This table has no primary key. We expect the primary key to be report_id, operator_id_phmsa, operating_state, leak_severity, and leak_source; however, operating_state contains nulls in several years of reporting.
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Some columns contain subtotals; use caution when choosing columns to aggregate.
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
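Since leak_severity distinguishes all leaks from hazardous leaks, summing across severities would likely double count. The sketch below instead compares the two severities within each report. It assumes leak_severity takes the values "all" and "hazardous" and that hazardous leaks are a subset of all leaks; both the values and the leak_source labels are hypothetical.

```python
import pandas as pd

leaks = pd.DataFrame(
    {
        "report_id": [1, 1, 1, 1],
        "leak_severity": ["all", "all", "hazardous", "hazardous"],
        "leak_source": ["corrosion", "excavation", "corrosion", "excavation"],
        "mains": [8.0, 2.0, 2.0, 1.0],
    }
)

# One column per severity, summed over leak_source within each report.
mains_by_severity = leaks.groupby(["report_id", "leak_severity"])["mains"].sum().unstack()

# Share of leaks on mains that were hazardous, rather than a combined total.
hazardous_share = mains_by_severity["hazardous"] / mains_by_severity["all"]
print(hazardous_share.loc[1])  # 0.3
```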
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
commodity |
string |
The type of gas delivered by the distribution pipeline. |
leak_severity |
string |
Whether the leaks described in this record are all leaks or hazardous leaks. |
leak_source |
string |
The cause of the leaks. |
mains |
number |
The number of mains distribution pipeline. |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
services |
number |
Number of services in system at end of year. |
_core_phmsagas__yearly_distribution_misc#
Noticed any irregularities with this table? File a report.
Annual time series of miscellaneous distribution information.
- Most-recent data:
2024
- Processing:
Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.
- Source:
Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B & C)
- Primary key:
This table has no primary key. We expect the primary key to be report_id, operator_id_phmsa, and operating_state; however, operating_state contains nulls in several years of reporting.
Usage Warnings#
This table has been concatenated across all years and reorganized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types, and values across the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.
Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.
Access methods#
import pandas as pd
df = pd.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet';
library(arrow)
df <- read_parquet(
"s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet"
)
Columns#
Field Name |
Type |
Description |
|---|---|---|
all_known_leaks_scheduled_for_repair |
number |
The number of known system leaks at the end of the report year scheduled for repair. |
all_known_leaks_scheduled_for_repair_main |
number |
The number of known leaks on mains at the end of the report year scheduled for repair. |
average_service_length_feet |
number |
The average system service length in feet. |
excavation_tickets |
integer |
Number of excavation tickets received by the operator during the year (i.e., receipt of information by the operator from the notification center). |
federal_land_leaks_repaired_or_scheduled |
integer |
Total number of leaks repaired, eliminated, or scheduled for repair on federal land during the reporting year. |
hazardous_leaks_mechanical_joint_failure |
number |
The total number of hazardous leaks caused by a mechanical joint failure. |
operating_state |
string |
State that the distribution utility is reporting for. Prior to 2004, this may be a list of states. |
operator_id_phmsa |
integer |
PHMSA unique operator ID. A value of zero represents an unknown operator ID. |
report_date |
date |
Date reported. |
report_id |
integer |
Report number of the PHMSA Gas utility submission. |
services_efv_in_system |
integer |
Estimated number of services with Excess Flow Valve in the system at end of reported year related to natural gas distribution. |
services_efv_installed |
integer |
Total number of services with Excess Flow Valve installed during reported year related to natural gas distribution. |
services_shutoff_valve_in_system |
integer |
Estimated number of services with manual service line shut-off valves installed in the system at end of report year related to natural gas distribution. |
services_shutoff_valve_installed |
integer |
Total number of manual service line shut-off valves installed during reported year related to natural gas distribution. |
unaccounted_for_gas_fraction |
number |
Unaccounted-for gas as a fraction of total consumption for the 12 months ending June 30 of the reporting year. Calculated as follows: take (purchased gas + produced gas) minus (customer use + company use + appropriate adjustments), then divide by (customer use + company use + appropriate adjustments). Prior to 2017, this field was calculated with a different denominator (purchased gas + produced gas). Because the 2010-2017 data used this different calculation method, no records from that period have a negative fraction. For all other reporting years, negative values in this column are known and expected. |
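The unaccounted_for_gas_fraction calculation, including the pre-2017 denominator, can be sketched as a function. The function and argument names are hypothetical, the volumes are invented and assumed to share one unit, and the year cutoff follows the column description.

```python
def unaccounted_for_gas_fraction(
    purchased: float,
    produced: float,
    customer_use: float,
    company_use: float,
    adjustments: float,
    report_year: int,
) -> float:
    """Unaccounted-for gas as a fraction, following the column description."""
    supply = purchased + produced
    disposition = customer_use + company_use + adjustments
    unaccounted = supply - disposition
    # Before 2017 the denominator was supply; from 2017 on it is disposition.
    denominator = supply if report_year < 2017 else disposition
    return unaccounted / denominator


print(unaccounted_for_gas_fraction(1000, 0, 900, 40, 10, report_year=2016))  # 0.05
print(round(unaccounted_for_gas_fraction(1000, 0, 900, 40, 10, report_year=2020), 4))  # 0.0526
# When disposition exceeds supply, the fraction is negative.
print(unaccounted_for_gas_fraction(1000, 0, 990, 20, 10, report_year=2020) < 0)  # True
```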