PUDL Data Dictionary#

The following data tables have been cleaned and transformed by our ETL process.

out_censusdp1tract__counties#

County-level summarized US Census demographic information, along with a geometry describing county boundaries.

Most-recent data: 2010

Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source: Census DP1 – Profile of General Demographic Characteristics

Primary key: county_id_fips

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

This table is based on the 2010 Census. PUDL uses it as a convenient source of geospatial information, but it should not be treated as authoritative.

Access methods#

pandas (Python):

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
      dtype_backend="pyarrow",
)

polars (Python):

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (e.g. DuckDB):

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet';

R (arrow):

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__counties.parquet"
)
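
The access examples above all point at the same URL pattern: a bucket, a release prefix, the table name, and a `.parquet` suffix. A minimal sketch of a helper that builds that URL is shown here. The function name `pudl_parquet_url` and the `release` parameter are hypothetical, and the assumption that other tables follow the same pattern comes only from the tables on this page.

```python
# Hypothetical helper: the bucket, the "nightly" prefix, and the ".parquet"
# suffix are copied from the access examples; the function itself is only
# an illustration and is not part of PUDL.
PUDL_BUCKET = "s3://pudl.catalyst.coop"


def pudl_parquet_url(table_name: str, release: str = "nightly") -> str:
    """Return the S3 URL of a PUDL table's Parquet file."""
    if not table_name or "/" in table_name:
        raise ValueError(f"not a valid table name: {table_name!r}")
    return f"{PUDL_BUCKET}/{release}/{table_name}.parquet"


counties_url = pudl_parquet_url("out_censusdp1tract__counties")
print(counties_url)
```

The resulting string can be passed to any of the readers shown above. Whether release prefixes other than `nightly` exist is not stated on this page, so the `release` argument should be treated as a placeholder.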

Columns#

| Field Name | Type | Description |
| --- | --- | --- |
| county | string | County name. |
| county_id_fips | string | County ID from the Federal Information Processing Standard Publication 6-4. |
| dp0010001 | integer | DPSF1. Sex and age - Total population |
| dp0010002 | integer | DPSF1. Sex and age - Total population under 5 years |
| dp0010003 | integer | DPSF1. Sex and age - Total population 5 to 9 years |
| dp0010004 | integer | DPSF1. Sex and age - Total population 10 to 14 years |
| dp0010005 | integer | DPSF1. Sex and age - Total population 15 to 19 years |
| dp0010006 | integer | DPSF1. Sex and age - Total population 20 to 24 years |
| dp0010007 | integer | DPSF1. Sex and age - Total population 25 to 29 years |
| dp0010008 | integer | DPSF1. Sex and age - Total population 30 to 34 years |
| dp0010009 | integer | DPSF1. Sex and age - Total population 35 to 39 years |
| dp0010010 | integer | DPSF1. Sex and age - Total population 40 to 44 years |
| dp0010011 | integer | DPSF1. Sex and age - Total population 45 to 49 years |
| dp0010012 | integer | DPSF1. Sex and age - Total population 50 to 54 years |
| dp0010013 | integer | DPSF1. Sex and age - Total population 55 to 59 years |
| dp0010014 | integer | DPSF1. Sex and age - Total population 60 to 64 years |
| dp0010015 | integer | DPSF1. Sex and age - Total population 65 to 69 years |
| dp0010016 | integer | DPSF1. Sex and age - Total population 70 to 74 years |
| dp0010017 | integer | DPSF1. Sex and age - Total population 75 to 79 years |
| dp0010018 | integer | DPSF1. Sex and age - Total population 80 to 84 years |
| dp0010019 | integer | DPSF1. Sex and age - Total population 85 years and over |
| dp0010020 | integer | DPSF1. Sex and age - Male population total |
| dp0010021 | integer | DPSF1. Sex and age - Male population under 5 years |
| dp0010022 | integer | DPSF1. Sex and age - Male population 5 to 9 years |
| dp0010023 | integer | DPSF1. Sex and age - Male population 10 to 14 years |
| dp0010024 | integer | DPSF1. Sex and age - Male population 15 to 19 years |
| dp0010025 | integer | DPSF1. Sex and age - Male population 20 to 24 years |
| dp0010026 | integer | DPSF1. Sex and age - Male population 25 to 29 years |
| dp0010027 | integer | DPSF1. Sex and age - Male population 30 to 34 years |
| dp0010028 | integer | DPSF1. Sex and age - Male population 35 to 39 years |
| dp0010029 | integer | DPSF1. Sex and age - Male population 40 to 44 years |
| dp0010030 | integer | DPSF1. Sex and age - Male population 45 to 49 years |
| dp0010031 | integer | DPSF1. Sex and age - Male population 50 to 54 years |
| dp0010032 | integer | DPSF1. Sex and age - Male population 55 to 59 years |
| dp0010033 | integer | DPSF1. Sex and age - Male population 60 to 64 years |
| dp0010034 | integer | DPSF1. Sex and age - Male population 65 to 69 years |
| dp0010035 | integer | DPSF1. Sex and age - Male population 70 to 74 years |
| dp0010036 | integer | DPSF1. Sex and age - Male population 75 to 79 years |
| dp0010037 | integer | DPSF1. Sex and age - Male population 80 to 84 years |
| dp0010038 | integer | DPSF1. Sex and age - Male population 85 years and over |
| dp0010039 | integer | DPSF1. Sex and age - Female population total |
| dp0010040 | integer | DPSF1. Sex and age - Female population under 5 years |
| dp0010041 | integer | DPSF1. Sex and age - Female population 5 to 9 years |
| dp0010042 | integer | DPSF1. Sex and age - Female population 10 to 14 years |
| dp0010043 | integer | DPSF1. Sex and age - Female population 15 to 19 years |
| dp0010044 | integer | DPSF1. Sex and age - Female population 20 to 24 years |
| dp0010045 | integer | DPSF1. Sex and age - Female population 25 to 29 years |
| dp0010046 | integer | DPSF1. Sex and age - Female population 30 to 34 years |
| dp0010047 | integer | DPSF1. Sex and age - Female population 35 to 39 years |
| dp0010048 | integer | DPSF1. Sex and age - Female population 40 to 44 years |
| dp0010049 | integer | DPSF1. Sex and age - Female population 45 to 49 years |
| dp0010050 | integer | DPSF1. Sex and age - Female population 50 to 54 years |
| dp0010051 | integer | DPSF1. Sex and age - Female population 55 to 59 years |
| dp0010052 | integer | DPSF1. Sex and age - Female population 60 to 64 years |
| dp0010053 | integer | DPSF1. Sex and age - Female population 65 to 69 years |
| dp0010054 | integer | DPSF1. Sex and age - Female population 70 to 74 years |
| dp0010055 | integer | DPSF1. Sex and age - Female population 75 to 79 years |
| dp0010056 | integer | DPSF1. Sex and age - Female population 80 to 84 years |
| dp0010057 | integer | DPSF1. Sex and age - Female population 85 years and over |
| dp0020001 | number | DPSF2. Median age by sex - Both sexes |
| dp0020002 | number | DPSF2. Median age by sex - Male |
| dp0020003 | number | DPSF2. Median age by sex - Female |
| dp0030001 | integer | DPSF3. Sex for population 16 years and over - Total |
| dp0030002 | integer | DPSF3. Sex for population 16 years and over - Male |
| dp0030003 | integer | DPSF3. Sex for population 16 years and over - Female |
| dp0040001 | integer | DPSF4. Sex for population 18 years and over - Total |
| dp0040002 | integer | DPSF4. Sex for population 18 years and over - Male |
| dp0040003 | integer | DPSF4. Sex for population 18 years and over - Female |
| dp0050001 | integer | DPSF5. Sex for population 21 years and over - Total |
| dp0050002 | integer | DPSF5. Sex for population 21 years and over - Male |
| dp0050003 | integer | DPSF5. Sex for population 21 years and over - Female |
| dp0060001 | integer | DPSF6. Sex for population 62 years and over - Total |
| dp0060002 | integer | DPSF6. Sex for population 62 years and over - Male |
| dp0060003 | integer | DPSF6. Sex for population 62 years and over - Female |
| dp0070001 | integer | DPSF7. Sex for population 65 years and over - Total |
| dp0070002 | integer | DPSF7. Sex for population 65 years and over - Male |
| dp0070003 | integer | DPSF7. Sex for population 65 years and over - Female |
| dp0080001 | integer | DPSF8. Race - Total population |
| dp0080002 | integer | DPSF8. Race - Population of one race |
| dp0080003 | integer | DPSF8. Race - White |
| dp0080004 | integer | DPSF8. Race - Black or African American |
| dp0080005 | integer | DPSF8. Race - American Indian and Alaska Native |
| dp0080006 | integer | DPSF8. Race - Asian total |
| dp0080007 | integer | DPSF8. Race - Asian Indian |
| dp0080008 | integer | DPSF8. Race - Chinese |
| dp0080009 | integer | DPSF8. Race - Filipino |
| dp0080010 | integer | DPSF8. Race - Japanese |
| dp0080011 | integer | DPSF8. Race - Korean |
| dp0080012 | integer | DPSF8. Race - Vietnamese |
| dp0080013 | integer | DPSF8. Race - Other Asian |
| dp0080014 | integer | DPSF8. Race - Native Hawaiian and Other Pacific Islander total |
| dp0080015 | integer | DPSF8. Race - Native Hawaiian |
| dp0080016 | integer | DPSF8. Race - Guamanian or Chamorro |
| dp0080017 | integer | DPSF8. Race - Samoan |
| dp0080018 | integer | DPSF8. Race - Other Pacific Islander |
| dp0080019 | integer | DPSF8. Race - Some Other Race |
| dp0080020 | integer | DPSF8. Race - Population of Two or More Races |
| dp0080021 | integer | DPSF8. Race - White; American Indian and Alaska Native |
| dp0080022 | integer | DPSF8. Race - White; Asian |
| dp0080023 | integer | DPSF8. Race - White; Black or African American |
| dp0080024 | integer | DPSF8. Race - White; Some Other Race |
| dp0090001 | integer | DPSF9. Race (total races tallied) - White alone or in combination with one or more other races |
| dp0090002 | integer | DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races |
| dp0090003 | integer | DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races |
| dp0090004 | integer | DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races |
| dp0090005 | integer | DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races |
| dp0090006 | integer | DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races |
| dp0100001 | integer | DPSF10. Hispanic or Latino by specific origin - Total population |
| dp0100002 | integer | DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race) |
| dp0100003 | integer | DPSF10. Hispanic or Latino by specific origin - Mexican |
| dp0100004 | integer | DPSF10. Hispanic or Latino by specific origin - Puerto Rican |
| dp0100005 | integer | DPSF10. Hispanic or Latino by specific origin - Cuban |
| dp0100006 | integer | DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino |
| dp0100007 | integer | DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino |
| dp0110001 | integer | DPSF11. Hispanic or Latino and race - Total population |
| dp0110002 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino total |
| dp0110003 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone |
| dp0110004 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone |
| dp0110005 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110006 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone |
| dp0110007 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110008 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone |
| dp0110009 | integer | DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races |
| dp0110010 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total |
| dp0110011 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone |
| dp0110012 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone |
| dp0110013 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone |
| dp0110014 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone |
| dp0110015 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone |
| dp0110016 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone |
| dp0110017 | integer | DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races |
| dp0120001 | integer | DPSF12. Relationship - Total population |
| dp0120002 | integer | DPSF12. Relationship - In households |
| dp0120003 | integer | DPSF12. Relationship - Householder |
| dp0120004 | integer | DPSF12. Relationship - Spouse |
| dp0120005 | integer | DPSF12. Relationship - Child |
| dp0120006 | integer | DPSF12. Relationship - Own child under 18 years |
| dp0120007 | integer | DPSF12. Relationship - Other relatives |
| dp0120008 | integer | DPSF12. Relationship - Other relatives under 18 years |
| dp0120009 | integer | DPSF12. Relationship - Other relatives 65 years and over |
| dp0120010 | integer | DPSF12. Relationship - Nonrelatives |
| dp0120011 | integer | DPSF12. Relationship - Nonrelatives under 18 years |
| dp0120012 | integer | DPSF12. Relationship - Nonrelatives 65 years and over |
| dp0120013 | integer | DPSF12. Relationship - Unmarried partner |
| dp0120014 | integer | DPSF12. Relationship - In group quarters |
| dp0120015 | integer | DPSF12. Relationship - Institutionalized population |
| dp0120016 | integer | DPSF12. Relationship - Institutionalized population: Male |
| dp0120017 | integer | DPSF12. Relationship - Institutionalized population: Female |
| dp0120018 | integer | DPSF12. Relationship - Noninstitutionalized population |
| dp0120019 | integer | DPSF12. Relationship - Noninstitutionalized population: Male |
| dp0120020 | integer | DPSF12. Relationship - Noninstitutionalized population: Female |
| dp0130001 | integer | DPSF13. Households by type - Total households |
| dp0130002 | integer | DPSF13. Households by type - Family households (families) |
| dp0130003 | integer | DPSF13. Households by type - Family households with own children under 18 years |
| dp0130004 | integer | DPSF13. Households by type - Husband-wife family |
| dp0130005 | integer | DPSF13. Households by type - Husband-wife family with own children under 18 years |
| dp0130006 | integer | DPSF13. Households by type - Male householder, no wife present |
| dp0130007 | integer | DPSF13. Households by type - Male householder, no wife present, with own children under 18 years |
| dp0130008 | integer | DPSF13. Households by type - Female householder, no husband present |
| dp0130009 | integer | DPSF13. Households by type - Female householder, no husband present, with own children under 18 years |
| dp0130010 | integer | DPSF13. Households by type - Nonfamily households |
| dp0130011 | integer | DPSF13. Households by type - Householder living alone |
| dp0130012 | integer | DPSF13. Households by type - Householder living alone: Male |
| dp0130013 | integer | DPSF13. Households by type - Householder living alone: Male 65 years and over |
| dp0130014 | integer | DPSF13. Households by type - Householder living alone: Female |
| dp0130015 | integer | DPSF13. Households by type - Householder living alone: Female 65 years and over |
| dp0140001 | integer | DPSF14. Households with individuals under 18 years - Total |
| dp0150001 | integer | DPSF15. Households with individuals 65 years and over - Total |
| dp0160001 | number | DPSF16. Average household size - Average household size |
| dp0170001 | number | DPSF17. Average family size - Average family size |
| dp0180001 | integer | DPSF18. Housing occupancy - Total housing units |
| dp0180002 | integer | DPSF18. Housing occupancy - Occupied housing units |
| dp0180003 | integer | DPSF18. Housing occupancy - Vacant housing units |
| dp0180004 | integer | DPSF18. Housing occupancy - Vacant housing units for rent |
| dp0180005 | integer | DPSF18. Housing occupancy - Vacant housing units rented, not occupied |
| dp0180006 | integer | DPSF18. Housing occupancy - Vacant housing units for sale only |
| dp0180007 | integer | DPSF18. Housing occupancy - Vacant housing units sold, not occupied |
| dp0180008 | integer | DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use |
| dp0180009 | integer | DPSF18. Housing occupancy - All other vacant housing units |
| dp0190001 | number | DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent) |
| dp0200001 | number | DPSF20. Rental vacancy rate - Rental vacancy rate (percent) |
| dp0210001 | integer | DPSF21. Housing tenure - Total occupied housing units |
| dp0210002 | integer | DPSF21. Housing tenure - Owner-occupied housing units |
| dp0210003 | integer | DPSF21. Housing tenure - Renter-occupied housing units |
| dp0220001 | integer | DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units |
| dp0220002 | integer | DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units |
| dp0230001 | number | DPSF23. Average household size by tenure - Owner occupied |
| dp0230002 | number | DPSF23. Average household size by tenure - Renter occupied |
| functional_status_code_census | string | The functional status (FUNCSTAT) code defines the current functional status of a geographic entity. These codes can be found in the TIGER/Line products, gazetteer files, and other products. |
| geometry | geometry | Geospatial representation of the feature. |
| internal_point_latitude | number | Internal point latitude in decimal degrees. |
| internal_point_longitude | number | Internal point longitude in decimal degrees. |
| land_area | number | Land area in square meters. |
| shape_area | number | Area of the feature in square degrees. |
| shape_length | number | Length of the feature’s perimeter in degrees. |
| water_area | number | Water area in square meters. |
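
Two of the columns above are easy to misuse: `county_id_fips` is a string, so leading zeros matter, and `land_area` is in square meters rather than square kilometers. The following is a minimal sketch of handling both in plain Python. It assumes that `county_id_fips` is a five-digit code whose first two digits identify the state (this page does not spell that out), and the helper names and the example row are hypothetical.

```python
SQ_METERS_PER_SQ_KM = 1_000_000


def split_county_fips(county_id_fips: str) -> tuple[str, str]:
    """Split a county FIPS string into (state part, county part).

    Assumes a 5-digit code: 2 digits for the state, 3 for the county.
    """
    if len(county_id_fips) != 5 or not county_id_fips.isdigit():
        raise ValueError(f"expected a 5-digit FIPS string, got {county_id_fips!r}")
    return county_id_fips[:2], county_id_fips[2:]


def population_density_per_sq_km(row: dict) -> float:
    """People per square kilometer of land, from dp0010001 and land_area."""
    land_sq_km = row["land_area"] / SQ_METERS_PER_SQ_KM
    if land_sq_km <= 0:
        raise ValueError("land_area must be positive")
    return row["dp0010001"] / land_sq_km


# Made-up row for illustration only; these are not real census figures.
example_row = {
    "county_id_fips": "08013",
    "dp0010001": 250_000,          # total population
    "land_area": 2_000_000_000.0,  # square meters
}

state_part, county_part = split_county_fips(example_row["county_id_fips"])
density = population_density_per_sq_km(example_row)
print(state_part, county_part, density)
```

Keeping the ID as a string throughout avoids silently turning a code such as `08013` into the integer `8013`.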

out_censusdp1tract__states#

State-level summarized US Census demographic information, along with a geometry describing state boundaries.

Most-recent data: 2010

Processing: Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source: Census DP1 – Profile of General Demographic Characteristics

Primary key: state_id_fips

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

This table is based on the 2010 Census. PUDL uses it as a convenient source of geospatial information, but it should not be treated as authoritative.

Access methods#

pandas (Python):

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
      dtype_backend="pyarrow",
)

polars (Python):

import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)

SQL (e.g. DuckDB):

SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet';

R (arrow):

library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet"
)

Columns#

Field Name

Type

Description

dp0010001

integer

DPSF1. Sex and age - Total population

dp0010002

integer

DPSF1. Sex and age - Total population under 5 years

dp0010003

integer

DPSF1. Sex and age - Total population 5 to 9 years

dp0010004

integer

DPSF1. Sex and age - Total population 10 to 14 years

dp0010005

integer

DPSF1. Sex and age - Total population 15 to 19 years

dp0010006

integer

DPSF1. Sex and age - Total population 20 to 24 years

dp0010007

integer

DPSF1. Sex and age - Total population 25 to 29 years

dp0010008

integer

DPSF1. Sex and age - Total population 30 to 34 years

dp0010009

integer

DPSF1. Sex and age - Total population 35 to 39 years

dp0010010

integer

DPSF1. Sex and age - Total population 40 to 44 years

dp0010011

integer

DPSF1. Sex and age - Total population 45 to 49 years

dp0010012

integer

DPSF1. Sex and age - Total population 50 to 54 years

dp0010013

integer

DPSF1. Sex and age - Total population 55 to 59 years

dp0010014

integer

DPSF1. Sex and age - Total population 60 to 64 years

dp0010015

integer

DPSF1. Sex and age - Total population 65 to 69 years

dp0010016

integer

DPSF1. Sex and age - Total population 70 to 74 years

dp0010017

integer

DPSF1. Sex and age - Total population 75 to 79 years

dp0010018

integer

DPSF1. Sex and age - Total population 80 to 84 years

dp0010019

integer

DPSF1. Sex and age - Total population 85 years and over

dp0010020

integer

DPSF1. Sex and age - Male population total

dp0010021

integer

DPSF1. Sex and age - Male population under 5 years

dp0010022

integer

DPSF1. Sex and age - Male population 5 to 9 years

dp0010023

integer

DPSF1. Sex and age - Male population 10 to 14 years

dp0010024

integer

DPSF1. Sex and age - Male population 15 to 19 years

dp0010025

integer

DPSF1. Sex and age - Male population 20 to 24 years

dp0010026

integer

DPSF1. Sex and age - Male population 25 to 29 years

dp0010027

integer

DPSF1. Sex and age - Male population 30 to 34 years

dp0010028

integer

DPSF1. Sex and age - Male population 35 to 39 years

dp0010029

integer

DPSF1. Sex and age - Male population 40 to 44 years

dp0010030

integer

DPSF1. Sex and age - Male population 45 to 49 years

dp0010031

integer

DPSF1. Sex and age - Male population 50 to 54 years

dp0010032

integer

DPSF1. Sex and age - Male population 55 to 59 years

dp0010033

integer

DPSF1. Sex and age - Male population 60 to 64 years

dp0010034

integer

DPSF1. Sex and age - Male population 65 to 69 years

dp0010035

integer

DPSF1. Sex and age - Male population 70 to 74 years

dp0010036

integer

DPSF1. Sex and age - Male population 75 to 79 years

dp0010037

integer

DPSF1. Sex and age - Male population 80 to 84 years

dp0010038

integer

DPSF1. Sex and age - Male population 85 years and over

dp0010039

integer

DPSF1. Sex and age - Female population total

dp0010040

integer

DPSF1. Sex and age - Female population under 5 years

dp0010041

integer

DPSF1. Sex and age - Female population 5 to 9 years

dp0010042

integer

DPSF1. Sex and age - Female population 10 to 14 years

dp0010043

integer

DPSF1. Sex and age - Female population 15 to 19 years

dp0010044

integer

DPSF1. Sex and age - Female population 20 to 24 years

dp0010045

integer

DPSF1. Sex and age - Female population 25 to 29 years

dp0010046

integer

DPSF1. Sex and age - Female population 30 to 34 years

dp0010047

integer

DPSF1. Sex and age - Female population 35 to 39 years

dp0010048

integer

DPSF1. Sex and age - Female population 40 to 44 years

dp0010049

integer

DPSF1. Sex and age - Female population 45 to 49 years

dp0010050

integer

DPSF1. Sex and age - Female population 50 to 54 years

dp0010051

integer

DPSF1. Sex and age - Female population 55 to 59 years

dp0010052

integer

DPSF1. Sex and age - Female population 60 to 64 years

dp0010053

integer

DPSF1. Sex and age - Female population 65 to 69 years

dp0010054

integer

DPSF1. Sex and age - Female population 70 to 74 years

dp0010055

integer

DPSF1. Sex and age - Female population 75 to 79 years

dp0010056

integer

DPSF1. Sex and age - Female population 80 to 84 years

dp0010057

integer

DPSF1. Sex and age - Female population 85 years and over

dp0020001

number

DPSF2. Median age by sex - Both sexes

dp0020002

number

DPSF2. Median age by sex - Male

dp0020003

number

DPSF2. Median age by sex - Female

dp0030001

integer

DPSF3. Sex for population 16 years and over - Total

dp0030002

integer

DPSF3. Sex for population 16 years and over - Male

dp0030003

integer

DPSF3. Sex for population 16 years and over - Female

dp0040001

integer

DPSF4. Sex for population 18 years and over - Total

dp0040002

integer

DPSF4. Sex for population 18 years and over - Male

dp0040003

integer

DPSF4. Sex for population 18 years and over - Female

dp0050001

integer

DPSF5. Sex for population 21 years and over - Total

dp0050002

integer

DPSF5. Sex for population 21 years and over - Male

dp0050003

integer

DPSF5. Sex for population 21 years and over - Female

dp0060001

integer

DPSF6. Sex for population 62 years and over - Total

dp0060002

integer

DPSF6. Sex for population 62 years and over - Male

dp0060003

integer

DPSF6. Sex for population 62 years and over - Female

dp0070001

integer

DPSF7. Sex for population 65 years and over - Total

dp0070002

integer

DPSF7. Sex for population 65 years and over - Male

dp0070003

integer

DPSF7. Sex for population 65 years and over - Female

dp0080001

integer

DPSF8. Race - Total population

dp0080002

integer

DPSF8. Race - Population of one race

dp0080003

integer

DPSF8. Race - White

dp0080004

integer

DPSF8. Race - Black or African American

dp0080005

integer

DPSF8. Race - American Indian and Alaska Native

dp0080006

integer

DPSF8. Race - Asian total

dp0080007

integer

DPSF8. Race - Asian Indian

dp0080008

integer

DPSF8. Race - Chinese

dp0080009

integer

DPSF8. Race - Filipino

dp0080010

integer

DPSF8. Race - Japanese

dp0080011

integer

DPSF8. Race - Korean

dp0080012

integer

DPSF8. Race - Vietnamese

dp0080013

integer

DPSF8. Race - Other Asian

dp0080014

integer

DPSF8. Race - Native Hawaiian and Other Pacific Islander total

dp0080015

integer

DPSF8. Race - Native Hawaiian

dp0080016

integer

DPSF8. Race - Guamanian or Chamorro

dp0080017

integer

DPSF8. Race - Samoan

dp0080018

integer

DPSF8. Race - Other Pacific Islander

dp0080019

integer

DPSF8. Race - Some Other Race

dp0080020

integer

DPSF8. Race - Population of Two or More Races

dp0080021

integer

DPSF8. Race - White; American Indian and Alaska Native

dp0080022

integer

DPSF8. Race - White; Asian

dp0080023

integer

DPSF8. Race - White; Black or African American

dp0080024

integer

DPSF8. Race - White; Some Other Race

dp0090001

integer

DPSF9. Race (total races tallied) - White alone or in combination with one or more other races

dp0090002

integer

DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races

dp0090003

integer

DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races

dp0090004

integer

DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races

dp0090005

integer

DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races

dp0090006

integer

DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races

dp0100001

integer

DPSF10. Hispanic or Latino by specific origin - Total population

dp0100002

integer

DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race)

dp0100003

integer

DPSF10. Hispanic or Latino by specific origin - Mexican

dp0100004

integer

DPSF10. Hispanic or Latino by specific origin - Puerto Rican

dp0100005

integer

DPSF10. Hispanic or Latino by specific origin - Cuban

dp0100006

integer

DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino

dp0100007

integer

DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino

dp0110001

integer

DPSF11. Hispanic or Latino and race - Total population

dp0110002

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino total

dp0110003

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone

dp0110004

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone

dp0110005

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone

dp0110006

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone

dp0110007

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone

dp0110008

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone

dp0110009

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races

dp0110010

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total

dp0110011

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone

dp0110012

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone

dp0110013

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone

dp0110014

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone

dp0110015

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone

dp0110016

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone

dp0110017

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races

dp0120001

integer

DPSF12. Relationship - Total population

dp0120002

integer

DPSF12. Relationship - In households

dp0120003

integer

DPSF12. Relationship - Householder

dp0120004

integer

DPSF12. Relationship - Spouse

dp0120005

integer

DPSF12. Relationship - Child

dp0120006

integer

DPSF12. Relationship - Own child under 18 years

dp0120007

integer

DPSF12. Relationship - Other relatives

dp0120008

integer

DPSF12. Relationship - Other relatives under 18 years

dp0120009

integer

DPSF12. Relationship - Other relatives 65 years and over

dp0120010

integer

DPSF12. Relationship - Nonrelatives

dp0120011

integer

DPSF12. Relationship - Nonrelatives under 18 years

dp0120012

integer

DPSF12. Relationship - Nonrelatives 65 years and over

dp0120013

integer

DPSF12. Relationship - Unmarried partner

dp0120014

integer

DPSF12. Relationship - In group quarters

dp0120015

integer

DPSF12. Relationship - Institutionalized population

dp0120016

integer

DPSF12. Relationship - Institutionalized population: Male

dp0120017

integer

DPSF12. Relationship - Institutionalized population: Female

dp0120018

integer

DPSF12. Relationship - Noninstitutionalized population

dp0120019

integer

DPSF12. Relationship - Noninstitutionalized population: Male

dp0120020

integer

DPSF12. Relationship - Noninstitutionalized population: Female

dp0130001

integer

DPSF13. Households by type - Total households

dp0130002

integer

DPSF13. Households by type - Family households (families)

dp0130003

integer

DPSF13. Households by type - Family households with own children under 18 years

dp0130004

integer

DPSF13. Households by type - Husband-wife family

dp0130005

integer

DPSF13. Households by type - Husband-wife family with own children under 18 years

dp0130006

integer

DPSF13. Households by type - Male householder, no wife present

dp0130007

integer

DPSF13. Households by type - Male householder, no wife present, with own children under 18 years

dp0130008

integer

DPSF13. Households by type - Female householder, no husband present

dp0130009

integer

DPSF13. Households by type - Female householder, no husband present, with own children under 18 years

dp0130010

integer

DPSF13. Households by type - Nonfamily households

dp0130011

integer

DPSF13. Households by type - Householder living alone

dp0130012

integer

DPSF13. Households by type - Householder living alone: Male

dp0130013

integer

DPSF13. Households by type - Householder living alone: Male 65 years and over

dp0130014

integer

DPSF13. Households by type - Householder living alone: Female

dp0130015

integer

DPSF13. Households by type - Householder living alone: Female 65 years and over

dp0140001

integer

DPSF14. Households with individuals under 18 years - Total

dp0150001

integer

DPSF15. Households with individuals 65 years and over - Total

dp0160001

number

DPSF16. Average household size - Average household size

dp0170001

number

DPSF17. Average family size - Average family size

dp0180001

integer

DPSF18. Housing occupancy - Total housing units

dp0180002

integer

DPSF18. Housing occupancy - Occupied housing units

dp0180003

integer

DPSF18. Housing occupancy - Vacant housing units

dp0180004

integer

DPSF18. Housing occupancy - Vacant housing units for rent

dp0180005

integer

DPSF18. Housing occupancy - Vacant housing units rented, not occupied

dp0180006

integer

DPSF18. Housing occupancy - Vacant housing units for sale only

dp0180007

integer

DPSF18. Housing occupancy - Vacant housing units sold, not occupied

dp0180008

integer

DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use

dp0180009

integer

DPSF18. Housing occupancy - All other vacant housing units

dp0190001

number

DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent)

dp0200001

number

DPSF20. Rental vacancy rate - Rental vacancy rate (percent)

dp0210001

integer

DPSF21. Housing tenure - Total occupied housing units

dp0210002

integer

DPSF21. Housing tenure - Owner-occupied housing units

dp0210003

integer

DPSF21. Housing tenure - Renter-occupied housing units

dp0220001

integer

DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units

dp0220002

integer

DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units

dp0230001

number

DPSF23. Average household size by tenure - Owner occupied

dp0230002

number

DPSF23. Average household size by tenure - Renter occupied

geometry

geometry

Geospatial representation of the feature.

internal_point_latitude

number

Internal point latitude in decimal degrees.

internal_point_longitude

number

Internal point longitude in decimal degrees.

land_area

number

Land area in square meters.

shape_area

number

Area of the feature in square degrees.

shape_length

number

Length of the feature’s perimeter in degrees.

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

state_name

string

Full name of the state.

water_area

number

Water area in square meters.
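
Because land_area is reported in square meters and dp0010001 holds the total population, a county's population density follows from simple arithmetic. The following is a minimal sketch in plain Python. The function name is illustrative, and the numbers are made up rather than taken from the Census data.

```python
def population_density_per_km2(total_population: int, land_area_m2: float) -> float:
    """Return people per square kilometer of land area."""
    if land_area_m2 <= 0:
        raise ValueError("land_area_m2 must be positive")
    # 1 square kilometer = 1,000,000 square meters
    return total_population / (land_area_m2 / 1_000_000)


# Hypothetical county: 50,000 residents (dp0010001) on 2.5e9 m2 of land (land_area).
density = population_density_per_km2(50_000, 2.5e9)
print(density)  # 20.0
```

Note that shape_area is expressed in square degrees, so it is not a substitute for land_area in a calculation like this.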

out_censusdp1tract__tracts#

Noticed any irregularities with this table? File a report.

Census tract-level summarized US Census demographic information, along with a geometry describing tract boundaries.

Most-recent data:

2010

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

Census DP1 – Profile of General Demographic Characteristics

Primary key:

tract_id_fips

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

Based on the 2010 Census. Used as a convenient source of geospatial information within PUDL, but should not be treated as authoritative.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_censusdp1tract__tracts.parquet"
)

Columns#

Field Name

Type

Description

dp0010001

integer

DPSF1. Sex and age - Total population

dp0010002

integer

DPSF1. Sex and age - Total population under 5 years

dp0010003

integer

DPSF1. Sex and age - Total population 5 to 9 years

dp0010004

integer

DPSF1. Sex and age - Total population 10 to 14 years

dp0010005

integer

DPSF1. Sex and age - Total population 15 to 19 years

dp0010006

integer

DPSF1. Sex and age - Total population 20 to 24 years

dp0010007

integer

DPSF1. Sex and age - Total population 25 to 29 years

dp0010008

integer

DPSF1. Sex and age - Total population 30 to 34 years

dp0010009

integer

DPSF1. Sex and age - Total population 35 to 39 years

dp0010010

integer

DPSF1. Sex and age - Total population 40 to 44 years

dp0010011

integer

DPSF1. Sex and age - Total population 45 to 49 years

dp0010012

integer

DPSF1. Sex and age - Total population 50 to 54 years

dp0010013

integer

DPSF1. Sex and age - Total population 55 to 59 years

dp0010014

integer

DPSF1. Sex and age - Total population 60 to 64 years

dp0010015

integer

DPSF1. Sex and age - Total population 65 to 69 years

dp0010016

integer

DPSF1. Sex and age - Total population 70 to 74 years

dp0010017

integer

DPSF1. Sex and age - Total population 75 to 79 years

dp0010018

integer

DPSF1. Sex and age - Total population 80 to 84 years

dp0010019

integer

DPSF1. Sex and age - Total population 85 years and over

dp0010020

integer

DPSF1. Sex and age - Male population total

dp0010021

integer

DPSF1. Sex and age - Male population under 5 years

dp0010022

integer

DPSF1. Sex and age - Male population 5 to 9 years

dp0010023

integer

DPSF1. Sex and age - Male population 10 to 14 years

dp0010024

integer

DPSF1. Sex and age - Male population 15 to 19 years

dp0010025

integer

DPSF1. Sex and age - Male population 20 to 24 years

dp0010026

integer

DPSF1. Sex and age - Male population 25 to 29 years

dp0010027

integer

DPSF1. Sex and age - Male population 30 to 34 years

dp0010028

integer

DPSF1. Sex and age - Male population 35 to 39 years

dp0010029

integer

DPSF1. Sex and age - Male population 40 to 44 years

dp0010030

integer

DPSF1. Sex and age - Male population 45 to 49 years

dp0010031

integer

DPSF1. Sex and age - Male population 50 to 54 years

dp0010032

integer

DPSF1. Sex and age - Male population 55 to 59 years

dp0010033

integer

DPSF1. Sex and age - Male population 60 to 64 years

dp0010034

integer

DPSF1. Sex and age - Male population 65 to 69 years

dp0010035

integer

DPSF1. Sex and age - Male population 70 to 74 years

dp0010036

integer

DPSF1. Sex and age - Male population 75 to 79 years

dp0010037

integer

DPSF1. Sex and age - Male population 80 to 84 years

dp0010038

integer

DPSF1. Sex and age - Male population 85 years and over

dp0010039

integer

DPSF1. Sex and age - Female population total

dp0010040

integer

DPSF1. Sex and age - Female population under 5 years

dp0010041

integer

DPSF1. Sex and age - Female population 5 to 9 years

dp0010042

integer

DPSF1. Sex and age - Female population 10 to 14 years

dp0010043

integer

DPSF1. Sex and age - Female population 15 to 19 years

dp0010044

integer

DPSF1. Sex and age - Female population 20 to 24 years

dp0010045

integer

DPSF1. Sex and age - Female population 25 to 29 years

dp0010046

integer

DPSF1. Sex and age - Female population 30 to 34 years

dp0010047

integer

DPSF1. Sex and age - Female population 35 to 39 years

dp0010048

integer

DPSF1. Sex and age - Female population 40 to 44 years

dp0010049

integer

DPSF1. Sex and age - Female population 45 to 49 years

dp0010050

integer

DPSF1. Sex and age - Female population 50 to 54 years

dp0010051

integer

DPSF1. Sex and age - Female population 55 to 59 years

dp0010052

integer

DPSF1. Sex and age - Female population 60 to 64 years

dp0010053

integer

DPSF1. Sex and age - Female population 65 to 69 years

dp0010054

integer

DPSF1. Sex and age - Female population 70 to 74 years

dp0010055

integer

DPSF1. Sex and age - Female population 75 to 79 years

dp0010056

integer

DPSF1. Sex and age - Female population 80 to 84 years

dp0010057

integer

DPSF1. Sex and age - Female population 85 years and over

dp0020001

number

DPSF2. Median age by sex - Both sexes

dp0020002

number

DPSF2. Median age by sex - Male

dp0020003

number

DPSF2. Median age by sex - Female

dp0030001

integer

DPSF3. Sex for population 16 years and over - Total

dp0030002

integer

DPSF3. Sex for population 16 years and over - Male

dp0030003

integer

DPSF3. Sex for population 16 years and over - Female

dp0040001

integer

DPSF4. Sex for population 18 years and over - Total

dp0040002

integer

DPSF4. Sex for population 18 years and over - Male

dp0040003

integer

DPSF4. Sex for population 18 years and over - Female

dp0050001

integer

DPSF5. Sex for population 21 years and over - Total

dp0050002

integer

DPSF5. Sex for population 21 years and over - Male

dp0050003

integer

DPSF5. Sex for population 21 years and over - Female

dp0060001

integer

DPSF6. Sex for population 62 years and over - Total

dp0060002

integer

DPSF6. Sex for population 62 years and over - Male

dp0060003

integer

DPSF6. Sex for population 62 years and over - Female

dp0070001

integer

DPSF7. Sex for population 65 years and over - Total

dp0070002

integer

DPSF7. Sex for population 65 years and over - Male

dp0070003

integer

DPSF7. Sex for population 65 years and over - Female

dp0080001

integer

DPSF8. Race - Total population

dp0080002

integer

DPSF8. Race - Population of one race

dp0080003

integer

DPSF8. Race - White

dp0080004

integer

DPSF8. Race - Black or African American

dp0080005

integer

DPSF8. Race - American Indian and Alaska Native

dp0080006

integer

DPSF8. Race - Asian total

dp0080007

integer

DPSF8. Race - Asian Indian

dp0080008

integer

DPSF8. Race - Chinese

dp0080009

integer

DPSF8. Race - Filipino

dp0080010

integer

DPSF8. Race - Japanese

dp0080011

integer

DPSF8. Race - Korean

dp0080012

integer

DPSF8. Race - Vietnamese

dp0080013

integer

DPSF8. Race - Other Asian

dp0080014

integer

DPSF8. Race - Native Hawaiian and Other Pacific Islander total

dp0080015

integer

DPSF8. Race - Native Hawaiian

dp0080016

integer

DPSF8. Race - Guamanian or Chamorro

dp0080017

integer

DPSF8. Race - Samoan

dp0080018

integer

DPSF8. Race - Other Pacific Islander

dp0080019

integer

DPSF8. Race - Some Other Race

dp0080020

integer

DPSF8. Race - Population of Two or More Races

dp0080021

integer

DPSF8. Race - White; American Indian and Alaska Native

dp0080022

integer

DPSF8. Race - White; Asian

dp0080023

integer

DPSF8. Race - White; Black or African American

dp0080024

integer

DPSF8. Race - White; Some Other Race

dp0090001

integer

DPSF9. Race (total races tallied) - White alone or in combination with one or more other races

dp0090002

integer

DPSF9. Race (total races tallied) - Black or African American alone or in combination with one or more other races

dp0090003

integer

DPSF9. Race (total races tallied) - American Indian and Alaska Native alone or in combination with one or more other races

dp0090004

integer

DPSF9. Race (total races tallied) - Asian alone or in combination with one or more other races

dp0090005

integer

DPSF9. Race (total races tallied) - Native Hawaiian and Other Pacific Islander alone or in combination with one or more other races

dp0090006

integer

DPSF9. Race (total races tallied) - Some Other Race alone or in combination with one or more other races

dp0100001

integer

DPSF10. Hispanic or Latino by specific origin - Total population

dp0100002

integer

DPSF10. Hispanic or Latino by specific origin - Hispanic or Latino (of any race)

dp0100003

integer

DPSF10. Hispanic or Latino by specific origin - Mexican

dp0100004

integer

DPSF10. Hispanic or Latino by specific origin - Puerto Rican

dp0100005

integer

DPSF10. Hispanic or Latino by specific origin - Cuban

dp0100006

integer

DPSF10. Hispanic or Latino by specific origin - Other Hispanic or Latino

dp0100007

integer

DPSF10. Hispanic or Latino by specific origin - Not Hispanic or Latino

dp0110001

integer

DPSF11. Hispanic or Latino and race - Total population

dp0110002

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino total

dp0110003

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: White alone

dp0110004

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Black or African American alone

dp0110005

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: American Indian and Alaska Native alone

dp0110006

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Asian alone

dp0110007

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone

dp0110008

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Some Other Race alone

dp0110009

integer

DPSF11. Hispanic or Latino and race - Hispanic or Latino: Two or More Races

dp0110010

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino total

dp0110011

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: White alone

dp0110012

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Black or African American alone

dp0110013

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: American Indian and Alaska Native alone

dp0110014

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Asian alone

dp0110015

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone

dp0110016

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Some Other Race alone

dp0110017

integer

DPSF11. Hispanic or Latino and race - Not Hispanic or Latino: Two or More Races

dp0120001

integer

DPSF12. Relationship - Total population

dp0120002

integer

DPSF12. Relationship - In households

dp0120003

integer

DPSF12. Relationship - Householder

dp0120004

integer

DPSF12. Relationship - Spouse

dp0120005

integer

DPSF12. Relationship - Child

dp0120006

integer

DPSF12. Relationship - Own child under 18 years

dp0120007

integer

DPSF12. Relationship - Other relatives

dp0120008

integer

DPSF12. Relationship - Other relatives under 18 years

dp0120009

integer

DPSF12. Relationship - Other relatives 65 years and over

dp0120010

integer

DPSF12. Relationship - Nonrelatives

dp0120011

integer

DPSF12. Relationship - Nonrelatives under 18 years

dp0120012

integer

DPSF12. Relationship - Nonrelatives 65 years and over

dp0120013

integer

DPSF12. Relationship - Unmarried partner

dp0120014

integer

DPSF12. Relationship - In group quarters

dp0120015

integer

DPSF12. Relationship - Institutionalized population

dp0120016

integer

DPSF12. Relationship - Institutionalized population: Male

dp0120017

integer

DPSF12. Relationship - Institutionalized population: Female

dp0120018

integer

DPSF12. Relationship - Noninstitutionalized population

dp0120019

integer

DPSF12. Relationship - Noninstitutionalized population: Male

dp0120020

integer

DPSF12. Relationship - Noninstitutionalized population: Female

dp0130001

integer

DPSF13. Households by type - Total households

dp0130002

integer

DPSF13. Households by type - Family households (families)

dp0130003

integer

DPSF13. Households by type - Family households with own children under 18 years

dp0130004

integer

DPSF13. Households by type - Husband-wife family

dp0130005

integer

DPSF13. Households by type - Husband-wife family with own children under 18 years

dp0130006

integer

DPSF13. Households by type - Male householder, no wife present

dp0130007

integer

DPSF13. Households by type - Male householder, no wife present, with own children under 18 years

dp0130008

integer

DPSF13. Households by type - Female householder, no husband present

dp0130009

integer

DPSF13. Households by type - Female householder, no husband present, with own children under 18 years

dp0130010

integer

DPSF13. Households by type - Nonfamily households

dp0130011

integer

DPSF13. Households by type - Householder living alone

dp0130012

integer

DPSF13. Households by type - Householder living alone: Male

dp0130013

integer

DPSF13. Households by type - Householder living alone: Male 65 years and over

dp0130014

integer

DPSF13. Households by type - Householder living alone: Female

dp0130015

integer

DPSF13. Households by type - Householder living alone: Female 65 years and over

dp0140001

integer

DPSF14. Households with individuals under 18 years - Total

dp0150001

integer

DPSF15. Households with individuals 65 years and over - Total

dp0160001

number

DPSF16. Average household size - Average household size

dp0170001

number

DPSF17. Average family size - Average family size

dp0180001

integer

DPSF18. Housing occupancy - Total housing units

dp0180002

integer

DPSF18. Housing occupancy - Occupied housing units

dp0180003

integer

DPSF18. Housing occupancy - Vacant housing units

dp0180004

integer

DPSF18. Housing occupancy - Vacant housing units for rent

dp0180005

integer

DPSF18. Housing occupancy - Vacant housing units rented, not occupied

dp0180006

integer

DPSF18. Housing occupancy - Vacant housing units for sale only

dp0180007

integer

DPSF18. Housing occupancy - Vacant housing units sold, not occupied

dp0180008

integer

DPSF18. Housing occupancy - Vacant housing units for seasonal, recreational, or occasional use

dp0180009

integer

DPSF18. Housing occupancy - All other vacant housing units

dp0190001

number

DPSF19. Homeowner vacancy rate - Homeowner vacancy rate (percent)

dp0200001

number

DPSF20. Rental vacancy rate - Rental vacancy rate (percent)

dp0210001

integer

DPSF21. Housing tenure - Total occupied housing units

dp0210002

integer

DPSF21. Housing tenure - Owner-occupied housing units

dp0210003

integer

DPSF21. Housing tenure - Renter-occupied housing units

dp0220001

integer

DPSF22. Population in occupied housing units by tenure - Owner-occupied housing units

dp0220002

integer

DPSF22. Population in occupied housing units by tenure - Renter-occupied housing units

dp0230001

number

DPSF23. Average household size by tenure - Owner occupied

dp0230002

number

DPSF23. Average household size by tenure - Renter occupied

geometry

geometry

Geospatial representation of the feature.

internal_point_latitude

number

Internal point latitude in decimal degrees.

internal_point_longitude

number

Internal point longitude in decimal degrees.

land_area

number

Land area in square meters.

shape_area

number

Area of the feature in square degrees.

shape_length

number

Length of the feature’s perimeter in degrees.

tract_id_fips

string

Census tract 11-digit FIPS code (2-digit state, 3-digit county, 6-digit tract).

tract_name

string

Census tract legal/statistical area description

water_area

number

Water area in square meters.
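
Tract records can be linked to county-level and state-level tables through the leading digits of tract_id_fips. The sketch below assumes the column follows the standard Census GEOID layout (2-digit state, 3-digit county, 6-digit tract). The function name and the sample ID are illustrative only.

```python
def split_tract_fips(tract_id_fips: str) -> dict[str, str]:
    """Split a tract ID into state, county, and tract parts.

    Assumes the standard 11-digit Census GEOID layout.
    """
    if len(tract_id_fips) != 11 or not tract_id_fips.isdigit():
        raise ValueError(f"expected 11 digits, got {tract_id_fips!r}")
    return {
        "state_id_fips": tract_id_fips[:2],
        "county_id_fips": tract_id_fips[:5],  # state digits + county digits
        "tract_code": tract_id_fips[5:],
    }


# Illustrative ID; keep it as a string so the leading zero survives.
parts = split_tract_fips("08013012100")
print(parts["state_id_fips"], parts["county_id_fips"], parts["tract_code"])
```

Keeping these IDs as strings matters because many state codes begin with a zero, which an integer type would silently drop.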

out_eia860__yearly_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Annual time series of the cost, type, operating status, retirement date, and install year of emissions control equipment reported to EIA.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

report_date, plant_id_eia, emission_control_id_pudl

Additional Details#

Includes control ids for sulfur dioxide (SO2), particulate matter, mercury, nitrogen oxide (NOX), and acid (HCl) gas monitoring.

This denormalized version includes plant name, utility id, pudl id, and utility name columns.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_emissions_control_equipment.parquet"
)

Columns#

Field Name

Type

Description

acid_gas_control

boolean

Indicates whether the emissions control equipment controls acid (HCl) gas.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

emission_control_equipment_cost

number

The total cost to install a piece of emission control equipment.

emission_control_equipment_type_code

string

Short code indicating the type of emission control equipment installed.

emission_control_id_pudl

number

A PUDL-generated ID used to distinguish emission control units in the same report year and plant id. This ID should not be used to track units over time or between plants.

emission_control_operating_date

date

The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data.

emission_control_retirement_date

date

The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data.

mercury_control_id_eia

string

Mercury control identification number. This ID is not a unique identifier.

nox_control_id_eia

string

Nitrogen oxide control identification number. This ID is not a unique identifier.

operational_status

string

The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.

operational_status_code

string

The operating status of the asset.

particulate_control_id_eia

string

Particulate matter control identification number. This ID is not a unique identifier.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

so2_control_id_eia

string

Sulfur dioxide control identification number. This ID is not a unique identifier.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia860__yearly_ownership#

Noticed any irregularities with this table? File a report.

Annual time series of generator ownership. Includes only jointly or third-party owned generators.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 4)

Primary key:

report_date, plant_id_eia, generator_id, owner_utility_id_eia

Additional Details#

Denormalized to include plant and utility names and other associated IDs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia860__yearly_ownership.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

operator_utility_id_eia

integer

The EIA utility Identification number for the operator utility.

owner_city

string

City of owner.

owner_country

string

Three letter ISO-3166 country code.

owner_state

string

Two letter ISO-3166 political subdivision code.

owner_street_address

string

Street address of owner.

owner_utility_id_eia

integer

The EIA utility Identification number for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility.

owner_utility_name_eia

string

The name of the EIA owner utility.

owner_zip_code

string

Zip code of owner.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

utility_id_pudl

integer

A manually assigned PUDL utility ID for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility. May not be stable over time.
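
The generator_id description above warns that the ID must be treated as a string. The following sketch, using made-up IDs, shows what can go wrong otherwise: casting to an integer merges IDs that differ only in leading zeros and fails outright on alphanumeric ones. The helper name is hypothetical.

```python
# Hypothetical generator IDs of the kinds the description warns about.
generator_ids = ["1", "001", "GT1", "2A"]


def to_int_or_none(value: str):
    """Try to cast an ID to int, returning None when it is not numeric."""
    try:
        return int(value)
    except ValueError:
        return None


as_ints = [to_int_or_none(g) for g in generator_ids]
print(as_ints)  # [1, 1, None, None]

# As strings all four IDs stay distinct; as integers only one value survives.
distinct_as_str = len(set(generator_ids))
distinct_as_int = len({i for i in as_ints if i is not None})
print(distinct_as_str, distinct_as_int)  # 4 1
```

Since generator_id is part of this table's primary key, a lossy cast like this could make distinct records look like duplicates.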

out_eia861__yearly_balancing_authority_service_territory#

Noticed any irregularities with this table? File a report.

Annual time series of counties in balancing authority service territories.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

balancing_authority_id_eia, report_date, county_id_fips, county

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_balancing_authority_service_territory.parquet"
)

Columns#

Field Name

Type

Description

area_km2

number

County area in km2.

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

county

string

County name.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

county_name_census

string

County name as specified in Census DP1 Data.

population

number

County population, sourced from Census DP1 data.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

out_eia861__yearly_utility_service_territory#

Noticed any irregularities with this table? File a report.

Annual time series of counties in utility service territories.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

utility_id_eia, report_date, county_id_fips

Additional Details#

Contains additional information about counties.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia861__yearly_utility_service_territory.parquet"
)

Columns#

Field Name

Type

Description

area_km2

number

County area in km2.

county

string

County name.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

county_name_census

string

County name as specified in Census DP1 Data.

population

number

County population, sourced from Census DP1 data.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

utility_id_eia

integer

The EIA Utility Identification number.
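
One common use of this table is to total the county attributes for each utility and year. The sketch below does this in plain Python over made-up rows that use the column names listed above. Treating null area or population as zero is an assumption made for illustration, not something the table guarantees is appropriate.

```python
from collections import defaultdict

# Hypothetical rows using this table's column names; the values are made up.
rows = [
    {"utility_id_eia": 100, "report_date": "2023-01-01",
     "county_id_fips": "08013", "area_km2": 1900.0, "population": 300000.0},
    {"utility_id_eia": 100, "report_date": "2023-01-01",
     "county_id_fips": "08059", "area_km2": 2000.0, "population": 500000.0},
    {"utility_id_eia": 200, "report_date": "2023-01-01",
     "county_id_fips": "08013", "area_km2": 1900.0, "population": 300000.0},
    # Assumed case: a county with no matching Census attributes.
    {"utility_id_eia": 200, "report_date": "2023-01-01",
     "county_id_fips": "08999", "area_km2": None, "population": None},
]


def territory_totals(rows):
    """Sum county area and population per (utility_id_eia, report_date)."""
    totals = defaultdict(
        lambda: {"n_counties": 0, "area_km2": 0.0, "population": 0.0}
    )
    for row in rows:
        entry = totals[(row["utility_id_eia"], row["report_date"])]
        entry["n_counties"] += 1
        # Missing county attributes contribute nothing to the sums.
        entry["area_km2"] += row["area_km2"] or 0.0
        entry["population"] += row["population"] or 0.0
    return dict(totals)


totals = territory_totals(rows)
print(totals[(100, "2023-01-01")])
```

The same county can appear under more than one utility, so adding these totals across utilities would double count. The figures also describe whole counties, which may overstate the area a utility actually serves.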

out_eia923__boiler_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.
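
The warning about annual-only respondents can be checked with a simple heuristic: flag any boiler whose non-zero fuel consumption in a given year appears only in December. The sketch below uses made-up rows and a hypothetical function name. It cannot tell an annual-only reporter from a boiler that genuinely ran only in December, so treat the result as a list of candidates.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (plant_id_eia, boiler_id, report_date, fuel_consumed_units) rows.
rows = [
    (1, "B1", date(2023, 1, 1), 10.0),
    (1, "B1", date(2023, 12, 1), 12.0),
    (2, "B1", date(2023, 6, 1), 0.0),
    (2, "B1", date(2023, 12, 1), 150.0),
]


def december_only(rows):
    """Return (plant, boiler, year) keys whose non-zero fuel is all in December."""
    months = defaultdict(set)
    for plant_id, boiler_id, report_date, units in rows:
        if units:  # skip zero and missing values
            months[(plant_id, boiler_id, report_date.year)].add(report_date.month)
    return sorted(key for key, seen in months.items() if seen == {12})


candidates = december_only(rows)
print(candidates)  # [(2, 'B1', 2023)]
```

For flagged boilers, monthly analyses are likely to be misleading, and aggregating to annual totals is the safer choice.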

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
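
As a rough illustration of the heat rate calculation mentioned above, the heat input implied by a record is the fuel quantity multiplied by its heat content, and a heat rate divides that by net generation. This is a sketch under stated assumptions: net_generation_mwh is a hypothetical input that does not come from this table, and both function names are illustrative.

```python
def fuel_heat_mmbtu(fuel_consumed_units, fuel_mmbtu_per_unit):
    """Heat input in MMBtu implied by quantity and heat content, or None if missing."""
    if fuel_consumed_units is None or fuel_mmbtu_per_unit is None:
        return None
    return fuel_consumed_units * fuel_mmbtu_per_unit


def heat_rate_mmbtu_per_mwh(fuel_mmbtu, net_generation_mwh):
    """Heat rate in MMBtu/MWh; None when it cannot be computed."""
    if fuel_mmbtu is None or not net_generation_mwh or net_generation_mwh <= 0:
        return None
    return fuel_mmbtu / net_generation_mwh


# Made-up month: 1,000 tons of coal at 20 MMBtu/ton, 2,000 MWh generated.
heat = fuel_heat_mmbtu(1000.0, 20.0)
rate = heat_rate_mmbtu_per_mwh(heat, 2000.0)
print(heat, rate)  # 20000.0 10.0
```

A lower heat rate means less fuel energy is needed per unit of electricity, which corresponds to higher thermal efficiency.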

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
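
The selection rule described above can be sketched as follows. This is not PUDL's implementation. It is a minimal illustration that assumes missing values are excluded before the share is computed, and the function name and sample values are hypothetical.

```python
from collections import Counter


def canonical_value(values, threshold=0.7):
    """Return the most common reported value if its share meets the threshold.

    Returns None when nothing was reported or no value is consistent enough.
    """
    reported = [v for v in values if v is not None]  # assumption: ignore nulls
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


# Eight of ten reports agree (80%), so that value is chosen.
consistent = canonical_value(["Plant A"] * 8 + ["Plant a"] * 2)
# The top value covers only half of the reports, so the result is null.
inconsistent = canonical_value(["X", "X", "Y", "Z"])
print(consistent, inconsistent)  # Plant A None
```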

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__boiler_fuel.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_mmbtu

number

Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Individual fuel deliveries to power plants, organized by fuel type and supplier.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)

Primary key:

This table has no primary key. Each record describes an individual fuel delivery. There can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month, so the table has no natural primary key.

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Some values have been redacted.

  • Time of fuel deliveries is not necessarily connected with time of fuel consumption.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains estimated values.

Additional Details#

There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.

Under some circumstances utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It’s still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices they publish. From 2009-2021 about 1/3 of all prices are redacted. The missing data is not randomly distributed. Deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.
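Because missing prices are filled in with estimates and flagged in the fuel_cost_per_mmbtu_source column, one way to gauge how much of a fuel's price data is not directly reported is to compute the share of records whose source is anything other than original. The sketch below uses a handful of synthetic records rather than real PUDL data; the helper name `filled_share_by_fuel` and the fuel codes shown are illustrative assumptions.

```python
from collections import Counter

# Synthetic records for illustration only, not real PUDL data.
deliveries = [
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu_source": "eiaapi"},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu_source": "rolling_avg"},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu_source": "original"},
    {"fuel_type_code_pudl": "gas", "fuel_cost_per_mmbtu_source": "eiaapi"},
    {"fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu_source": "original"},
    {"fuel_type_code_pudl": "coal", "fuel_cost_per_mmbtu_source": "original"},
]


def filled_share_by_fuel(records):
    """Return, per fuel type, the fraction of records with a non-original price."""
    totals = Counter()
    filled = Counter()
    for rec in records:
        fuel = rec["fuel_type_code_pudl"]
        totals[fuel] += 1
        if rec["fuel_cost_per_mmbtu_source"] != "original":
            filled[fuel] += 1
    return {fuel: filled[fuel] / totals[fuel] for fuel in totals}


shares = filled_share_by_fuel(deliveries)
```

Grouping by state or by year in the same way would show where the non-random pattern of redaction described above is concentrated.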

Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__fuel_receipts_costs.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

bulk_agg_fuel_cost_per_mmbtu

number

Fuel cost per MMBTU reported in the EIA bulk electricity data. This is an aggregate average fuel price for a whole state, region, month, sector, etc. Used to fill in missing fuel prices.

chlorine_content_ppm

number

For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.

coalmine_county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4. This is the county where the coal mine is located.

contract_expiration_date

date

Date contract expires. Format: MMYY.

contract_type_code

string

Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or, when records that have been aggregated together contain multiple sources, is a mix of these (mixed).

fuel_group_code

string

Fuel groups used in the Electric Power Monthly

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_received_units

number

Quantity of fuel received in tons, barrels, or Mcf.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

mercury_content_ppm

number

Mercury content in parts per million (ppm) to the nearest 0.001 ppm.

mine_id_msha

integer

MSHA issued mine identifier.

mine_name

string

Coal mine name.

mine_state

string

State where the coal mine is located. Two letter abbreviation.

mine_type_code

string

Type of coal mine.

moisture_content_pct

number

For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.

natural_gas_delivery_contract_type_code

string

Contract type for natural gas delivery service:

natural_gas_transport_code

string

Contract type for natural gas transportation service.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

primary_transportation_mode_code

string

Transportation mode for the longest distance transported.

report_date

date

Date reported.

secondary_transportation_mode_code

string

Transportation mode for the second longest distance transported.

state

string

Two letter US state abbreviation.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

supplier_name

string

Company that sold the fuel to the plant or, in the case of Natural Gas, pipeline owner.

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__generation#

Noticed any irregularities with this table? File a report.

Monthly time series of net electricity generated by power plant generators.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, generator_id, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Not all IDs are present.

  • Table has known low coverage - either geographic or temporal or otherwise.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption of all generation units.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level, so as to be compatible with fossil fuel generation data.
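The aggregation of nuclear units up to the plant prime mover level amounts to summing unit-level records over this table's primary key. The following is a rough sketch of that idea using synthetic records. It is not the PUDL code, and the `nuclear_unit_id` field name, the helper name `aggregate_to_prime_mover`, and all values are assumptions for illustration.

```python
from collections import defaultdict

# Primary key of the combined table, as listed above.
KEY = ("plant_id_eia", "report_date", "prime_mover_code", "energy_source_code")

# Synthetic unit-level nuclear records; nuclear_unit_id is an assumed field name.
nuclear_units = [
    {"plant_id_eia": 1, "report_date": "2020-01-01", "prime_mover_code": "ST",
     "energy_source_code": "NUC", "nuclear_unit_id": "1", "net_generation_mwh": 700_000.0},
    {"plant_id_eia": 1, "report_date": "2020-01-01", "prime_mover_code": "ST",
     "energy_source_code": "NUC", "nuclear_unit_id": "2", "net_generation_mwh": 650_000.0},
    {"plant_id_eia": 1, "report_date": "2020-02-01", "prime_mover_code": "ST",
     "energy_source_code": "NUC", "nuclear_unit_id": "1", "net_generation_mwh": 600_000.0},
]


def aggregate_to_prime_mover(records, column="net_generation_mwh"):
    """Sum a numeric column over all units sharing the same primary key."""
    totals = defaultdict(float)
    for rec in records:
        totals[tuple(rec[k] for k in KEY)] += rec[column]
    return dict(totals)


plant_level = aggregate_to_prime_mover(nuclear_units)
```

After this step the two January units collapse into a single record per plant, month, prime mover, and energy source, which has the same shape as the fossil fuel records.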

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator along with their capacities to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__generation_fuel_combined.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in MMBtu, year to date.

fuel_consumed_for_electricity_units

number

Consumption for electric generation of the fuel type in physical unit.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_agg

string

A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__monthly_boiler_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
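As a rough illustration of the heat rate calculation mentioned above, the sketch below multiplies fuel quantity by heat content to get heat input in MMBtu and divides by net generation. It assumes the simplest possible case, in which a boiler's fuel can be matched one-to-one with a generator's net_generation_mwh, which, as noted, often does not hold. The function name and the numbers are hypothetical.

```python
def heat_rate_mmbtu_per_mwh(fuel_consumed_units, fuel_mmbtu_per_unit, net_generation_mwh):
    """Heat input (MMBtu) divided by net generation (MWh).

    Returns None when an input is missing or generation is not positive,
    since the ratio is not meaningful in those cases.
    """
    inputs = (fuel_consumed_units, fuel_mmbtu_per_unit, net_generation_mwh)
    if any(v is None for v in inputs) or net_generation_mwh <= 0:
        return None
    return fuel_consumed_units * fuel_mmbtu_per_unit / net_generation_mwh


# Hypothetical month: 10,000 tons of coal at 20 MMBtu/ton producing 20,000 MWh.
rate = heat_rate_mmbtu_per_mwh(10_000.0, 20.0, 20_000.0)
```

For plants where several boilers feed several generators, fuel and generation would first need to be summed over each associated group before taking the ratio.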

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_boiler_fuel.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__monthly_fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Monthly time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)

Primary key:

plant_id_eia, fuel_type_code_pudl, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Contains estimated values.

  • Some values have been redacted.

  • Time of fuel deliveries is not necessarily connected with time of fuel consumption.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy timeseries of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns – primarily numerical values and a handful of categorical variables, plus additional attributes that are constant within each plant-fuel-time period grouping and associated with the plant or utility. When aggregating numerical values any sum that contains an NA value is treated as NA.
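The NA-handling rule in the last sentence can be sketched as follows: deliveries are grouped by this table's primary key, and a group's sum becomes NA as soon as any delivery in it is missing a value. This is an illustrative sketch on synthetic records, not PUDL's implementation; the helper names `sum_or_na` and `aggregate_monthly` are hypothetical, and None stands in for NA.

```python
from collections import defaultdict

# Primary key of the monthly table, as listed above.
KEY = ("plant_id_eia", "fuel_type_code_pudl", "report_date")

# Synthetic deliveries for illustration; None stands in for NA.
deliveries = [
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal", "report_date": "2020-01-01",
     "fuel_received_units": 100.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "coal", "report_date": "2020-01-01",
     "fuel_received_units": 250.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas", "report_date": "2020-01-01",
     "fuel_received_units": 500.0},
    {"plant_id_eia": 1, "fuel_type_code_pudl": "gas", "report_date": "2020-01-01",
     "fuel_received_units": None},
]


def sum_or_na(values):
    """Sum the values, returning None if any of them is None."""
    total = 0.0
    for v in values:
        if v is None:
            return None
        total += v
    return total


def aggregate_monthly(records, column):
    """Group records by the primary key and sum one column with NA propagation."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in KEY)].append(rec[column])
    return {key: sum_or_na(vals) for key, vals in groups.items()}


monthly = aggregate_monthly(deliveries, "fuel_received_units")
```

Here the coal deliveries sum to 350.0, while the gas total is None because one delivery is missing its quantity. Propagating NA this way avoids reporting a partial total as if it were complete.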

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_fuel_receipts_costs.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

chlorine_content_ppm

number

For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or, when records that have been aggregated together contain multiple sources, is a mix of these (mixed).

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_received_units

number

Quantity of fuel received in tons, barrels, or Mcf.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

mercury_content_ppm

number

Mercury content in parts per million (ppm) to the nearest 0.001 ppm.

moisture_content_pct

number

For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__monthly_generation#

Noticed any irregularities with this table? File a report.

Monthly time series of net electricity generated by power plant generators.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, generator_id, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Not all IDs are present.

  • Table has known low coverage - either geographic or temporal or otherwise.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table, due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
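
The 70% rule can be illustrated with a short sketch. This is a simplified illustration rather than PUDL's harvesting code: the function name `canonical_value` and the `min_share` parameter are hypothetical, and excluding missing values from the denominator is an assumption of this example.

```python
from collections import Counter


def canonical_value(reported, min_share=0.7):
    """Pick the most frequently reported value if it is consistent enough.

    `reported` holds every instance of one attribute for one entity.
    Missing values (None) are ignored here, which is an assumption of
    this sketch. Returns None when no value reaches `min_share`.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= min_share else None
```

With eight reports of one plant name and two of another, the majority name is kept; with a six-to-four split, the attribute would come out null.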

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__monthly_generation_fuel_by_generator#

Noticed any irregularities with this table? File a report.

Monthly time series of estimated net generation and fuel consumption by generator.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

report_date, plant_id_eia, generator_id

Usage Warnings#

  • Contains estimated values.

  • The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__monthly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
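
The aggregation step described above amounts to summing the allocated values over energy source and prime mover for each generator and month. A minimal sketch in plain Python, assuming rows are dictionaries keyed by column name (the function name `aggregate_to_generator` is hypothetical and this is not PUDL's code):

```python
from collections import defaultdict

VALUE_FIELDS = ("net_generation_mwh", "fuel_consumed_mmbtu")


def aggregate_to_generator(rows):
    """Sum allocated values across energy_source_code and prime_mover_code.

    Returns a dict keyed by (report_date, plant_id_eia, generator_id),
    which matches this table's primary key.
    """
    totals = defaultdict(lambda: dict.fromkeys(VALUE_FIELDS, 0.0))
    for row in rows:
        key = (row["report_date"], row["plant_id_eia"], row["generator_id"])
        for field in VALUE_FIELDS:
            totals[key][field] += row[field]
    return dict(totals)
```

Missing values are not handled in this sketch; real data would need a decision on how nulls propagate through the sum.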

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator.parquet"
)

Columns#

Field Name

Type

Description

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in MMBtu, year to date.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__monthly_generation_fuel_by_generator_energy_source#

Noticed any irregularities with this table? File a report.

Monthly time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code

Usage Warnings#

  • Contains estimated values.

  • The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.

The allocation process entails generating a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
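
The idea of allocating by fraction can be sketched as follows. This is a deliberately simplified illustration, not PUDL's allocation method: the function names and the fallback rule (use reported net generation when every generator has it, otherwise use capacity) are assumptions made for this example, and `capacity_mw` is assumed to be the capacity column joined in from the generators table.

```python
def allocation_fractions(generators):
    """Return {generator_id: fraction} for generators sharing one record.

    Assumed rule for this sketch: weight by reported net generation when
    every generator reports it and the total is positive; otherwise fall
    back to weighting by capacity.
    """
    reported = [g["net_generation_mwh"] for g in generators]
    if all(v is not None for v in reported) and sum(reported) > 0:
        weights = {g["generator_id"]: g["net_generation_mwh"] for g in generators}
    else:
        weights = {g["generator_id"]: g["capacity_mw"] for g in generators}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no positive weights to allocate against")
    return {gid: weight / total for gid, weight in weights.items()}


def allocate(plant_level_total, fractions):
    """Split one plant-level value across generators using the fractions."""
    return {gid: plant_level_total * f for gid, f in fractions.items()}
```

Since the fractions sum to one, the allocated values add back up to the plant-level total, apart from floating-point rounding.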

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_by_generator_energy_source.parquet"
)

Columns#

Field Name

Type

Description

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

energy_source_code_num

string

Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in MMBtu, year to date.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

out_eia923__monthly_generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Monthly time series of generation and fuel consumption of all generation units.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level, so as to be compatible with fossil fuel generation data.

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator, along with their capacities, to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, in order to calculate capacity factors, heat rates, and the marginal cost of electricity.
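
As a reminder of what a capacity factor is, the sketch below divides monthly net generation by the maximum possible output at full capacity for that month. It is an illustration only, not PUDL's calculation: the function name is hypothetical, `capacity_mw` is assumed to come from a separate generators table, and daylight saving shifts are ignored when counting hours.

```python
import calendar
from datetime import date


def monthly_capacity_factor(net_generation_mwh, capacity_mw, report_date):
    """Return net generation divided by capacity times hours in the month.

    `report_date` is the first-of-month date used by these tables.
    Returns None when an input is missing or capacity is not positive.
    """
    if net_generation_mwh is None or capacity_mw is None or capacity_mw <= 0:
        return None
    days_in_month = calendar.monthrange(report_date.year, report_date.month)[1]
    hours_in_month = days_in_month * 24
    return net_generation_mwh / (capacity_mw * hours_in_month)
```

For example, a 100 MW generator producing 37,200 MWh in January (744 hours) ran at half of its maximum possible output.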

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.
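
One way to see the CHP distinction in the data is to compare the two fuel columns. The sketch below computes the share of total fuel that went to electricity generation; the function name is hypothetical and this ratio is offered as an illustration, not as a metric PUDL publishes.

```python
def electricity_fuel_share(fuel_consumed_for_electricity_mmbtu, fuel_consumed_mmbtu):
    """Return the fraction of total fuel energy used to produce electricity.

    A value below 1.0 suggests some fuel went to process heat or steam.
    Returns None when an input is missing or total fuel is not positive.
    """
    if fuel_consumed_for_electricity_mmbtu is None or fuel_consumed_mmbtu is None:
        return None
    if fuel_consumed_mmbtu <= 0:
        return None
    return fuel_consumed_for_electricity_mmbtu / fuel_consumed_mmbtu
```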

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__monthly_generation_fuel_combined.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in MMBtu, year to date.

fuel_consumed_for_electricity_units

number

Consumption for electric generation of the fuel type in physical unit.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__yearly_boiler_fuel#

Noticed any irregularities with this table? File a report.

Annual time series of boiler fuel consumption and emissions.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on an annual basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.
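
The relationship between physical fuel units and heat content can be sketched as a simple product. This is an illustration under the assumption that fuel_consumed_units and fuel_mmbtu_per_unit refer to the same physical unit for a given record; the function name is hypothetical.

```python
def fuel_heat_mmbtu(fuel_consumed_units, fuel_mmbtu_per_unit):
    """Convert a physical fuel quantity to heat content in MMBtu.

    `fuel_consumed_units` is in tons, barrels, or Mcf, and
    `fuel_mmbtu_per_unit` is MMBtu per that same unit.
    Returns None if either input is missing.
    """
    if fuel_consumed_units is None or fuel_mmbtu_per_unit is None:
        return None
    return fuel_consumed_units * fuel_mmbtu_per_unit
```

Dividing the result by the net generation of the associated generators would give a heat rate, subject to the boiler-generator association caveats noted for this table.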

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_boiler_fuel.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__yearly_fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Annual time series of fuel deliveries to power plants aggregated by plant, fuel type, and time period.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)

Primary key:

plant_id_eia, fuel_type_code_pudl, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Contains estimated values.

  • Some values have been redacted.

  • Time of fuel deliveries is not necessarily connected with time of fuel consumption.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This table is an aggregation of the more detailed data in the core_eia923__fuel_receipts_costs table. It provides a tidy timeseries of deliveries by fuel type for each plant. However, not all values in the original table can be aggregated meaningfully, so this table contains only a subset of the source table columns – primarily numerical values and a handful of categorical variables, plus additional attributes that are constant within each plant-fuel-time period grouping and associated with the plant or utility. When aggregating numerical values, any sum that contains an NA value is treated as NA.
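
The NA handling described above differs from a sum that silently skips missing values. A minimal sketch in plain Python, with None standing in for NA; the function names and the `report_year` key are hypothetical, and this is not PUDL's aggregation code:

```python
from collections import defaultdict


def sum_or_na(values):
    """Sum the values, returning None (NA) if any single value is missing."""
    total = 0.0
    for value in values:
        if value is None:
            return None
        total += value
    return total


def aggregate_receipts(rows, field="fuel_received_units"):
    """Group deliveries by plant, fuel type, and year, propagating NA."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["plant_id_eia"], row["fuel_type_code_pudl"], row["report_year"])
        groups[key].append(row[field])
    return {key: sum_or_na(values) for key, values in groups.items()}
```

When working in pandas, note that `Series.sum` skips missing values by default, so reproducing this behavior there requires passing `skipna=False`.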

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_fuel_receipts_costs.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

chlorine_content_ppm

number

For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fuel_consumed_mmbtu

number

Total consumption of fuel in MMBtu, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or comes from multiple sources because the records that were aggregated together had different sources (mixed).

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_received_units

number

Quantity of fuel received in tons, barrels, or Mcf.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

mercury_content_ppm

number

Mercury content in parts per million (ppm) to the nearest 0.001 ppm.

moisture_content_pct

number

For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__yearly_generation#

Noticed any irregularities with this table? File a report.

Annual time series of net electricity generated by power plant generators.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, generator_id, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Not all IDs are present.

  • Table has known low coverage - either geographic or temporal or otherwise.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears in this table due to the different reporting requirements imposed on different types and sizes of generators.

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__yearly_generation_fuel_by_generator#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption by generator.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

report_date, plant_id_eia, generator_id

Usage Warnings#

  • Contains estimated values.

  • The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

Based on allocating net electricity generation and fuel consumption reported in the EIA-923 generation and generation_fuel tables to individual generators.

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

In this table, PUDL aggregates the net generation and fuel consumption that has been allocated to the generator_id/energy_source_code/prime_mover_code level in the out_eia923__yearly_generation_fuel_by_generator_energy_source table up to the generator level. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes, the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator.parquet"
)

Columns#

Field Name

Type

Description

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in physical unit, year to date.

fuel_consumed_mmbtu

number

Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

report_date

date

Date reported.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia923__yearly_generation_fuel_by_generator_energy_source#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code

Usage Warnings#

  • Contains estimated values.

  • The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

In this table, PUDL has allocated the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel to the generator_id/energy_source_code/prime_mover_code level.

The allocation process entails generating a fraction for each record based on the net generation in the core_eia923__monthly_generation table and the capacity from the core_eia860__scd_generators table. This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.
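
As a rough illustration of the fraction-based allocation described above, the sketch below splits one plant-level total across generators. It is a simplified stand-in, not PUDL's algorithm: the function names are hypothetical, and the rule "use reported net generation when every generator has a positive value, otherwise fall back to capacity" is an assumption chosen to keep the example short.

```python
def allocation_fractions(generators):
    """Map generator_id -> share of the plant-level total.

    Hypothetical, simplified rule: weight by reported net generation when
    every generator has a positive value, otherwise weight by capacity.
    """
    reported = {g["generator_id"]: g.get("net_generation_mwh") for g in generators}
    if all(v is not None and v > 0 for v in reported.values()):
        weights = reported
    else:
        weights = {g["generator_id"]: g["capacity_mw"] for g in generators}
    total = sum(weights.values())
    return {gid: w / total for gid, w in weights.items()}


def allocate(plant_total, generators):
    """Split a plant-level total across generators using the fractions."""
    fractions = allocation_fractions(generators)
    return {gid: plant_total * frac for gid, frac in fractions.items()}


# Toy generators; IDs are strings because generator_id can contain letters.
with_generation = [
    {"generator_id": "1", "capacity_mw": 50.0, "net_generation_mwh": 30.0},
    {"generator_id": "2A", "capacity_mw": 150.0, "net_generation_mwh": 10.0},
]
without_generation = [
    {"generator_id": "1", "capacity_mw": 50.0, "net_generation_mwh": None},
    {"generator_id": "2A", "capacity_mw": 150.0, "net_generation_mwh": None},
]

by_generation = allocate(100.0, with_generation)    # weighted 30:10
by_capacity = allocate(100.0, without_generation)   # weighted 50:150
```

Either way the allocated values sum back to the plant-level total, which is the property the allocation is meant to preserve.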

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source.parquet"
)

Columns#

Field Name

Type

Description

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

energy_source_code_num

string

Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in physical unit, year to date.

fuel_consumed_mmbtu

number

Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

out_eia923__yearly_generation_fuel_by_generator_energy_source_owner#

Noticed any irregularities with this table? File a report.

Annual time series of estimated net generation and fuel consumption associated with each combination of generator, energy source, prime mover, and owner.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

report_date, plant_id_eia, generator_id, prime_mover_code, energy_source_code, utility_id_eia, ownership_record_type

Usage Warnings#

  • Contains estimated values.

  • The downscaling process used to create this table does not distinguish between primary and secondary energy_sources for generators (see below for implications).

  • Date column arbitrarily uses the first of the month.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

The net generation and fuel consumption allocation method PUDL employs begins with the following context of the originally reported EIA-860 and EIA-923 data:

In this table, PUDL has scaled the net electricity generation and fuel consumption from core_eia923__monthly_generation_fuel by ownership from out_eia860__yearly_ownership. Note that the utility_id_eia in this table refers to the OWNER of the generator, not the operator.
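
The ownership scaling described above amounts to multiplying each generator-level value by an owner's fraction_owned. The following is a minimal sketch with toy data; the function name scale_by_ownership and the utility IDs are hypothetical, and the real table carries additional key columns that are omitted here.

```python
def scale_by_ownership(record, owners):
    """Return one row per owner, scaling generation and fuel by fraction_owned.

    Hypothetical helper for illustration; utility_id_eia here identifies
    the owner of the generator, not its operator.
    """
    rows = []
    for owner in owners:
        frac = owner["fraction_owned"]
        rows.append(
            {
                "generator_id": record["generator_id"],
                "utility_id_eia": owner["utility_id_eia"],
                "fraction_owned": frac,
                "net_generation_mwh": record["net_generation_mwh"] * frac,
                "fuel_consumed_mmbtu": record["fuel_consumed_mmbtu"] * frac,
            }
        )
    return rows


# Toy generator-level record and two made-up owners.
record = {
    "generator_id": "1",
    "net_generation_mwh": 1000.0,
    "fuel_consumed_mmbtu": 10000.0,
}
owners = [
    {"utility_id_eia": 111, "fraction_owned": 0.75},
    {"utility_id_eia": 222, "fraction_owned": 0.25},
]

owner_rows = scale_by_ownership(record, owners)
```

When the ownership fractions sum to one, the owner-level rows add back up to the generator-level values.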

This process does not distinguish between primary and secondary energy_sources for generators. Net generation is allocated equally between energy source codes, so if a plant has multiple generators with the same prime_mover_code but different energy source codes the core_eia923__monthly_generation_fuel records will be associated similarly between these two generators. Allocated net generation will still be proportional to each generator’s net generation or capacity.

This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_by_generator_energy_source_owner.parquet"
)

Columns#

Field Name

Type

Description

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

energy_source_code_num

string

Name of the energy_source_code_N column that this energy source code was reported in for the generator referenced in the same record.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in physical unit, year to date.

fuel_consumed_mmbtu

number

Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

ownership_record_type

string

Whether each generator record is for one owner or represents a total of all ownerships.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

utility_id_eia

integer

The EIA Utility Identification number.

out_eia923__yearly_generation_fuel_combined#

Noticed any irregularities with this table? File a report.

Annual time series of generation and fuel consumption of all generation units.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Denormalized, combined data from the core_eia923__monthly_generation_fuel and core_eia923__monthly_generation_fuel_nuclear tables, with nuclear generation aggregated from the nuclear generation unit level up to the plant prime mover level, so as to be compatible with fossil fuel generation data.

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. We use the primary fuels and prime movers reported for each generator along with their capacities to attribute fuel consumption and generation when it isn’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table makes a distinction between all fuel consumed and fuel consumed for electricity generation because some units are also combined heat and power (CHP) units, and also provide high temperature process heat at the expense of net electricity generation.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia923__yearly_generation_fuel_combined.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in physical unit, year to date.

fuel_consumed_for_electricity_units

number

Consumption for electric generation of the fuel type in physical unit.

fuel_consumed_mmbtu

number

Total consumption of fuel in physical unit, year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia930__hourly_aggregated_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of aggregated balancing authority demand by EIA region, interconnect, and continental US.

Most-recent data:

2026

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, aggregation_level, aggregation_group

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

The spatial granularity of each record is indicated by aggregation_level.
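
Because records at different aggregation levels can describe overlapping areas, it is usually safest to filter to a single aggregation_level before summing. The sketch below uses toy rows to show the idea. The level labels ("region", "interconnect") and group labels are placeholders, not confirmed values from the table, and the function name total_demand is hypothetical.

```python
# Toy rows; the aggregation_level and aggregation_group labels are placeholders.
rows = [
    {"aggregation_level": "region", "aggregation_group": "R1",
     "demand_imputed_pudl_mwh": 40.0},
    {"aggregation_level": "region", "aggregation_group": "R2",
     "demand_imputed_pudl_mwh": 60.0},
    {"aggregation_level": "interconnect", "aggregation_group": "I1",
     "demand_imputed_pudl_mwh": 100.0},
]


def total_demand(rows, level):
    """Sum imputed demand over records at a single aggregation level."""
    return sum(
        r["demand_imputed_pudl_mwh"]
        for r in rows
        if r["aggregation_level"] == level
    )


regional_total = total_demand(rows, "region")
interconnect_total = total_demand(rows, "interconnect")

# Summing across all levels would count the same demand more than once.
naive_total = sum(r["demand_imputed_pudl_mwh"] for r in rows)
```

In this toy data the regional and interconnect totals agree, while the unfiltered sum is twice as large.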

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_aggregated_demand.parquet"
)

Columns#

Field Name

Type

Description

aggregation_group

string

Label identifying a group of balancing authorities to be used in aggregating demand, e.g. a region of the US or a whole interconnect.

aggregation_level

string

Indicates the spatial granularity of the aggregated value.

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_imputed_pudl_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.

out_eia930__hourly_operations#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority net generation, interchange, and demand with imputed demand.

Most-recent data:

2026

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia

Usage Warnings#

  • Contains rows where missing values were imputed.

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

This table is based on core_eia930__hourly_operations, but adds imputed demand where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.
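
One way to work with the imputed values is to separate imputed hours from as-reported hours using demand_imputed_pudl_mwh_imputation_code. The sketch below uses toy rows. It assumes the code column is null when a value was not imputed, and the code strings shown are made up; the actual codes are listed in core_pudl__codes_imputation_reasons.

```python
from collections import Counter

CODE = "demand_imputed_pudl_mwh_imputation_code"

# Toy rows; the imputation codes are invented placeholders.
rows = [
    {"datetime_utc": "2024-01-01T00:00", "demand_reported_mwh": 510.0,
     "demand_imputed_pudl_mwh": 510.0, CODE: None},
    {"datetime_utc": "2024-01-01T01:00", "demand_reported_mwh": None,
     "demand_imputed_pudl_mwh": 498.0, CODE: "HYPOTHETICAL_MISSING"},
    {"datetime_utc": "2024-01-01T02:00", "demand_reported_mwh": 99999.0,
     "demand_imputed_pudl_mwh": 505.0, CODE: "HYPOTHETICAL_ANOMALY"},
]

# Assumption: a null code means the reported value was kept as-is.
imputed = [r for r in rows if r[CODE] is not None]
as_reported = [r for r in rows if r[CODE] is None]

# Share of hours that were imputed, and a tally of the reasons.
imputed_share = len(imputed) / len(rows)
reason_counts = Counter(r[CODE] for r in imputed)
```

A high imputed share for a given balancing authority is a signal to check the reported series before relying on it.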

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_operations.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_adjusted_mwh

number

Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

demand_forecast_mwh

number

Day ahead demand forecast.

demand_imputed_eia_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA.

demand_imputed_pudl_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.

demand_imputed_pudl_mwh_imputation_code

string

Code describing why a demand value was flagged for imputation.

demand_reported_mwh

number

Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.

interchange_adjusted_mwh

number

Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

interchange_imputed_eia_mwh

number

Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA.

interchange_reported_mwh

number

Original reported energy interchange between adjacent balancing authorities.

net_generation_adjusted_mwh

number

Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

net_generation_imputed_eia_mwh

number

Reported net generation with outlying values removed and missing values imputed by EIA.

net_generation_reported_mwh

number

Unaltered originally reported net generation for the specified period.

out_eia930__hourly_subregion_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of balancing authority subregion demand with imputed demand.

Most-recent data:

2026

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia

Usage Warnings#

  • Contains rows where missing values were imputed.

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

This table is based on core_eia930__hourly_subregion_demand, but adds imputed demand where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia930__hourly_subregion_demand.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_subregion_code_eia

string

Code identifying subregions of larger balancing authorities.

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_imputed_pudl_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.

demand_imputed_pudl_mwh_imputation_code

string

Code describing why a demand value was flagged for imputation.

demand_reported_mwh

number

Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.

out_eia__monthly_generators#

Noticed any irregularities with this table? File a report.

Monthly time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

report_date, plant_id_eia, generator_id

Usage Warnings#

  • Contains estimated values.

  • Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.

  • Not all columns are originally reported in or calculable from the input tables. Expect nulls.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This table includes all monthly attributes for all generators reported to EIA-860 and EIA-923.

To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed and calculated columns. We suggest using this table if you want to explore monthly attributes about generators and would rather use a more complete and denormalized table. If you are more interested in the originally reported values, we recommend searching the core_eia tables for the column you are most interested in.

The calculations of capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh are based on the allocation of net generation reported on the basis of plant, prime mover and energy source to individual generators, which comes from the out_eia923__monthly_generation_fuel_by_generator table.

Monthly heat rates by generator are estimated by using allocated estimates for per-generator net generation and fuel consumption as well as the core_eia923__monthly_boiler_fuel table, which reports fuel consumed by boiler. Heat rates are necessary to estimate the amount of fuel consumed by a generation unit, and thus the fuel cost per MWh generated.

Plant specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which only has ~70% coverage, leading to some generators with heat rate estimates still lacking fuel cost estimates.
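
The derived metrics described above follow from simple ratios. The sketch below shows the arithmetic under stated assumptions: the function names are hypothetical, the heat rate is taken as fuel consumed for electricity divided by net generation, and the time period is measured in hours. PUDL's actual calculation may differ in which fuel column and which inputs it uses.

```python
def capacity_factor(net_generation_mwh, capacity_mw, hours):
    """Energy generated divided by the maximum possible energy in the period."""
    return net_generation_mwh / (capacity_mw * hours)


def heat_rate_mmbtu_per_mwh(fuel_consumed_for_electricity_mmbtu, net_generation_mwh):
    """Fuel energy input per unit of net electricity output (assumed definition)."""
    return fuel_consumed_for_electricity_mmbtu / net_generation_mwh


def fuel_cost_per_mwh(heat_rate, fuel_cost_per_mmbtu):
    """Fuel cost of generating one MWh, given a heat rate and a fuel price."""
    return heat_rate * fuel_cost_per_mmbtu


# Toy example: a 100 MW generator over a 31-day month (744 hours).
cf = capacity_factor(net_generation_mwh=37200.0, capacity_mw=100.0, hours=744)
hr = heat_rate_mmbtu_per_mwh(
    fuel_consumed_for_electricity_mmbtu=372000.0, net_generation_mwh=37200.0
)
cost = fuel_cost_per_mwh(heat_rate=hr, fuel_cost_per_mmbtu=3.0)
```

If no fuel price is available for a plant, the last step cannot be computed, which is why some generators have heat rate estimates but no fuel cost estimates.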

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__monthly_generators.parquet"
)

Columns#

Field Name

Type

Description

associated_combined_heat_power

boolean

Indicates whether the generator is associated with a combined heat and power system

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_name_eia

string

Name of the balancing authority.

bga_source

string

The source from which the unit_id_pudl was compiled. The unit_id_pudl comes either directly from EIA 860, from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is used to find additional boiler-generator connections).

bypass_heat_recovery

boolean

Can this generator operate while bypassing the heat recovery steam generator?

can_burn_multiple_fuels

boolean

Whether the generator can burn multiple fuels.

can_cofire_fuels

boolean

Whether the generator can co-fire fuels.

can_switch_oil_gas

boolean

Whether the generator can switch between oil and natural gas.

can_switch_when_operating

boolean

Indicates whether a fuel switching generator can switch fuels while operating.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over the time period / (nameplate capacity * length of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

carbon_capture

boolean

Indicates whether the generator uses carbon capture technology.

city

string

Name of the city.

county

string

County name.

current_planned_generator_operating_date

date

The most recently updated effective date on which the generator is scheduled to start operation

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

deliver_power_transgrid

boolean

Indicates whether the generator can deliver power to the transmission grid.

distributed_generation

boolean

Whether the generator is considered distributed generation

duct_burners

boolean

Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas

energy_source_1_transport_1

string

Primary mode of transport for energy source 1.

energy_source_1_transport_2

string

Secondary mode of transport for energy source 1.

energy_source_1_transport_3

string

Tertiary mode of transport for energy source 1.

energy_source_2_transport_1

string

Primary mode of transport for energy source 2.

energy_source_2_transport_2

string

Secondary mode of transport for energy source 2.

energy_source_2_transport_3

string

Tertiary mode of transport for energy source 2.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_source_code_2

string

The code representing the second most predominant type of energy that fuels the generator

energy_source_code_3

string

The code representing the third most predominant type of energy that fuels the generator

energy_source_code_4

string

The code representing the fourth most predominant type of energy that fuels the generator

energy_source_code_5

string

The code representing the fifth most predominant type of energy that fuels the generator

energy_source_code_6

string

The code representing the sixth most predominant type of energy that fuels the generator

energy_storage_capacity_mwh

number

Energy storage capacity in MWh (e.g. for batteries).

ferc_qualifying_facility

boolean

Indicates whether or not a generator is a qualifying FERC cogeneration facility.

fluidized_bed_tech

boolean

Indicates whether the generator uses fluidized bed technology

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or comes from records that were aggregated together and contain multiple sources (mixed).

fuel_cost_per_mwh

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

fuel_type_count

integer

A count of how many different simple energy sources there are associated with a generator.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_date

date

Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

latitude

number

Latitude of the plant’s location, in degrees.

longitude

number

Longitude of the plant’s location, in degrees.

minimum_load_mw

number

The minimum load at which the generator can operate continuously.

nameplate_power_factor

number

The nameplate power factor of the generator.

net_capacity_mwdc

number

Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

operational_status

string

The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.

operational_status_code

string

The operating status of the asset.

original_planned_generator_operating_date

date

The date the generator was originally scheduled to be operational

other_combustion_tech

boolean

Indicates whether the generator uses other combustion technologies

other_modifications_date

date

Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.

other_planned_modifications

boolean

Indicates whether there are other modifications planned for the generator.

owned_by_non_utility

boolean

Whether any part of the generator is owned by a nonutility.

ownership_code

string

Identifies the ownership for each generator.

planned_derate_date

date

Planned effective month that the generator is scheduled to enter operation after the derate modification.

planned_energy_source_code_1

string

New energy source code for the planned repowered generator.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

planned_modifications

boolean

Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.

planned_net_summer_capacity_derate_mw

number

Decrease in summer capacity expected to be realized from the derate modification to the equipment.

planned_net_summer_capacity_uprate_mw

number

Increase in summer capacity expected to be realized from the modification to the equipment.

planned_net_winter_capacity_derate_mw

number

Decrease in winter capacity expected to be realized from the derate modification to the equipment.

planned_net_winter_capacity_uprate_mw

number

Increase in winter capacity expected to be realized from the uprate modification to the equipment.

planned_new_capacity_mw

number

The expected new nameplate capacity for the generator.

planned_new_prime_mover_code

string

New prime mover for the planned repowered generator.

planned_repower_date

date

Planned effective date that the generator is scheduled to enter operation after the repowering is complete.

planned_uprate_date

date

Planned effective date that the generator is scheduled to enter operation after the uprate modification.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

previously_canceled

boolean

Indicates whether the generator was previously reported as indefinitely postponed or canceled

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

pulverized_coal_tech

boolean

Indicates whether the generator uses pulverized coal technology

reactive_power_output_mvar

number

Reactive Power Output (MVAr)

report_date

date

Date reported.

rto_iso_lmp_node_id

string

The designation used to identify the price node in RTO/ISO Locational Marginal Price reports

rto_iso_location_wholesale_reporting_id

string

The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report

solid_fuel_gasification

boolean

Indicates whether the generator is part of a solid fuel gasification system

startup_source_code_1

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_2

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_3

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_4

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

state

string

Two letter US state abbreviation.

stoker_tech

boolean

Indicates whether the generator uses stoker technology

street_address

string

Physical street address.

subcritical_tech

boolean

Indicates whether the generator uses subcritical technology

summer_capacity_estimate

boolean

Whether the summer capacity value was an estimate

summer_capacity_mw

number

The net summer capacity.

summer_estimated_capability_mw

number

EIA estimated summer capacity (in MW).

supercritical_tech

boolean

Indicates whether the generator uses supercritical technology

synchronized_transmission_grid

boolean

Indicates whether standby generators (SB status) can be synchronized to the grid.

technology_description

string

High level description of the technology used by the generator to produce electricity.

time_cold_shutdown_full_load_code

string

The minimum amount of time required to bring the unit to full load from shutdown.

timezone

string

IANA timezone name

topping_bottoming_code

string

If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_mmbtu

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

turbines_inverters_hydrokinetics

integer

Number of wind turbines, or hydrokinetic buoys.

turbines_num

integer

Number of wind turbines, or hydrokinetic buoys.

ultrasupercritical_tech

boolean

Indicates whether the generator uses ultra-supercritical technology

unit_heat_rate_mmbtu_per_mwh

number

Fuel content per unit of electricity generated. Coming from MCOE calculation.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

uprate_derate_completed_date

date

The date when the uprate or derate was completed.

uprate_derate_during_year

boolean

Was an uprate or derate completed on this generator during the reporting year?

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

winter_capacity_estimate

boolean

Whether the winter capacity value was an estimate

winter_capacity_mw

number

The net winter capacity.

winter_estimated_capability_mw

number

EIA estimated winter capacity (in MW).

zip_code

string

Five digit US Zip Code.

out_eia__yearly_assn_plant_parts_plant_gen#

Noticed any irregularities with this table? File a report.

Annual Association table providing connections between plant parts (identified as record_id_eia) and ‘plant_gen’ records (identified as record_id_eia_plant_gen) from the out_eia__yearly_plant_parts table.

Most-recent data:

2026

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

record_id_eia, record_id_eia_plant_gen

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Contains information from multiple raw inputs.

Additional Details#

The purpose of this table is to help determine which generator records are associated with each plant part record. The plant part records have a one-to-many relationship to the associated ‘plant_gen’ records. The ‘plant_gen’ records are also included in the plant part records, so for these records there is a one-to-one relationship and the record_id_eia and the record_id_eia_plant_gen are the same.

All of the columns in this table that have a suffix of ‘_plant_gen’ are attributes of the record_id_eia_plant_gen record.
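To make the one-to-many relationship concrete, here is a small sketch that groups association rows by plant part. The record IDs are made up for illustration (real record_id_eia values look different), and the grouping is only meant to show the shape of the data, not how PUDL builds the table.

```python
from collections import defaultdict

# Hypothetical (record_id_eia, record_id_eia_plant_gen) pairs.
assn_rows = [
    ("plant_A", "gen_A1"),  # a plant-level part made up of two generators
    ("plant_A", "gen_A2"),
    ("gen_A1", "gen_A1"),   # a plant_gen record is associated with itself
    ("gen_A2", "gen_A2"),
]

# Collect the generator records associated with each plant part record.
gens_by_part = defaultdict(list)
for record_id_eia, record_id_eia_plant_gen in assn_rows:
    gens_by_part[record_id_eia].append(record_id_eia_plant_gen)

# Analogous to the generators_number column: generators per record_id_eia.
generators_number = {part: len(gens) for part, gens in gens_by_part.items()}
```

In this toy data the plant-level record maps to two generators, while each plant_gen record maps only to itself.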

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_assn_plant_parts_plant_gen.parquet"
)

Columns#

Field Name

Type

Description

energy_source_code_1_plant_gen

string

Code representing the most predominant type of energy that fuels the record_id_eia_plant_gen’s generator.

ferc1_generator_agg_id_plant_gen

integer

ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. This ID is associated with the record_id_eia_plant_gen record. It depends on ferc1_generator_agg_id, which has not been updated since 2020, but only affects a couple hundred records total across all years.

ferc_acct_name_plant_gen

string

Name of FERC account, derived from technology description and prime mover code. This name is associated with the record_id_eia_plant_gen record.

generator_id_plant_gen

string

Generator ID of the record_id_eia_plant_gen record. This is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_year_plant_gen

integer

The year an associated plant_gen’s generator went into service.

generators_number

integer

The number of generators associated with each record_id_eia.

operational_status_pudl_plant_gen

string

The operating status of the record_id_eia_plant_gen record, using PUDL categories.

ownership_record_type

string

Whether each generator record is for one owner or represents a total of all ownerships.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code_plant_gen

string

Code for the type of prime mover (e.g. CT, CG) associated with the record_id_eia_plant_gen.

record_id_eia

string

Identifier for EIA plant parts analysis records.

record_id_eia_plant_gen

string

Identifier for EIA plant parts analysis records which is at the plant_part level of plant_gen - meaning each record pertains to one generator.

report_date

date

Date reported.

technology_description_plant_gen

string

High level description of the technology used by the record_id_eia_plant_gen’s generator to produce electricity.

unit_id_pudl_plant_gen

integer

Dynamically assigned PUDL unit id of the record_id_eia_plant_gen. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

out_eia__yearly_boilers#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA boiler attributes.

Most-recent data:

2025

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

plant_id_eia, boiler_id, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for boilers, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_boilers.parquet"
)

Columns#

Field Name

Type

Description

air_flow_100pct_load_cubic_feet_per_minute

number

Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e. 68 F and one atmosphere pressure).

boiler_fuel_code_1

string

The code representing the most predominant type of energy that fuels the boiler.

boiler_fuel_code_2

string

The code representing the second most predominant type of energy that fuels the boiler.

boiler_fuel_code_3

string

The code representing the third most predominant type of energy that fuels the boiler.

boiler_fuel_code_4

string

The code representing the fourth most predominant type of energy that fuels the boiler.

boiler_id

string

Alphanumeric boiler ID.

boiler_manufacturer

string

Name of boiler manufacturer.

boiler_manufacturer_code

string

EIA short code for boiler manufacturer.

boiler_operating_date

date

Date the boiler began or is planned to begin commercial operation.

boiler_retirement_date

date

Date of the scheduled or effected retirement of the boiler.

boiler_status

string

EIA short code identifying boiler operational status.

boiler_type

string

EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR.

city

string

Name of the city.

compliance_year_mercury

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions.

compliance_year_nox

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions.

compliance_year_particulate

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions.

compliance_year_so2

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions.

county

string

County name.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

efficiency_100pct_load

number

Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent.

efficiency_50pct_load

number

Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent.

firing_rate_using_coal_tons_per_hour

number

Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour.

firing_rate_using_gas_mcf_per_hour

number

Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet (Mcf) per hour.

firing_rate_using_oil_bbls_per_hour

number

Design firing rate at maximum continuous steam flow for petroleum to the nearest 0.1 barrels per hour.

firing_rate_using_other_fuels

number

Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas.

firing_type_1

string

EIA short code indicating the type of firing used by this boiler.

firing_type_2

string

EIA short code indicating the type of firing used by this boiler.

firing_type_3

string

EIA short code indicating the type of firing used by this boiler.

fly_ash_reinjection

boolean

Indicates whether the boiler is capable of re-injecting fly ash.

hrsg

boolean

Indicates whether the boiler is a heat recovery steam generator (HRSG).

latitude

number

Latitude of the plant’s location, in degrees.

longitude

number

Longitude of the plant’s location, in degrees.

max_steam_flow_lbs_per_hour

number

Maximum continuous steam flow at 100 percent load.

mercury_control_existing_strategy_1

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_2

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_3

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_4

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_5

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_6

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent mercury regulation.

new_source_review

boolean

Indicates whether the boiler is subject to New Source Review requirements.

new_source_review_date

date

Month of issued New Source Review permit.

new_source_review_permit

string

New Source Review permit number.

nox_control_existing_caaa_compliance_strategy_1

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_caaa_compliance_strategy_2

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_caaa_compliance_strategy_3

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_strategy_1

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_existing_strategy_2

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_existing_strategy_3

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_manufacturer

string

Name of nitrogen oxide control manufacturer.

nox_control_manufacturer_code

string

Code indicating the nitrogen oxide control burner manufacturer.

nox_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_planned_caaa_compliance_strategy_1

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_planned_caaa_compliance_strategy_2

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_planned_caaa_compliance_strategy_3

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_status_code

string

Nitrogen oxide control status code.

particulate_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

particulate_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

particulate_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

regulation_mercury

string

Most stringent type of statute or regulation code under which the boiler is operating for mercury control standards.

regulation_nox

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for nitrogen oxide control standards.

regulation_particulate

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for particulate matter control standards.

regulation_so2

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for sulfur dioxide control standards.

report_date

date

Date reported.

so2_control_existing_caaa_compliance_strategy_1

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_caaa_compliance_strategy_2

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_caaa_compliance_strategy_3

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_strategy_1

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_existing_strategy_2

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_existing_strategy_3

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_planned_caaa_compliance_strategy_1

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_planned_caaa_compliance_strategy_2

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_planned_caaa_compliance_strategy_3

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

standard_nox_rate

number

Numeric value for the unit of measurement specified for nitrogen oxide.

standard_particulate_rate

number

Numeric value for the unit of measurement specified for particulate matter.

standard_so2_percent_scrubbed

number

The percent of sulfur dioxide to be scrubbed specified by the most stringent sulfur dioxide regulation.

standard_so2_rate

number

Numeric value for the unit of measurement specified for sulfur dioxide.

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

timezone

string

IANA timezone name

turndown_ratio

number

The turndown ratio for the boiler.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

unit_nox

string

Unit of measurement in which the nitrogen oxide standard is specified.

unit_particulate

string

Unit of measurement in which the particulate matter standard is specified.

unit_so2

string

Unit of measurement in which the sulfur dioxide standard is specified.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

waste_heat_input_mmbtu_per_hour

number

Design waste-heat input rate at maximum continuous steam flow (MMBTU per hour). A waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process.

wet_dry_bottom

string

Whether the boiler has a wet or dry bottom. Wet bottom means slag tanks are installed at the furnace throat to contain and remove molten ash from the furnace. Dry bottom means there are no slag tanks at the furnace throat, the throat area is clear, and bottom ash drops through the throat to the bottom ash water hoppers.

zip_code

string

Five digit US Zip Code.

out_eia__yearly_generators#

Annual time series of all generator attributes including calculated capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

report_date, plant_id_eia, generator_id

Usage Warnings#

  • Contains estimated values.

  • Due to coverage problems in other tables, some generators have heat rate estimates but not fuel cost estimates.

  • Not all columns are originally reported in or calculable from the input tables. Expect nulls.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This table includes all yearly attributes for all generators reported to EIA-860 and EIA-923.

To provide a complete picture of generator data, this table compiles data from many different EIA tables, including directly reported, imputed and calculated columns. We suggest using this table if you want to explore yearly attributes about generators and would rather use a more complete and denormalized table. If you are more interested in the originally reported values, we recommend searching for core_eia with the column you are most interested in.

The calculations of capacity factor, heat rate, fuel cost per MMBTU and fuel cost per MWh are based on the allocation of net generation, reported on the basis of plant, prime mover and energy source, to individual generators. The allocated values come from the out_eia923__yearly_generation_fuel_by_generator table.
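As a rough illustration of the capacity factor definition given in the Columns section (energy generated divided by nameplate capacity times the length of the period), here is a minimal sketch in plain Python. The function name, the null handling, and the 8,760-hour default for a non-leap year are assumptions made for this example, not PUDL's implementation.

```python
def capacity_factor(net_generation_mwh, capacity_mw, hours=8760):
    """Return generation as a fraction of the maximum possible output.

    Hypothetical helper: returns None when an input is missing or the
    capacity is not positive, mirroring the nulls expected in the table.
    """
    if net_generation_mwh is None or capacity_mw is None or capacity_mw <= 0:
        return None
    return net_generation_mwh / (capacity_mw * hours)


# A 100 MW generator producing 438,000 MWh over a non-leap year.
print(capacity_factor(438_000, 100))  # 0.5
```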

Yearly heat rates by generator are estimated using allocated per-generator net generation and fuel consumption, together with the core_eia923__monthly_boiler_fuel table, which reports fuel consumed by boiler. Heat rates are necessary to estimate the amount of fuel consumed by a generation unit, and thus the fuel cost per MWh generated.

Plant specific fuel prices are taken from the core_eia923__fuel_receipts_costs table, which only has ~70% coverage, leading to some generators with heat rate estimates still lacking fuel cost estimates.
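The relationship between heat rate, fuel price, and fuel cost per MWh described above can be sketched as follows. This is a simplified illustration in plain Python: the function names are hypothetical, and PUDL's actual calculation works on allocated generation and fuel data and may differ in detail.

```python
def heat_rate_mmbtu_per_mwh(total_mmbtu, net_generation_mwh):
    """Fuel heat content consumed per MWh generated (hypothetical helper)."""
    if total_mmbtu is None or not net_generation_mwh:
        return None
    return total_mmbtu / net_generation_mwh


def fuel_cost_per_mwh(heat_rate, fuel_cost_per_mmbtu):
    """Fuel cost per MWh generated; None if either input is missing."""
    if heat_rate is None or fuel_cost_per_mmbtu is None:
        return None
    return heat_rate * fuel_cost_per_mmbtu


hr = heat_rate_mmbtu_per_mwh(total_mmbtu=1_000_000, net_generation_mwh=100_000)
print(hr)                           # 10.0
print(fuel_cost_per_mwh(hr, 2.5))   # 25.0
# A generator with a heat rate but no fuel price has no fuel cost estimate.
print(fuel_cost_per_mwh(hr, None))  # None
```

The last call shows why incomplete fuel price coverage leaves some generators with a heat rate estimate but a null fuel cost.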

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
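The 70% consistency rule described above can be sketched in a few lines of plain Python. This is a simplified illustration, not PUDL's harvesting code: the function name is hypothetical, and excluding missing values from the denominator is an assumption of this example.

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Pick the most common reported value if it is consistent enough.

    Assumption: missing values (None) are ignored when computing the share.
    """
    values = [v for v in reported if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None


# Three of four reports agree (75%), so the value is kept.
print(canonical_value(["TX", "TX", "TX", "OK"]))  # TX
# Only half agree (50%), so the attribute becomes null.
print(canonical_value(["TX", "TX", "OK", "OK"]))  # None
```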

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet"
)

Columns#

Field Name

Type

Description

associated_combined_heat_power

boolean

Indicates whether the generator is associated with a combined heat and power system

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_name_eia

string

Name of the balancing authority.

bga_source

string

The source from which the unit_id_pudl is compiled. The unit_id_pudl comes directly from EIA 860, from string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is employed to find additional boiler-generator connections).

bypass_heat_recovery

boolean

Can this generator operate while bypassing the heat recovery steam generator?

can_burn_multiple_fuels

boolean

Whether the generator can burn multiple fuels.

can_cofire_fuels

boolean

Whether the generator can co-fire fuels.

can_switch_oil_gas

boolean

Whether the generator can switch between oil and natural gas.

can_switch_when_operating

boolean

Indicates whether a fuel switching generator can switch fuels while operating.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

carbon_capture

boolean

Indicates whether the generator uses carbon capture technology.

city

string

Name of the city.

county

string

County name.

current_planned_generator_operating_date

date

The most recently updated effective date on which the generator is scheduled to start operation

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

deliver_power_transgrid

boolean

Indicate whether the generator can deliver power to the transmission grid.

distributed_generation

boolean

Whether the generator is considered distributed generation

duct_burners

boolean

Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas

energy_source_1_transport_1

string

Primary mode of transport for energy source 1.

energy_source_1_transport_2

string

Secondary mode of transport for energy source 1.

energy_source_1_transport_3

string

Tertiary mode of transport for energy source 1.

energy_source_2_transport_1

string

Primary mode of transport for energy source 2.

energy_source_2_transport_2

string

Secondary mode of transport for energy source 2.

energy_source_2_transport_3

string

Tertiary mode of transport for energy source 2.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_source_code_2

string

The code representing the second most predominant type of energy that fuels the generator

energy_source_code_3

string

The code representing the third most predominant type of energy that fuels the generator

energy_source_code_4

string

The code representing the fourth most predominant type of energy that fuels the generator

energy_source_code_5

string

The code representing the fifth most predominant type of energy that fuels the generator

energy_source_code_6

string

The code representing the sixth most predominant type of energy that fuels the generator

energy_storage_capacity_mwh

number

Energy storage capacity in MWh (e.g. for batteries).

ferc_qualifying_facility

boolean

Indicates whether or not a generator is a qualifying FERC cogeneration facility.

fluidized_bed_tech

boolean

Indicates whether the generator uses fluidized bed technology

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or was aggregated from records with multiple sources (mixed).

fuel_cost_per_mwh

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

fuel_type_count

integer

A count of how many different simple energy sources there are associated with a generator.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_date

date

Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

latitude

number

Latitude of the plant’s location, in degrees.

longitude

number

Longitude of the plant’s location, in degrees.

minimum_load_mw

number

The minimum load at which the generator can operate continuously.

nameplate_power_factor

number

The nameplate power factor of the generator.

net_capacity_mwdc

number

Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

operational_status

string

The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.

operational_status_code

string

The operating status of the asset.

original_planned_generator_operating_date

date

The date the generator was originally scheduled to be operational

other_combustion_tech

boolean

Indicates whether the generator uses other combustion technologies

other_modifications_date

date

Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.

other_planned_modifications

boolean

Indicates whether there are other modifications planned for the generator.

owned_by_non_utility

boolean

Whether any part of the generator is owned by a non-utility.

ownership_code

string

Identifies the ownership for each generator.

planned_derate_date

date

Planned effective month that the generator is scheduled to enter operation after the derate modification.

planned_energy_source_code_1

string

New energy source code for the planned repowered generator.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

planned_modifications

boolean

Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.

planned_net_summer_capacity_derate_mw

number

Decrease in summer capacity expected to be realized from the derate modification to the equipment.

planned_net_summer_capacity_uprate_mw

number

Increase in summer capacity expected to be realized from the modification to the equipment.

planned_net_winter_capacity_derate_mw

number

Decrease in winter capacity expected to be realized from the derate modification to the equipment.

planned_net_winter_capacity_uprate_mw

number

Increase in winter capacity expected to be realized from the uprate modification to the equipment.

planned_new_capacity_mw

number

The expected new nameplate capacity for the generator.

planned_new_prime_mover_code

string

New prime mover for the planned repowered generator.

planned_repower_date

date

Planned effective date that the generator is scheduled to enter operation after the repowering is complete.

planned_uprate_date

date

Planned effective date that the generator is scheduled to enter operation after the uprate modification.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

previously_canceled

boolean

Indicates whether the generator was previously reported as indefinitely postponed or canceled

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

pulverized_coal_tech

boolean

Indicates whether the generator uses pulverized coal technology

reactive_power_output_mvar

number

Reactive Power Output (MVAr)

report_date

date

Date reported.

rto_iso_lmp_node_id

string

The designation used to identify the price node in RTO/ISO Locational Marginal Price reports

rto_iso_location_wholesale_reporting_id

string

The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report

solid_fuel_gasification

boolean

Indicates whether the generator is part of a solid fuel gasification system

startup_source_code_1

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_2

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_3

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_4

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

state

string

Two letter US state abbreviation.

stoker_tech

boolean

Indicates whether the generator uses stoker technology

street_address

string

Physical street address.

subcritical_tech

boolean

Indicates whether the generator uses subcritical technology

summer_capacity_estimate

boolean

Whether the summer capacity value was an estimate

summer_capacity_mw

number

The net summer capacity.

summer_estimated_capability_mw

number

EIA estimated summer capacity (in MW).

supercritical_tech

boolean

Indicates whether the generator uses supercritical technology

synchronized_transmission_grid

boolean

Indicates whether standby generators (SB status) can be synchronized to the grid.

technology_description

string

High level description of the technology used by the generator to produce electricity.

time_cold_shutdown_full_load_code

string

The minimum amount of time required to bring the unit to full load from shutdown.

timezone

string

IANA timezone name

topping_bottoming_code

string

If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_mmbtu

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

turbines_inverters_hydrokinetics

integer

Number of wind turbines, or hydrokinetic buoys.

turbines_num

integer

Number of wind turbines, or hydrokinetic buoys.

ultrasupercritical_tech

boolean

Indicates whether the generator uses ultra-supercritical technology

unit_heat_rate_mmbtu_per_mwh

number

Fuel content per unit of electricity generated. Coming from MCOE calculation.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

uprate_derate_completed_date

date

The date when the uprate or derate was completed.

uprate_derate_during_year

boolean

Was an uprate or derate completed on this generator during the reporting year?

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

winter_capacity_estimate

boolean

Whether the winter capacity value was an estimate

winter_capacity_mw

number

The net winter capacity.

winter_estimated_capability_mw

number

EIA estimated winter capacity (in MW).

zip_code

string

Five digit US Zip Code.

out_eia__yearly_generators_by_ownership#

Annual time series of all EIA generators with ownership integrated.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

This table has no primary key. It would have a primary key consisting of the following columns, except that ~200 records lack utility IDs: plant_id_eia, generator_id, report_date, utility_id_eia, ownership_record_type.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

Generators that report to EIA can have multiple utility owners. EIA reports the ownership fraction for each of the owners of generators (see core_eia860__scd_ownership). This table includes two records for each generator’s owner: one record represents a total ownership stake and the other represents a proportional ownership stake (labeled in the ownership_record_type column as “total” and “owned” respectively). All of the values reported for these generator-owner records are scaled to the ownership stake (represented by fraction_owned).
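The record structure described above can be sketched in plain Python. This is an illustrative example with made-up values, not PUDL's implementation: the function name is hypothetical, only capacity_mw is scaled here, and setting fraction_owned to 1.0 on the "total" records is an assumption.

```python
def ownership_records(generator, owners):
    """Build one "owned" and one "total" record per owner of a generator.

    generator: dict with generator_id and capacity_mw.
    owners: dict mapping utility_id_eia to the fraction of the generator owned.
    """
    records = []
    for utility_id, fraction in owners.items():
        # Assumption: "total" records carry the full stake (fraction 1.0).
        for record_type, frac in (("owned", fraction), ("total", 1.0)):
            records.append({
                "generator_id": generator["generator_id"],
                "utility_id_eia": utility_id,
                "ownership_record_type": record_type,
                "fraction_owned": frac,
                "capacity_mw": generator["capacity_mw"] * frac,
            })
    return records


generator = {"generator_id": "1", "capacity_mw": 200.0}
records = ownership_records(generator, {10: 0.75, 20: 0.25})

owned_capacity = sum(
    r["capacity_mw"] for r in records if r["ownership_record_type"] == "owned"
)
all_capacity = sum(r["capacity_mw"] for r in records)
print(owned_capacity)  # 200.0
print(all_capacity)    # 600.0
```

Summing across both record types triples the generator's capacity in this example, which is why the usage warning advises caution when aggregating columns that contain subtotals.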

This table was created to represent every possible version of ownership slices of the EIA generators. It is primarily used as an input to out_eia__yearly_plant_parts.

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_generators_by_ownership.parquet"
)

Columns#

Field Name

Type

Description

capacity_eoy_mw

number

Total end of year installed (nameplate) capacity for a plant part, in megawatts.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

ferc_acct_name

string

Name of FERC account, derived from technology description and prime mover code.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_source

string

Indicates the source of the values in the fuel_cost_per_mmbtu column. The fuel cost either comes directly from the EIA forms (original), was filled in from the EIA’s API using state-level averages (eiaapi), was filled in using a rolling average (rolling_avg), or was aggregated from records with multiple sources (mixed).

fuel_cost_per_mwh

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_date

date

Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.

generator_operating_year

integer

Year a generator went into service.

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

operational_status

string

The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.

operational_status_pudl

string

The operating status of the asset using PUDL categories.

ownership_record_type

string

Whether each generator record is for one owner or represents a total of all ownerships.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

technology_description

string

High level description of the technology used by the generator to produce electricity.

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_mmbtu

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

unit_heat_rate_mmbtu_per_mwh

number

Fuel content per unit of electricity generated. Coming from MCOE calculation.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia__yearly_plant_parts#

Annual time series of all EIA plant parts, for use in matching to FERC 1.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

record_id_eia

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

Practically speaking, a plant is a collection of generator(s). There are many attributes of generators (e.g. prime mover, primary fuel source, technology type). We can use these generator attributes to group generator records into larger aggregate records which we call “plant parts”. A plant part is a record which corresponds to a particular collection of generators that all share an identical attribute and utility owner. E.g. all of the generators with unit_id=2, or all of the generators with coal as their primary fuel source.
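The grouping idea described above can be sketched in plain Python. This is a simplified illustration with made-up generator records, not the code PUDL uses to build the table: the function name is hypothetical, and only capacity and a record count are aggregated.

```python
from collections import defaultdict


def aggregate_plant_part(generators, part_key):
    """Group generators that share a plant, utility owner, and one attribute."""
    groups = defaultdict(lambda: {"capacity_mw": 0.0, "record_count": 0})
    for gen in generators:
        key = (gen["plant_id_eia"], gen["utility_id_eia"], gen[part_key])
        groups[key]["capacity_mw"] += gen["capacity_mw"]
        groups[key]["record_count"] += 1
    return dict(groups)


# Made-up records for a single plant with one utility owner.
generators = [
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "1",
     "unit_id_pudl": 1, "energy_source_code_1": "BIT", "capacity_mw": 100.0},
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "2",
     "unit_id_pudl": 1, "energy_source_code_1": "BIT", "capacity_mw": 150.0},
    {"plant_id_eia": 1, "utility_id_eia": 10, "generator_id": "GT1",
     "unit_id_pudl": 2, "energy_source_code_1": "NG", "capacity_mw": 50.0},
]

by_unit = aggregate_plant_part(generators, "unit_id_pudl")
by_fuel = aggregate_plant_part(generators, "energy_source_code_1")
print(by_unit[(1, 10, 1)])     # {'capacity_mw': 250.0, 'record_count': 2}
print(by_fuel[(1, 10, "NG")])  # {'capacity_mw': 50.0, 'record_count': 1}
```

The same generators appear in both groupings, which is one reason the table carries so much duplicative information.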

The EIA data about power plants (from EIA 923 and 860) is reported in tables with records that correspond mostly to generators and plants. Other datasets (cough cough FERC1) are less well organized and include plants, generators and other plant parts all in the same table without any clear labels. This plant part table is an attempt to create records corresponding to many different plant parts in order to connect specific slices of EIA plants to other datasets.

Because generators are often owned by multiple utilities, another dimension of this plant part table involves generating two records for each owner: one for the portion of the plant part they own and one for the plant part as a whole. The portion records are labeled in the ownership_record_type column as owned and the total records are labeled as total.

This table includes A LOT of duplicative information about EIA plants. It is primarily meant for use as an input into the record linkage between FERC1 plants and EIA.

This table inherits canonicalized values for generators, plants, and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it is at least 70% of the given instances reported. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plant_parts.parquet"
)

Columns#

Field Name

Type

Description

appro_part_label

string

Plant part of the associated true granularity record.

appro_record_id_eia

string

EIA record ID of the associated true granularity record.

capacity_eoy_mw

number

Total end of year installed (nameplate) capacity for a plant part, in megawatts.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over a time period / (nameplate capacity * duration of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

construction_year

integer

Year the plant’s oldest still operational unit was built.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

ferc1_generator_agg_id

integer

ID dynamically assigned by PUDL to EIA records with multiple matches to a single FERC ID in the FERC-EIA manual matching process. The ID is manually assigned and has not been updated since 2020, but only affects a couple hundred records total across all years.

ferc_acct_name

string

Name of FERC account, derived from technology description and prime mover code.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mwh

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of electricity generated, in nominal USD.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_year

integer

Year a generator went into service.

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

operational_status

string

The operating status of the asset. For generators this is based on which tab of EIA 860 the generator was listed in.

operational_status_pudl

string

The operating status of the asset using PUDL categories.

ownership_dupe

boolean

Whether a plant part record has a duplicate record with different ownership status.

ownership_record_type

string

Whether each generator record is for one owner or represents a total of all ownerships.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_id_report_year

string

PUDL plant ID and report year of the record.

plant_name_eia

string

Plant name.

plant_name_ppe

string

Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part.

plant_part

string

The part of the plant a record corresponds to.

plant_part_id_eia

string

Contains EIA plant ID, plant part, ownership, and EIA utility id

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

record_count

integer

Number of distinct generator IDs that participated in the aggregation for a plant part list record.

record_id_eia

string

Identifier for EIA plant parts analysis records.

report_date

date

Date reported.

report_year

integer

Four-digit year in which the data was reported.

technology_description

string

High level description of the technology used by the generator to produce electricity.

total_fuel_cost

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_mmbtu

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

true_gran

boolean

Indicates whether a plant part list record is associated with the highest priority plant part for all identical records.

unit_heat_rate_mmbtu_per_mwh

number

Fuel content per unit of electricity generated. Coming from MCOE calculation.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

out_eia__yearly_plants#

Annual time series of all EIA plant attributes.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

plant_id_eia, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
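The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PUDL's actual implementation: the function name `harvest_value` and the `min_share` parameter are hypothetical, and the choice to ignore nulls when computing the share is an assumption.

```python
from collections import Counter


def harvest_value(reported, min_share=0.7):
    """Pick a canonical value from a list of reported values.

    Hypothetical helper: returns the most frequently reported non-null
    value if it makes up at least `min_share` of the non-null reports,
    and None otherwise. Ignoring nulls in the share is an assumption.
    """
    observed = [value for value in reported if value is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= min_share else None


# Three of four reports agree (75%), so a canonical value is chosen.
consistent = harvest_value(["TX", "TX", "TX", "OK"])
# No value reaches 70%, so the attribute would be left null.
inconsistent = harvest_value(["TX", "TX", "OK", "OK", "NM"])
```

Under this sketch, a null in the output table means that no single reported value was dominant enough, not that the attribute was never reported.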

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_plants.parquet"
)

Columns#

Field Name

Type

Description

ash_impoundment

boolean

Is there an ash impoundment (e.g. pond, reservoir) at the plant?

ash_impoundment_lined

boolean

If there is an ash impoundment at the plant, is the impoundment lined?

ash_impoundment_status

string

If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_code_eia_consistent_rate

number

Percentage consistency of balancing authority code across entity records.

balancing_authority_name_eia

string

Name of the balancing authority.

city

string

Name of the city.

county

string

County name.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

datum

string

Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84).

energy_storage

boolean

Indicates if the facility has energy storage capabilities.

ferc_cogen_docket_no

string

The docket number relating to the FERC cogenerator status. See FERC Form 556.

ferc_cogen_status

boolean

Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556.

ferc_exempt_wholesale_generator

boolean

Indicates whether the plant has FERC qualifying facility exempt wholesale generator status

ferc_exempt_wholesale_generator_docket_no

string

The docket number relating to the FERC qualifying facility exempt wholesale generator status.

ferc_qualifying_facility_docket_no

string

The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556.

ferc_small_power_producer

boolean

Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556.

ferc_small_power_producer_docket_no

string

The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556.

grid_voltage_1_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

grid_voltage_2_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

grid_voltage_3_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

has_net_metering

boolean

Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source.) This field was only reported up until 2015.

iso_rto_code

string

The code of the plant’s ISO or RTO. NA if not reported in that year.

latitude

number

Latitude of the plant’s location, in degrees.

liquefied_natural_gas_storage

boolean

Indicates if the facility has the capability to store natural gas in the form of liquefied natural gas.

longitude

number

Longitude of the plant’s location, in degrees.

natural_gas_local_distribution_company

string

Names of Local Distribution Company (LDC), connected to natural gas burning power plants.

natural_gas_pipeline_name_1

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_pipeline_name_2

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_pipeline_name_3

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_storage

boolean

Indicates if the facility has on-site storage of natural gas.

nerc_region

string

NERC region in which the plant is located

pipeline_notes

string

Additional owner or operator of natural gas pipeline.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

primary_purpose_id_naics

integer

North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant

regulatory_status_code

string

Indicates whether the plant is regulated or non-regulated.

report_date

date

Date reported.

reporting_frequency_code

string

Code that specifies what time period data has to be reported (i.e. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details.

sector_id_eia

integer

EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

sector_name_eia

string

EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

service_area

string

Service area in which plant is located; for unregulated companies, it’s the electric utility with which plant is interconnected

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

timezone

string

IANA timezone name

transmission_distribution_owner_id

integer

EIA-assigned code for owner of transmission/distribution system to which the plant is interconnected.

transmission_distribution_owner_name

string

Name of the owner of the transmission or distribution system to which the plant is interconnected.

transmission_distribution_owner_state

string

State location for owner of transmission/distribution system to which the plant is interconnected.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

water_source

string

Name of water source associated with the plant.

zip_code

string

Five digit US Zip Code.

out_eia__yearly_utilities#

Noticed any irregularities with this table? File a report.

Annual time series of all EIA utility attributes.

Most-recent data:

2026

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

utility_id_eia, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This table inherits canonicalized values for utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_eia__yearly_utilities.parquet"
)

Columns#

Field Name

Type

Description

address_2

string

Second line of the address.

attention_line

string

Mail attention name of the operator/owner.

city

string

Name of the city.

contact_firstname

string

First name of utility contact 1.

contact_firstname_2

string

First name of utility contact 2.

contact_lastname

string

Last name of utility contact 1.

contact_lastname_2

string

Last name of utility contact 2.

contact_title

string

Title of utility contact 1.

contact_title_2

string

Title of utility contact 2.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

entity_type

string

Entity type of principal owner.

phone_extension

string

Phone extension for utility contact 1

phone_extension_2

string

Phone extension for utility contact 2

phone_number

string

Phone number for utility contact 1.

phone_number_2

string

Phone number for utility contact 2.

plants_reported_asset_manager

boolean

Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form?

plants_reported_operator

boolean

Is the reporting entity an operator of power plants reported on Schedule 2 of the form?

plants_reported_other_relationship

boolean

Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form?

plants_reported_owner

boolean

Is the reporting entity an owner of power plants reported on Schedule 2 of the form?

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

zip_code

string

Five digit US Zip Code.

zip_code_4

string

Four digit US Zip Code suffix.

out_ferc1__yearly_all_plants#

Noticed any irregularities with this table? File a report.

Annual time series of plants reported in the steam, small generators, hydro, and pumped storage tables.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 402, 404, 406 and 408)

Primary key:

record_id

Usage Warnings#

  • Not all columns are originally reported in all of the input plant tables. Expect nulls.

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the plant total (e.g. unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is a concatenation of the following plant tables:

  • core_ferc1__yearly_steam_plants_sched402

  • core_ferc1__yearly_hydroelectric_plants_sched406

  • core_ferc1__yearly_small_plants_sched410

  • core_ferc1__yearly_pumped_storage_plants_sched408
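A minimal sketch of why a concatenation like this produces nulls, using pandas and two tiny made-up frames. The record IDs, plant names, and values are invented for illustration; only the column names come from this table. Any column that one input table lacks is filled with nulls for that table's rows.

```python
import pandas as pd

# Toy stand-ins for two of the input tables; the rows are made up.
steam = pd.DataFrame(
    {
        "record_id": ["steam_example_1"],
        "plant_name_ferc1": ["alpha"],
        "opex_boiler": [100.0],  # reported only for steam plants in this toy example
    }
)
hydro = pd.DataFrame(
    {
        "record_id": ["hydro_example_1"],
        "plant_name_ferc1": ["beta"],
        "opex_dams": [50.0],  # reported only for hydro plants in this toy example
    }
)

# Stack the rows; the result has the union of both column sets.
all_plants = pd.concat([steam, hydro], ignore_index=True)

# Columns missing from an input table are null for that table's rows.
null_counts = all_plants.isna().sum()
```

Here `null_counts` shows one null each in `opex_boiler` and `opex_dams`, which is the pattern to expect at larger scale in the real table.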

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_all_plants.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_annual_addition

number

Annual capital addition into capex_total.

capex_annual_addition_rolling

number

Year-to-date capital addition into capex_total.

capex_annual_per_kw

number

Annual capital addition into capex_total per kW.

capex_annual_per_mw

number

Annual capital addition into capex_total per MW.

capex_annual_per_mw_rolling

number

Year-to-date capital addition into capex_total per MW.

capex_annual_per_mwh

number

Annual capital addition into capex_total per MWh.

capex_annual_per_mwh_rolling

number

Year-to-date capital addition into capex_total per MWh.

capex_equipment

number

Cost of plant: equipment (USD).

capex_equipment_electric

number

Cost of plant: accessory electric equipment (USD).

capex_equipment_misc

number

Cost of plant: miscellaneous power plant equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

capex_wheels_turbines_generators

number

Cost of plant: water wheels, turbines, and generators (USD).

capex_wo_retirement_total

number

Total cost of plant (USD) without retirements.

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

energy_used_for_pumping_mwh

number

Energy used for pumping, in megawatt-hours.

ferc_license_id

string

The FERC license ID of a project.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mwh

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.

fuel_type

string

Type of fuel.

installation_year

integer

Year the plant’s most recently built unit was installed.

license_id_ferc1

integer

FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.

net_capacity_adverse_conditions_mw

number

Net plant capability under the least favorable operating conditions, in megawatts.

net_capacity_favorable_conditions_mw

number

Net plant capability under the most favorable operating conditions, in megawatts.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_load_mwh

number

Net output for load (net generation - energy used for pumping) in megawatt-hours.

not_water_limited_capacity_mw

number

Plant capacity in MW when not limited by condenser water.

opex_allowances

number

Allowances.

opex_boiler

number

Maintenance of boiler (or reactor) plant.

opex_coolants

number

Cost of coolants and water (nuclear plants only)

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_fuel

number

Production expenses: fuel (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_hydraulic

number

Production expenses: hydraulic expenses (USD).

opex_maintenance

number

Production expenses: Maintenance (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_misc_power

number

Miscellaneous steam (or nuclear) expenses.

opex_misc_steam

number

Maintenance of miscellaneous steam (or nuclear) plant.

opex_nonfuel_per_mwh

number

Investments in non-fuel production expenses per MWh.

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_production_before_pumping

number

Total production expenses before pumping (USD).

opex_production_total

number

Total operating expenses.

opex_pumped_storage

number

Production expenses: pumped storage (USD).

opex_pumping

number

Production expenses: pumping (USD).

opex_rents

number

Production expenses: rents (USD).

opex_steam

number

Steam expenses.

opex_steam_other

number

Steam from other sources.

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

opex_transfer

number

Steam transferred (Credit).

opex_water_for_power

number

Production expenses: water for power (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_id_ferc1

integer

Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

water_limited_capacity_mw

number

Plant capacity in MW when limited by condenser water.
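As a worked illustration of the capacity_factor column described in the table above, the ratio can be written as a small function. This is a sketch under assumptions, not necessarily how PUDL computes the column: the function name, the default of 8760 hours for a non-leap year, and the handling of zero capacity are all hypothetical choices.

```python
def capacity_factor(net_generation_mwh, capacity_mw, hours_in_period=8760.0):
    """Return generation as a fraction of what nameplate capacity could produce.

    Hypothetical helper. Returns None when potential generation is not
    positive, since the ratio is undefined in that case.
    """
    potential_mwh = capacity_mw * hours_in_period
    if potential_mwh <= 0:
        return None
    return net_generation_mwh / potential_mwh


# A 100 MW plant that generated 438,000 MWh in an 8,760-hour year
# produced half of its potential output.
example = capacity_factor(438_000.0, 100.0)
```

Values above 1 or below 0 are arithmetically possible with reported data, so they are worth checking for before using the column in an analysis.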

out_ferc1__yearly_balance_sheet_assets_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility assets and other debits.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)

Primary key:

utility_id_ferc1, report_year, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
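The subtotal warning above matters most when summing ending_balance across rows. The sketch below uses invented rows to show how including a calculated row double-counts its components. The asset_type names and the row_type_xbrl labels "reported_value" and "calculated_value" are assumptions made for illustration; check the actual values in the table before filtering on them.

```python
# Made-up rows for one utility and year; only the column names are real.
rows = [
    {"asset_type": "example_detail_a", "row_type_xbrl": "reported_value", "ending_balance": 40.0},
    {"asset_type": "example_detail_b", "row_type_xbrl": "reported_value", "ending_balance": 60.0},
    {"asset_type": "example_subtotal", "row_type_xbrl": "calculated_value", "ending_balance": 100.0},
]

# Summing every row counts the detail rows twice: once directly and
# once again inside the subtotal row.
naive_total = sum(row["ending_balance"] for row in rows)

# Dropping rows flagged as calculated avoids the double count.
detail_total = sum(
    row["ending_balance"]
    for row in rows
    if row["row_type_xbrl"] != "calculated_value"
)
```

In this toy case `naive_total` is twice `detail_total`. Real subtotals can nest several levels deep, so the inflation from naive sums can be larger.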

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_balance_sheet_liabilities_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility liabilities and other credits.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)

Primary key:

utility_id_ferc1, report_year, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

liability_type

string

Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_cash_flows_sched120#

Noticed any irregularities with this table? File a report.

Annual time series of utility cash flow.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)

Primary key:

utility_id_ferc1, report_year, amount_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_cash_flows_sched120.parquet"
)

Columns#

Field Name

Type

Description

amount

number

Reported amount of dollars. This could be a balance or a change in value.

amount_type

string

Label describing the type of amount being reported. This could be a balance or a change in value.

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_depreciation_by_function_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)

Primary key:

utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_by_function_sched219.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

depreciation_type

string

Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.

ending_balance

number

Account balance at end of year.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.
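The record_id format documented above can be split back into its components. The sketch below is a minimal example that assumes an ID follows the six-part format exactly; the example ID is invented, and real IDs (particularly those from XBRL reporting years) may not follow this layout.

```python
def parse_record_id(record_id):
    """Split a record_id into the six components named in its documented format.

    table_name may itself contain underscores, so the split runs from the right.
    """
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    keys = (
        "table_name",
        "report_year",
        "report_prd",
        "respondent_id",
        "spplmnt_num",
        "row_number",
    )
    return dict(zip(keys, parts))


# Hypothetical ID built to match the documented format.
example = parse_record_id("f1_example_table_2019_12_42_0_7")
```

All components come back as strings; convert report_year and the other numeric parts explicitly if they are needed as integers.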

out_ferc1__yearly_depreciation_changes_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated provision for depreciation of electric utility plant.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)

Primary key:

utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_changes_sched219.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

depreciation_type

string

Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_depreciation_factors_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of factors used in estimating depreciation charges.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)

Primary key:

This table has no primary key because it is too unstructured to have one.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Contains columns which may appear categorical, but are actually free text.

  • The rate and percentage (pct) columns are reported either as values between 0-1 or as values between 0-100. The scale is often consistent within a given utility-year, but use caution when comparing values across utilities, or across years for the same utility.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table contains details at a variety of levels of granularity. It includes many free-form text fields, so respondents fill it out very differently from each other and from year to year. We recommend using this table carefully, one utility-year at a time, and not attempting analysis across long time series or across utilities without considerable caution and cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
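One possible way to handle the mixed 0-1 and 0-100 scales noted in the usage warnings is to guess the scale separately for each utility-year. The sketch below is only a heuristic under a stated assumption: if any absolute value in a utility-year exceeds 1, the whole utility-year is treated as reported on a 0-100 scale. That assumption fails for filings where every percentage is below 1 percent, so results should be reviewed by hand. The rows are invented.

```python
from collections import defaultdict


def normalize_pct_by_utility_year(rows, column):
    """Return copies of rows with `column` rescaled to a 0-1 fraction.

    Heuristic (an assumption, not PUDL's method): if any absolute value in a
    utility-year exceeds 1, that utility-year is treated as a 0-100 scale.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["utility_id_ferc1"], row["report_year"])].append(row)

    out = []
    for group in groups.values():
        values = [abs(r[column]) for r in group if r[column] is not None]
        scale = 100.0 if values and max(values) > 1 else 1.0
        for r in group:
            new_row = dict(r)  # copy so the input rows are left untouched
            if new_row[column] is not None:
                new_row[column] = new_row[column] / scale
            out.append(new_row)
    return out


# Hypothetical rows: utility 1 reports on a 0-100 scale, utility 2 on 0-1.
rows = [
    {"utility_id_ferc1": 1, "report_year": 2022, "net_salvage_pct": -5.0},
    {"utility_id_ferc1": 1, "report_year": 2022, "net_salvage_pct": 10.0},
    {"utility_id_ferc1": 2, "report_year": 2022, "net_salvage_pct": 0.25},
    {"utility_id_ferc1": 2, "report_year": 2022, "net_salvage_pct": None},
]
normalized = normalize_pct_by_utility_year(rows, "net_salvage_pct")
```

The absolute value is used because net salvage can be negative. The depreciation_rate column is typed as a string in this table, so it would need to be parsed into numbers before a similar rescaling could be applied.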

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_factors_sched336.parquet"
)

Columns#

Field Name

Type

Description

account_num

string

Account number(s) in connection with factors used in estimating depreciation charges.

depreciable_plant_base

number

Depreciable plant balance (depreciable base) to which rates are applied.

depreciation_factors

string

Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers indicating the beginning of a new section, such as a plant or asset type header followed by sub-components like FERC account IDs or plant names, depending on the section.

depreciation_rate

string

Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.

mortality_curve_type

string

Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives.

net_salvage_pct

number

Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.

order_num

number

This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer - although there are many instances of floating point values which seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

remaining_life_avg

number

Estimated weighted average of remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.

report_year

integer

Four-digit year in which the data was reported.

service_life_avg

number

Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
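The remaining_life_avg and service_life_avg descriptions above mention durations in the PnYnMnDTnHnMnS format being converted into years. The sketch below shows one way such a conversion could work. It is not PUDL's implementation, and the 12 months per year and 365.25 days per year factors are assumptions chosen for illustration.

```python
import re

# Matches the PnYnMnDTnHnMnS duration shape; every component is optional.
_NUM = r"\d+(?:\.\d+)?"
_DURATION = re.compile(
    rf"^P(?:(?P<years>{_NUM})Y)?"
    rf"(?:(?P<months>{_NUM})M)?"
    rf"(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?:(?P<hours>{_NUM})H)?"
    rf"(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?)?$"
)


def duration_to_years(text):
    """Convert a PnYnMnDTnHnMnS duration string into fractional years."""
    match = _DURATION.match(text)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"not a recognized duration: {text!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    days = (
        parts["days"]
        + parts["hours"] / 24
        + parts["minutes"] / 1440
        + parts["seconds"] / 86400
    )
    # Assumed conversion factors: 12 months per year, 365.25 days per year.
    return parts["years"] + parts["months"] / 12 + days / 365.25


whole_years = duration_to_years("P25Y")
years_and_months = duration_to_years("P10Y6M")
full_form = duration_to_years("P1Y0M0DT0H0M0S")
```

Strings with no components at all (such as a bare "P") are rejected rather than silently treated as zero.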

out_ferc1__yearly_depreciation_summary_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of depreciation and amortization of electric plant.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)

Primary key:

utility_id_ferc1, report_year, plant_function, ferc_account_label

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

Electric Plant refers to FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of Depreciation and Amortization Charges.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_depreciation_summary_sched336.parquet"
)

Columns#

Field Name

Type

Description

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

ferc_account_label

string

Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_detailed_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet assets.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 200, 204 and 219)

Primary key:

This table has no primary key.

Usage Warnings#

  • The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from four FERC Form 1 accounting tables with nested calculations:

  • core_ferc1__yearly_balance_sheet_assets_sched110

  • core_ferc1__yearly_utility_plant_summary_sched200

  • core_ferc1__yearly_plant_in_service_sched204

  • core_ferc1__yearly_depreciation_by_function_sched219

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications were applied either to provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
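Because the records in this table are meant to be non-duplicative, one plausible use is to total ending_balance by rate_base_category for the records flagged as in_rate_base. The sketch below uses plain Python and invented rows; the category names are placeholders, not real rate_base_category values.

```python
from collections import defaultdict

# Hypothetical rows: category names and balances are invented for illustration.
rows = [
    {"utility_id_ferc1": 7, "report_year": 2023, "rate_base_category": "category_a", "in_rate_base": True, "ending_balance": 500.0},
    {"utility_id_ferc1": 7, "report_year": 2023, "rate_base_category": "category_a", "in_rate_base": True, "ending_balance": 250.0},
    {"utility_id_ferc1": 7, "report_year": 2023, "rate_base_category": "category_b", "in_rate_base": True, "ending_balance": 100.0},
    {"utility_id_ferc1": 7, "report_year": 2023, "rate_base_category": "category_b", "in_rate_base": False, "ending_balance": 75.0},
    {"utility_id_ferc1": 7, "report_year": 2023, "rate_base_category": "category_a", "in_rate_base": None, "ending_balance": 30.0},
]


def rate_base_by_category(rows):
    """Total ending_balance per (utility, year, category) for in-rate-base rows."""
    totals = defaultdict(float)
    for row in rows:
        # Only rows explicitly flagged True are kept; null flags are excluded.
        if row["in_rate_base"] is True and row["ending_balance"] is not None:
            key = (row["utility_id_ferc1"], row["report_year"], row["rate_base_category"])
            totals[key] += row["ending_balance"]
    return dict(totals)


rate_base = rate_base_by_category(rows)
```

Excluding rows with a null in_rate_base flag is a conservative choice made for this sketch; whether such rows belong in a given analysis is a judgment call.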

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_assets.parquet"
)

Columns#

Field Name

Type

Description

ending_balance

number

Account balance at end of year.

in_rate_base

boolean

Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

plant_status

string

Utility plant financial status (in service, future, leased, total).

rate_base_category

string

A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.

report_year

integer

Four-digit year in which the data was reported.

table_name

string

The name of the PUDL database table from which a given record originated.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

utility_type_other

string

Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

xbrl_factoid

string

The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_detailed_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ balance sheet liabilities.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110 and 118)

Primary key:

This table has no primary key.

Usage Warnings#

  • The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from two FERC Form 1 accounting tables with nested calculations:

  • core_ferc1__yearly_balance_sheet_liabilities_sched110

  • core_ferc1__yearly_retained_earnings_sched118

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to three columns (utility_type, plant_function & plant_status) as compared to the originally reported values in our core tables. The modifications were applied either to provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_balance_sheet_liabilities.parquet"
)

Columns#

Field Name

Type

Description

ending_balance

number

Account balance at end of year.

in_rate_base

boolean

Whether or not a record from the detailed FERC1 accounting tables should be considered allowable in a utility’s rate base based on utility accounting standards. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type, plant_status or plant_function.

rate_base_category

string

A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.

report_year

integer

Four-digit year in which the data was reported.

table_name

string

The name of the PUDL database table from which a given record originated.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

xbrl_factoid

string

The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_detailed_income_statements#

Noticed any irregularities with this table? File a report.

Annual time series of granular, de-duplicated accounting data of utilities’ income statements.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 114, 300, 320 and 336)

Primary key:

This table has no primary key.

Usage Warnings#

  • The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from four FERC Form 1 accounting tables with nested calculations, drawn from Schedules 114, 300, 320 and 336.

We reconciled the nested calculations within these tables and then identified the most granular data across the tables. We applied slight modifications to two columns (utility_type & plant_function) as compared to the originally reported values in our core tables. The modifications were applied either to provide more specificity (e.g. we converted some total utility_type values into electric) or to condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
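Since revenue_requirement_category is described as intended for aggregating this table, a simple illustration is to compute each category's share of the flagged dollar_value for a single utility-year. The sketch below uses plain Python and invented rows; the category names are placeholders, and treating the summed dollar_value as a proxy for the revenue requirement is an assumption of this example.

```python
def category_shares(rows, utility_id, report_year):
    """Share of flagged dollar_value per revenue_requirement_category.

    Only rows for one utility-year with a truthy in_revenue_requirement
    flag are counted.
    """
    totals = {}
    for row in rows:
        if (row["utility_id_ferc1"], row["report_year"]) != (utility_id, report_year):
            continue
        if not row["in_revenue_requirement"]:
            continue  # skips both False and null flags
        category = row["revenue_requirement_category"]
        totals[category] = totals.get(category, 0.0) + row["dollar_value"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: value / grand_total for category, value in totals.items()}


# Hypothetical rows: category names and dollar values are invented.
rows = [
    {"utility_id_ferc1": 3, "report_year": 2023, "revenue_requirement_category": "category_a", "in_revenue_requirement": True, "dollar_value": 300.0},
    {"utility_id_ferc1": 3, "report_year": 2023, "revenue_requirement_category": "category_a", "in_revenue_requirement": True, "dollar_value": 100.0},
    {"utility_id_ferc1": 3, "report_year": 2023, "revenue_requirement_category": "category_b", "in_revenue_requirement": True, "dollar_value": 100.0},
    {"utility_id_ferc1": 3, "report_year": 2023, "revenue_requirement_category": "category_b", "in_revenue_requirement": False, "dollar_value": 50.0},
    {"utility_id_ferc1": 4, "report_year": 2023, "revenue_requirement_category": "category_a", "in_revenue_requirement": True, "dollar_value": 999.0},
]
shares = category_shares(rows, utility_id=3, report_year=2023)
```

Working one utility-year at a time keeps the denominator unambiguous; comparing shares across utilities would still be subject to the reporting inconsistencies noted in the usage warnings.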

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_detailed_income_statements.parquet"
)

Columns#

Field Name

Type

Description

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

in_revenue_requirement

boolean

Whether or not a record from the detailed income statement data is typically included in a utility’s revenue requirement. This flag was manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes varies based on the utility_type or plant_function.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

report_year

integer

Four-digit year in which the data was reported.

revenue_requirement_category

string

The category of revenue requirement associated with each component of a utility’s income statements. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function. This column is intended to be used to aggregate this table.

revenue_requirement_technology

string

The technology type associated with components of a utility’s revenue requirement. These categories were manually compiled by RMI utility accounting experts based on the xbrl_factoid and sometimes vary based on the utility_type or plant_function as well. This column is intended to be used to aggregate this table.

table_name

string

The name of the PUDL database table from which a given record originated.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

xbrl_factoid

string

The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_energy_dispositions_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of dispositions of electric energy sold, exchanged, or stored.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)

Primary key:

utility_id_ferc1, report_year, energy_disposition_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, dispositions only. Schedule 401a. Covers electricity that utilities delivered to end users, internal losses, etc.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_dispositions_sched401.parquet"
)

Columns#

Field Name

Type

Description

energy_disposition_type

string

Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401 table. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc.

energy_mwh

number

Sources and uses of energy in MWh.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
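The record_id format given in the column descriptions above can be split back into its components. The following is a minimal sketch, assuming the format exactly as documented and that none of the five trailing fields contains an underscore. The helper name parse_record_id and the sample ID are hypothetical, and IDs from XBRL reporting years may not follow this pattern.

```python
from typing import NamedTuple


class RecordIdParts(NamedTuple):
    table_name: str
    report_year: int
    report_prd: str
    respondent_id: str
    spplmnt_num: str
    row_number: str


def parse_record_id(record_id: str) -> RecordIdParts:
    """Split a record_id into its documented components (hypothetical helper)."""
    # table_name may itself contain underscores, so split from the right:
    # the last five underscore-separated fields are the fixed components.
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected record_id format: {record_id!r}")
    table_name, year, prd, respondent, spplmnt, row = parts
    return RecordIdParts(table_name, int(year), prd, respondent, spplmnt, row)


# Made-up ID for illustration only.
example = parse_record_id("some_source_table_2019_12_134_0_5")
print(example.table_name, example.report_year, example.row_number)
```

Splitting from the right keeps the table name intact even when it contains underscores; the non-year fields are left as strings because their exact types are not stated here.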

out_ferc1__yearly_energy_sources_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)

Primary key:

utility_id_ferc1, report_year, energy_source_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_energy_sources_sched401.parquet"
)

Columns#

Field Name

Type

Description

energy_mwh

number

Sources and uses of energy in MWh.

energy_source_type

string

Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
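Because this table mixes reported values with subtotals, summing energy_mwh over every row can double count. The sketch below shows one way to exclude calculated rows before aggregating. The row_type_xbrl values ("reported_value", "calculated_value") and the energy_source_type labels are assumptions used for illustration; check the values actually present in the data before relying on them.

```python
# Made-up rows; the type labels and row_type_xbrl values are assumptions.
rows = [
    {"energy_source_type": "steam_generation", "energy_mwh": 700.0,
     "row_type_xbrl": "reported_value"},
    {"energy_source_type": "hydro_generation", "energy_mwh": 300.0,
     "row_type_xbrl": "reported_value"},
    {"energy_source_type": "net_generation", "energy_mwh": 1000.0,
     "row_type_xbrl": "calculated_value"},
]


def sum_reported(rows):
    """Sum energy_mwh over rows that are not calculated, skipping nulls."""
    return sum(
        r["energy_mwh"]
        for r in rows
        if r["row_type_xbrl"] != "calculated_value" and r["energy_mwh"] is not None
    )


naive_total = sum(r["energy_mwh"] for r in rows)  # includes the subtotal row
reported_total = sum_reported(rows)               # subtotal row excluded
print(naive_total, reported_total)
```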

out_ferc1__yearly_hydroelectric_plants_sched406#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for large hydroelectric generating plants.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)

Primary key:

record_id

The best approximation of a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • The plant_type and construction_type columns are standardized into categorical values from free-form strings.

  • FERC does not require respondents to report unique, non-duplicative plant records. Respondents sporadically report portions of a plant and then the plant total (e.g., unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
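Before aggregating, it can help to see which records collide on the approximate key (report_year, utility_id_ferc1, plant_name_ferc1). The following is a minimal sketch using made-up records; the helper name duplicated_keys is hypothetical, and exact string matching will not catch near-duplicates such as a separate "total" row.

```python
from collections import Counter

# Made-up records for illustration only.
records = [
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "big dam"},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "big dam"},
    {"report_year": 2020, "utility_id_ferc1": 1, "plant_name_ferc1": "big dam total"},
    {"report_year": 2021, "utility_id_ferc1": 1, "plant_name_ferc1": "big dam"},
]

KEY = ("report_year", "utility_id_ferc1", "plant_name_ferc1")


def duplicated_keys(records):
    """Return each approximate-key value that occurs more than once, with its count."""
    counts = Counter(tuple(r[k] for k in KEY) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


dupes = duplicated_keys(records)
print(dupes)
```

Rows flagged this way still need manual review, since the names are free-form strings and a part-versus-total pair usually differs in its plant_name_ferc1.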

Additional Details#

Large plants have an installed nameplate capacity of more than 10 MW.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_equipment

number

Cost of plant: equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_capacity_adverse_conditions_mw

number

Net plant capability under the least favorable operating conditions, in megawatts.

net_capacity_favorable_conditions_mw

number

Net plant capability under the most favorable operating conditions, in megawatts.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_hydraulic

number

Production expenses: hydraulic expenses (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_rents

number

Production expenses: rents (USD).

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

opex_water_for_power

number

Production expenses: water for power (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

project_num

integer

FERC Licensed Project Number.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
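The ratio columns described above (capacity_factor, capex_per_mw, opex_per_mwh) can be sanity-checked from their inputs. This is a rough sketch, not PUDL's implementation: it assumes an 8,760-hour reporting year and that each ratio is a plain quotient of the named columns. The function names and input numbers are made up.

```python
HOURS_PER_YEAR = 8760  # assumption: a non-leap reporting year


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def derived_metrics(capacity_mw, net_generation_mwh, capex_total, opex_total):
    """Recompute three ratio columns from their inputs (illustrative only)."""
    # Potential generation is nameplate capacity running for the whole period.
    potential_mwh = None if capacity_mw is None else capacity_mw * HOURS_PER_YEAR
    return {
        "capacity_factor": safe_ratio(net_generation_mwh, potential_mwh),
        "capex_per_mw": safe_ratio(capex_total, capacity_mw),
        "opex_per_mwh": safe_ratio(opex_total, net_generation_mwh),
    }


# Made-up plant: 100 MW nameplate, 350,400 MWh generated in the year.
m = derived_metrics(
    capacity_mw=100.0,
    net_generation_mwh=350_400.0,
    capex_total=250_000_000.0,
    opex_total=3_504_000.0,
)
print(m)
```

Returning None for missing or zero denominators mirrors how null inputs would propagate, rather than raising on incomplete records.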

out_ferc1__yearly_income_statements_sched114#

Noticed any irregularities with this table? File a report.

Annual time series of utility income statements.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)

Primary key:

utility_id_ferc1, report_year, utility_type, income_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_income_statements_sched114.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

income_type

string

Type of income reported in core_ferc1__yearly_income_statements_sched114 table.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_operating_expenses_sched320#

Noticed any irregularities with this table? File a report.

Annual time series of operating and maintenance costs associated with producing electricity.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)

Primary key:

utility_id_ferc1, report_year, expense_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_expenses_sched320.parquet"
)

Columns#

Field Name

Type

Description

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

expense_type

string

The type of expense.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_operating_revenues_sched300#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric operating revenues.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)

Primary key:

utility_id_ferc1, report_year, revenue_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes only the structured part of Schedule 300. A number of revenue_type values do not have sales_mwh or avg_customers_per_month provided, in which case these columns will be NULL.
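Any per-MWh or per-customer calculation therefore needs to tolerate missing values. The following is a minimal sketch with made-up rows; the revenue_type labels are assumptions and revenue_per_mwh is a hypothetical helper, not part of PUDL.

```python
# Made-up rows; the revenue_type labels are assumptions for illustration.
rows = [
    {"revenue_type": "residential_sales", "dollar_value": 1_200_000.0,
     "sales_mwh": 10_000.0, "avg_customers_per_month": 8_000.0},
    {"revenue_type": "rent_from_electric_property", "dollar_value": 50_000.0,
     "sales_mwh": None, "avg_customers_per_month": None},
]


def revenue_per_mwh(row):
    """Return dollars per MWh sold, or None when sales_mwh is missing or zero."""
    mwh = row["sales_mwh"]
    if mwh is None or mwh == 0:
        return None
    return row["dollar_value"] / mwh


rates = {r["revenue_type"]: revenue_per_mwh(r) for r in rows}
print(rates)
```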

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_operating_revenues_sched300.parquet"
)

Columns#

Field Name

Type

Description

avg_customers_per_month

number

Average number of customers per month.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

revenue_type

string

Label describing types of revenues.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

sales_mwh

number

Quantity of electricity sold in MWh.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_other_regulatory_assets_sched232#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory assets, including rate order docket number.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)

Primary key:

This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)

Columns#

Field Name

Type

Description

account_detail

string

Description of the account number credited from making debit adjustment to other regulatory liabilities.

additional_description

string

Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but they differ from the standard description often enough that we kept both columns. This field did not exist before FERC began publishing Form 1 as XBRL and thus is always null prior to 2021.

credits_written_off_recovered

number

Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies.

debits

number

The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies.

description

string

Long human-readable description of the meaning of a code/label.

ending_balance

number

Account balance at end of year.

report_year

integer

Four-digit year in which the data was reported.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_other_regulatory_liabilities_sched278#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory liabilities, including rate order docket number.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)

Primary key:

This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)

Columns#

Field Name

Type

Description

account_detail

string

Description of the account number credited from making debit adjustment to other regulatory liabilities.

decrease_in_other_regulatory_liabilities

number

The decrease during the reporting period of other regulatory liabilities.

description

string

Long human-readable description of the meaning of a code/label.

ending_balance

number

Account balance at end of year.

increase_in_other_regulatory_liabilities

number

The increase during the reporting period of other regulatory liabilities.

report_year

integer

Four-digit year in which the data was reported.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_plant_in_service_sched204#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)

Primary key:

utility_id_ferc1, report_year, ferc_account_label

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101 (see e.g. https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
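The balance relationship stated above can be checked row by row. This is a sketch under two assumptions that are not confirmed here: that retirements are stored with a negative sign, so a plain sum applies, and that a missing change column can be treated as zero. The function names, the tolerance, and the sample numbers are made up.

```python
CHANGE_COLUMNS = ("additions", "retirements", "adjustments", "transfers")


def balance_gap(row):
    """Return ending_balance minus (starting_balance + sum of changes)."""
    # Assumption: a missing (None) change is treated as zero.
    changes = sum((row.get(col) or 0.0) for col in CHANGE_COLUMNS)
    return row["ending_balance"] - (row["starting_balance"] + changes)


def reconciles(row, tolerance=1.0):
    """True when the reported balances agree to within `tolerance` dollars."""
    return abs(balance_gap(row)) <= tolerance


# Made-up rows; retirements carry a negative sign (assumption).
ok_row = {"starting_balance": 1000.0, "additions": 200.0, "retirements": -50.0,
          "adjustments": 0.0, "transfers": 10.0, "ending_balance": 1160.0}
bad_row = dict(ok_row, ending_balance=1200.0)

print(reconciles(ok_row), balance_gap(bad_row))
```

A small tolerance allows for rounding in reported dollar amounts; rows that fail the check are candidates for review rather than proof of an error.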

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_plant_in_service_sched204.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments

number

Cost of adjustments to the account.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

ferc_account_label

string

Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

retirements

number

Cost of disposal of items classified within the account.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

transfers

number

Cost of transfers into (out of) the account.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_pumped_storage_plants_sched408#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)

Primary key:

record_id

The best approximation of a primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • FERC does not require respondents to report unique, non-duplicative plant records. Respondents sporadically report portions of a plant and then the plant total (e.g., unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported in Schedule 408 of FERC Form 1.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of the time period).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_equipment_electric

number

Cost of plant: accessory electric equipment (USD).

capex_equipment_misc

number

Cost of plant: miscellaneous power plant equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

capex_wheels_turbines_generators

number

Cost of plant: water wheels, turbines, and generators (USD).

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

energy_used_for_pumping_mwh

number

Energy used for pumping, in megawatt-hours.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_load_mwh

number

Net output for load (net generation - energy used for pumping) in megawatt-hours.

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_production_before_pumping

number

Total production expenses before pumping (USD).

opex_pumped_storage

number

Production expenses: pumped storage (USD).

opex_pumping

number

Production expenses: pumping (USD).

opex_rents

number

Production expenses: rents (USD).

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

opex_water_for_power

number

Production expenses: water for power (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

project_num

integer

FERC Licensed Project Number.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
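Several columns in this table are simple ratios or differences of other columns. The sketch below recomputes three of them from a single made-up row, following the column descriptions above. The 8,760-hour year, the sample values, and the function name are assumptions, and PUDL's own calculation may differ in details such as leap years or null handling.

```python
HOURS_PER_YEAR = 8760  # assumption: a non-leap reporting year


def derived_metrics(row, hours=HOURS_PER_YEAR):
    """Recompute ratio and difference columns from their reported inputs."""
    capacity_mw = row["capacity_mw"]
    net_gen = row["net_generation_mwh"]
    # Net output for load = net generation - energy used for pumping.
    metrics = {"net_load_mwh": net_gen - row["energy_used_for_pumping_mwh"]}
    if capacity_mw:
        # Energy generated / (nameplate capacity * hours in the period).
        metrics["capacity_factor"] = net_gen / (capacity_mw * hours)
        metrics["capex_per_mw"] = row["capex_total"] / capacity_mw
    else:
        # Avoid dividing by a missing or zero capacity.
        metrics["capacity_factor"] = None
        metrics["capex_per_mw"] = None
    return metrics


# Hypothetical plant-year record.
row = {
    "capacity_mw": 100.0,
    "net_generation_mwh": 219_000.0,
    "energy_used_for_pumping_mwh": 300_000.0,
    "capex_total": 50_000_000.0,
}
metrics = derived_metrics(row)
```

A negative net_load_mwh, as in this example, is expected whenever a pumped storage plant consumes more energy for pumping than it generates.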

out_ferc1__yearly_purchased_power_and_exchanges_sched326#

Noticed any irregularities with this table? File a report.

Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)

Primary key:

This table has no primary key. The primary key for this table would be record_id, however a small number of these values are duplicated in 2021, 2022, and 2024.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table has data about inter-utility power purchases. This includes how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately the field describing which other utility the power was being bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.

Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
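Because seller_name is free text, grouping on the raw value splits one counterparty across many spellings. A rough, lossy normalization can reduce that, as sketched below. The suffix list, the sample names, and the function name are assumptions for illustration; this is not a PUDL cleaning step and it will not resolve genuinely different names for the same entity.

```python
import re
from collections import defaultdict

# Hypothetical list of corporate suffixes to drop; extend as needed.
SUFFIXES = {"inc", "co", "corp", "company", "corporation", "llc"}


def normalize_seller(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


# Made-up rows standing in for the real table.
rows = [
    {"seller_name": "Example Power Co.", "purchased_mwh": 100.0},
    {"seller_name": "EXAMPLE POWER COMPANY", "purchased_mwh": 50.0},
    {"seller_name": "Other Energy, Inc.", "purchased_mwh": 25.0},
]

# Sum purchases per normalized seller name.
totals = defaultdict(float)
for row in rows:
    totals[normalize_seller(row["seller_name"])] += row["purchased_mwh"]
```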

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)

Columns#

Field Name

Type

Description

billing_demand_mw

number

Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts.

coincident_peak_demand_mw

number

Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts.

delivered_mwh

number

Gross megawatt-hours delivered in power exchanges and used as the basis for settlement.

demand_charges

number

Demand charges (USD).

energy_charges

number

Energy charges (USD).

non_coincident_peak_demand_mw

number

Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts.

other_charges

number

Other charges, including out-of-period adjustments (USD).

purchase_type_code

string

Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction.

purchased_mwh

number

Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021.

purchased_other_than_storage_mwh

number

Number of megawatt hours purchased during the period for other than energy storage.

purchased_storage_mwh

number

Number of megawatt hours purchased during the period for energy storage.

received_mwh

number

Gross megawatt-hours received in power exchanges and used as the basis for settlement.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

seller_name

string

Name of the seller, or the other party in an exchange transaction.

tariff

string

FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.)

total_settlement

number

Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
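The total_settlement description implies a consistency check: for ordinary purchases it should equal the sum of demand, energy, and other charges. The sketch below flags rows where the two differ on made-up data. The tolerance, the treatment of nulls as zero, and the choice to skip electricity_exchange rows (whose settlement reflects the net receipt of energy instead) are assumptions.

```python
CHARGE_COLUMNS = ("demand_charges", "energy_charges", "other_charges")


def settlement_gap(row, tol=0.5):
    """Return total_settlement minus summed charges when it exceeds tol, else None."""
    if row.get("purchase_type_code") == "electricity_exchange":
        return None  # exchanges settle on net energy, so the sum check does not apply
    total = row.get("total_settlement")
    parts = [row.get(col) for col in CHARGE_COLUMNS]
    if total is None or all(p is None for p in parts):
        return None  # nothing to compare
    gap = total - sum(p for p in parts if p is not None)
    return gap if abs(gap) > tol else None


# Hypothetical rows: one consistent, one inconsistent, one exchange.
rows = [
    {"purchase_type_code": "requirements", "demand_charges": 100.0,
     "energy_charges": 200.0, "other_charges": -50.0, "total_settlement": 250.0},
    {"purchase_type_code": "short_firm", "demand_charges": None,
     "energy_charges": 300.0, "other_charges": 0.0, "total_settlement": 310.0},
    {"purchase_type_code": "electricity_exchange", "demand_charges": None,
     "energy_charges": None, "other_charges": None, "total_settlement": -40.0},
]
gaps = [settlement_gap(r) for r in rows]
```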

out_ferc1__yearly_rate_base#

Noticed any irregularities with this table? File a report.

Annual time series of granular accounting data consisting of what utilities can typically include in their rate bases.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedules 110, 118, 200, 204, 219 and 320)

Primary key:

This table has no primary key.

Usage Warnings#

  • The data from these xbrl_factoid tables contains nested totals and subtotals - making aggregations difficult. We used FERC 1’s reported calculations to determine the most granular, non-duplicative records. The reported calculations are not always perfect - we’ve corrected many of them, but errors could still exist.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is derived from seven FERC Form 1 accounting tables with nested calculations. We reconciled these nested calculations and then identified the most granular data across the tables. Here are the three direct upstream inputs - the two detailed tables have several core_ferc1 inputs each:

We applied slight modifications to three columns (utility_type, plant_function and plant_status) relative to the originally reported values in our core tables. The modifications either provide more specificity (e.g. we converted some total utility_type values into electric) or condense similar categories for easier analysis (e.g. creating a hydraulic_production plant_function by combining hydraulic_production_conventional and hydraulic_production_pumped_storage). See pudl.output.ferc1.Exploder for more details. This table was made entirely with support and direction from RMI.
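The category condensing described above amounts to a lookup from detailed values to a broader one. The sketch below shows the idea for the hydraulic_production example given in the text, using made-up balances. It is an illustration only: the actual logic lives in pudl.output.ferc1.Exploder and covers more cases than this mapping.

```python
from collections import defaultdict

# Mapping taken from the example in the text; any other value passes through unchanged.
PLANT_FUNCTION_MAP = {
    "hydraulic_production_conventional": "hydraulic_production",
    "hydraulic_production_pumped_storage": "hydraulic_production",
}


def condense_plant_function(value):
    """Map a detailed plant_function onto its condensed category."""
    return PLANT_FUNCTION_MAP.get(value, value)


# Hypothetical records with made-up balances.
rows = [
    {"plant_function": "hydraulic_production_conventional", "ending_balance": 10.0},
    {"plant_function": "hydraulic_production_pumped_storage", "ending_balance": 5.0},
    {"plant_function": "transmission", "ending_balance": 7.0},
]

# Sum ending balances by condensed plant function.
balances = defaultdict(float)
for row in rows:
    balances[condense_plant_function(row["plant_function"])] += row["ending_balance"]
```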

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_rate_base.parquet"
)

Columns#

Field Name

Type

Description

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

is_disaggregated_in_rate_base

boolean

Indicates whether or not records with null values in the in_rate_base column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag

is_disaggregated_utility_type

boolean

Indicates whether or not records with null or total values in the utility_type column were disaggregated. See documentation for process: pudl.output.ferc1.disaggregate_null_or_total_tag

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

plant_status

string

Utility plant financial status (in service, future, leased, total).

rate_base_category

string

A category of asset or liability that RMI compiled to use as a shorthand for various types of utility assets. These tags were compiled manually based on the xbrl_factoid and sometimes vary based on the utility_type, plant_function or plant_status as well.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

table_name

string

The name of the PUDL database table where a given record originated from.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types.

utility_type_other

string

Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

xbrl_factoid

string

The name of the type of value, derived from the XBRL fact name.

out_ferc1__yearly_retained_earnings_sched118#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ statements of retained earnings.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)

Primary key:

utility_id_ferc1, report_year, earnings_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_retained_earnings_sched118.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

earnings_type

string

Label describing types of earnings.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

out_ferc1__yearly_sales_by_rate_schedules_sched304#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)

Primary key:

This table has no primary key.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Values in rate_schedule_description are free-form strings.

  • Data prior to 2021 does not include information in columns: rate_schedule_type and billing_status.

  • Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

The pre-2021 data in this table (extracted from the FoxPro database rather than XBRL) is extremely unstructured. The post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total). The pre-2021 data instead stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse the pre-2021 rate_schedule_description column en masse because it’s just too messy: its contents often include numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded in the rate_schedule_description column. To all who dare, beware.

This table is a combination of one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type, plus totals), hence the increase in data clarity post-2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data, where they can be reliably parsed from each of the tables and incorporated into columns. The rate_schedule_description column is supposed to contain the names of individual rate schedules within a rate_schedule_type (e.g. Residential 1, Residential 2). However, the pre-2021 data contains a little bit of everything (or nothing) and the post-2020 data has some totals or wonky data thrown in. That is to say, even when working with post-2020 data, be wary of aggregating the data. That’s what the “total” rows are for.

The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for post-2020 data (for earlier years you could try keyword searching for “total” in rate_schedule_description). The totals table contains two types of totals: the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”) and each of the utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).
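The two kinds of totals described above can be separated with simple filters on rate_schedule_description and rate_schedule_type. The sketch below applies them to made-up rows, along with a keyword fallback for pre-2021 data. The sample rows and helper names are assumptions, and the keyword search is only a rough aid given how messy the older descriptions are.

```python
def is_utility_total(row):
    """Utility-wide total across all rate schedules (post-2020 data)."""
    return (
        row["rate_schedule_description"] == "total"
        and row["rate_schedule_type"] == "total"
    )


def is_schedule_type_total(row):
    """Total for one rate schedule type, e.g. residential (post-2020 data)."""
    return (
        row["rate_schedule_description"] == "total"
        and row["rate_schedule_type"] not in (None, "total")
    )


def mentions_total(row):
    """Keyword fallback for pre-2021 rows, which have no rate_schedule_type."""
    return "total" in (row["rate_schedule_description"] or "").lower()


# Hypothetical rows standing in for the real table.
rows = [
    {"report_year": 2022, "rate_schedule_description": "total", "rate_schedule_type": "total"},
    {"report_year": 2022, "rate_schedule_description": "total", "rate_schedule_type": "residential"},
    {"report_year": 2022, "rate_schedule_description": "residential 1", "rate_schedule_type": "residential"},
    {"report_year": 2015, "rate_schedule_description": "Total Residential Sales", "rate_schedule_type": None},
    {"report_year": 2015, "rate_schedule_description": "RS-1", "rate_schedule_type": None},
]

utility_totals = [r for r in rows if is_utility_total(r)]
type_totals = [r for r in rows if is_schedule_type_total(r)]
pre_2021_candidates = [r for r in rows if r["report_year"] < 2021 and mentions_total(r)]
```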

The rate schedule based XBRL tables only report billed values whereas the total tables report billed, unbilled, and total values. (See the column description for more info on the difference between billed and unbilled). This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.

Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot guarantee which is which and have not put any cleaning mechanisms in place to account for this.
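One way to probe the unit question row by row is to compare the reported revenue_per_kwh with the value implied by dollar_value and sales_mwh. The sketch below does this on made-up numbers. It assumes that dollar_value holds the revenue in dollars for the same row and uses an arbitrary 20% tolerance, so treat the result as a hint for review, not a correction.

```python
def guess_revenue_unit(row, rel_tol=0.2):
    """Guess whether revenue_per_kwh looks like dollars or cents per kWh."""
    dollars = row.get("dollar_value")
    mwh = row.get("sales_mwh")
    reported = row.get("revenue_per_kwh")
    if not dollars or not mwh or not reported:
        return "unknown"  # missing or zero inputs cannot be compared
    implied = dollars / (mwh * 1000.0)  # implied USD per kWh
    ratio = reported / implied
    if abs(ratio - 1.0) <= rel_tol:
        return "dollars"
    if abs(ratio - 100.0) <= 100.0 * rel_tol:
        return "cents"
    return "unknown"


# Hypothetical rows: the implied price is 0.12 USD/kWh in the first three.
rows = [
    {"dollar_value": 120_000.0, "sales_mwh": 1_000.0, "revenue_per_kwh": 0.12},
    {"dollar_value": 120_000.0, "sales_mwh": 1_000.0, "revenue_per_kwh": 12.0},
    {"dollar_value": 120_000.0, "sales_mwh": 1_000.0, "revenue_per_kwh": 3.0},
    {"dollar_value": 120_000.0, "sales_mwh": None, "revenue_per_kwh": 0.12},
]
guesses = [guess_revenue_unit(r) for r in rows]
```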

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)

Columns#

Field Name

Type

Description

avg_customers_per_month

number

Average number of customers per month.

billing_status

string

Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

kwh_per_customer

number

kWh per customer.

rate_schedule_description

string

Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type.

rate_schedule_type

string

Categorization of rate schedule type.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

revenue_per_kwh

number

The amount of revenue per kWh by rate schedule acquired in the given report year.

sales_mwh

number

Quantity of electricity sold in MWh.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_small_plants_sched410#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for internal combustion plants, gas turbine plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity and steam plants with less than 25 MW installed nameplate capacity.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)

Primary key:

record_id. The best approximation of a natural primary key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • FERC does not restrict respondents to report unique and non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (ex: unit 1, unit 2 and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported on FERC Form 1 Schedule 410 (pages 410-411) and extracted from the FERC Visual FoxPro and XBRL. See our pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.

The raw version of this table is more like a digitized PDF than an actual data table. The rows contain lots of information in addition to what the columns might suggest. For instance, a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the rest of the values data in the column. We employ a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.
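To make the idea of separating header, note, and total rows from plant rows concrete, here is a simplified sketch on made-up data. It is not PUDL's actual transformation: the keyword list, the classification rules, and the function names are assumptions chosen only to illustrate how header text can be carried forward into a separate column.

```python
# Assumed keywords that mark a header row; the real data is far messier.
HEADER_KEYWORDS = ("hydro", "internal combustion", "gas turbine", "steam")
VALUE_COLUMNS = ("capacity_mw", "net_generation_mwh", "capex_total")


def classify_row(row):
    """Label a raw row as 'total', 'note', 'header', or 'plant'."""
    name = (row.get("plant_name_ferc1") or "").strip().lower()
    has_values = any(row.get(col) is not None for col in VALUE_COLUMNS)
    if "total" in name:
        return "total"
    if not has_values and name.startswith(("note", "(")):
        return "note"
    if not has_values and any(k in name for k in HEADER_KEYWORDS):
        return "header"
    return "plant"


def label_plant_types(rows):
    """Keep plant rows only, tagging each with the most recent header above it."""
    current, labeled = None, []
    for row in rows:
        kind = classify_row(row)
        if kind == "header":
            current = row["plant_name_ferc1"].strip().lower()
        elif kind == "plant":
            labeled.append({**row, "plant_type": current})
    return labeled


# Hypothetical raw rows in reported order.
raw = [
    {"plant_name_ferc1": "Hydro"},
    {"plant_name_ferc1": "plant a", "capacity_mw": 5.0},
    {"plant_name_ferc1": "plant b", "capacity_mw": 2.0},
    {"plant_name_ferc1": "Total Hydro", "capacity_mw": 7.0},
    {"plant_name_ferc1": "(1) leased from another party"},
    {"plant_name_ferc1": "Internal Combustion"},
    {"plant_name_ferc1": "plant c", "capacity_mw": 1.5},
]
kinds = [classify_row(r) for r in raw]
plants = label_plant_types(raw)
```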

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_small_plants_sched410.parquet"
)

Columns#

Field Name

Type

Description

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_total

number

Total cost of plant (USD).

construction_year

integer

Year the plant’s oldest still operational unit was built.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_type

string

Type of fuel.

license_id_ferc1

integer

FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_fuel

number

Production expenses: fuel (USD).

opex_maintenance

number

Production expenses: Maintenance (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
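The license_id_ferc1 description says the value is pulled out of the plant name where possible. A rough sketch of that kind of extraction follows. The regular expression, the keyword list, and the sample names are assumptions for illustration and are not the pattern PUDL uses; requiring a keyword before the digits is a deliberate choice to avoid mistaking years for license numbers.

```python
import re

# Assumed pattern: a keyword, up to 15 non-digit characters, then 2 to 5 digits.
LICENSE_PATTERN = re.compile(
    r"(?:ferc|license|lic|project)\D{0,15}(\d{2,5})", re.IGNORECASE
)


def extract_license_id(plant_name):
    """Return the first license-like number in a plant name, or None."""
    match = LICENSE_PATTERN.search(plant_name or "")
    return int(match.group(1)) if match else None


# Made-up plant names.
names = [
    "plant a (ferc lic. no. 1234)",
    "plant b project 567",
    "plant c",
    "plant d 1998",
]
license_ids = [extract_license_id(n) for n in names]
```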

out_ferc1__yearly_steam_plants_fuel_by_plant_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of FERC fuel data by plant.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)

Primary key:

report_year, utility_id_ferc1, plant_name_ferc1

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
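The *_fraction_cost and primary_fuel_by_cost columns in this table relate each fuel's cost to the plant's overall fuel cost. The sketch below shows one way such shares could be computed from made-up per-fuel costs. It assumes only the four fuels shown, expresses shares on a 0 to 1 scale, and picks the largest share as the primary fuel, which may not match the rule PUDL applies.

```python
FUELS = ("coal", "gas", "nuclear", "oil")


def fuel_fractions(costs):
    """Share of total fuel cost attributable to each fuel (0 to 1)."""
    total = sum(costs.get(fuel, 0.0) for fuel in FUELS)
    if total <= 0:
        return {fuel: None for fuel in FUELS}  # no cost reported, shares undefined
    return {fuel: costs.get(fuel, 0.0) / total for fuel in FUELS}


def primary_fuel(fractions):
    """Assumed rule: the fuel with the largest share, or None if undefined."""
    known = {fuel: value for fuel, value in fractions.items() if value is not None}
    if not known:
        return None
    return max(known, key=known.get)


# Hypothetical per-fuel costs (USD) for one plant-year.
costs = {"coal": 750.0, "gas": 250.0}
fractions = fuel_fractions(costs)
primary = primary_fuel(fractions)
```

The same approach applies to the *_fraction_mmbtu columns by substituting heat content for cost.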

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_by_plant_sched402.parquet"
)

Columns#

Field Name

Type

Description

coal_fraction_cost

number

Coal cost as a percentage of overall fuel cost.

coal_fraction_mmbtu

number

Coal heat content as a percentage of overall fuel heat content (MMBTU).

fuel_cost

number

Total fuel cost for plant (in $USD).

fuel_mmbtu

number

Total heat content for plant (in MMBTU).

gas_fraction_cost

number

Natural gas cost as a percentage of overall fuel cost.

gas_fraction_mmbtu

number

Natural gas heat content as a percentage of overall fuel heat content (MMBTU).

nuclear_fraction_cost

number

Nuclear cost as a percentage of overall fuel cost.

nuclear_fraction_mmbtu

number

Nuclear heat content as a percentage of overall fuel heat content (MMBTU).

oil_fraction_cost

number

Oil cost as a percentage of overall fuel cost.

oil_fraction_mmbtu

number

Oil heat content as a percentage of overall fuel heat content (MMBTU).

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

primary_fuel_by_cost

string

Primary fuel for the plant, determined by share of total fuel cost.

primary_fuel_by_mmbtu

string

Primary fuel for the plant, determined by share of total fuel heat content.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

waste_fraction_cost

number

Waste-heat cost as a percentage of overall fuel cost.

waste_fraction_mmbtu

number

Waste-heat heat content as a percentage of overall fuel heat content (MMBTU).
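One way to read the fraction columns together with primary_fuel_by_cost and primary_fuel_by_mmbtu is that the primary fuel is the one with the largest share. The sketch below encodes that reading as an assumption; it is not PUDL's actual implementation. The function name and row values are invented, and the shares are assumed to be stored as fractions between 0 and 1.

```python
FUELS = ("coal", "gas", "nuclear", "oil", "waste")


def primary_fuel(row: dict, basis: str = "mmbtu"):
    """Return the fuel with the largest share on the given basis ('mmbtu' or 'cost').

    Missing or null fractions are treated as zero. Returns None if no fuel
    has a positive share.
    """
    shares = {fuel: row.get(f"{fuel}_fraction_{basis}") or 0.0 for fuel in FUELS}
    best = max(shares, key=shares.get)
    return best if shares[best] > 0 else None


# Invented row for illustration.
row = {
    "coal_fraction_mmbtu": 0.82,
    "gas_fraction_mmbtu": 0.15,
    "oil_fraction_mmbtu": 0.03,
    "coal_fraction_cost": 0.70,
    "gas_fraction_cost": 0.25,
    "oil_fraction_cost": 0.05,
}
by_heat = primary_fuel(row, "mmbtu")
by_cost = primary_fuel(row, "cost")
```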

out_ferc1__yearly_steam_plants_fuel_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)

Primary key:

record_id

Usage Warnings#

  • The fuel_type_code_pudl is inferred from a free-form string field.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is a subset of the steam plant table reported on page 402 of FERC Form 1.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)

Columns#

Field Name

Type

Description

fuel_consumed_mmbtu

number

Total heat content of fuel consumed, in MMBTU. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_total_cost

number

Total cost of consumed fuel.

fuel_consumed_units

number

Consumption of the fuel type in physical unit. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_unit_burned

number

Average cost of fuel consumed in the report year per reported fuel unit (USD).

fuel_cost_per_unit_delivered

number

Average cost of fuel delivered in the report year per reported fuel unit (USD).

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL.

fuel_units

string

Reported unit of measure for fuel.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.
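The column descriptions above imply some simple relationships between the per-unit and total fields. The following sketch spells them out with invented numbers. Treating the reported columns as exactly consistent with one another is an assumption, since respondents' figures do not always reconcile; the function names are hypothetical.

```python
def heat_consumed_mmbtu(fuel_consumed_units: float, fuel_mmbtu_per_unit: float) -> float:
    """Total heat content consumed: physical units times heat content per unit."""
    return fuel_consumed_units * fuel_mmbtu_per_unit


def cost_per_mmbtu(fuel_consumed_total_cost: float, fuel_consumed_mmbtu: float) -> float:
    """Average fuel cost per MMBTU of heat content."""
    if fuel_consumed_mmbtu <= 0:
        raise ValueError("fuel_consumed_mmbtu must be positive")
    return fuel_consumed_total_cost / fuel_consumed_mmbtu


# Invented example: 100,000 tons of coal at 20 MMBTU per ton, costing $4,000,000.
mmbtu = heat_consumed_mmbtu(100_000, 20.0)
usd_per_mmbtu = cost_per_mmbtu(4_000_000, mmbtu)
```

Comparing a value computed this way against the reported fuel_cost_per_mmbtu is one quick check for unit or data-entry problems in a record.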

out_ferc1__yearly_steam_plants_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)

Primary key:

record_id

The best approximation of a natural key for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1, but it is a free-form string field and there are duplicate records.

Usage Warnings#

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the plant total (e.g., unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
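To illustrate the aggregation warning above, the sketch below flags records whose free-form plant name contains the word "total" and leaves them out of a sum. This is a rough heuristic chosen for illustration, not a PUDL feature: it will miss unlabeled totals and may flag legitimate plants. The DataFrame contents and the maybe_total column are invented.

```python
import pandas as pd

# Invented records mimicking a respondent that reports two units plus a total.
plants = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020, 2020],
        "utility_id_ferc1": [1, 1, 1, 2],
        "plant_name_ferc1": ["example unit 1", "example unit 2", "example total", "other plant"],
        "capacity_mw": [100.0, 150.0, 250.0, 80.0],
    }
)

# Heuristic only: flag names that contain the standalone word "total".
plants["maybe_total"] = plants["plant_name_ferc1"].str.contains(
    r"\btotal\b", case=False, regex=True
)

# Sum capacity per utility-year without the flagged rows to avoid double counting.
capacity = (
    plants.loc[~plants["maybe_total"]]
    .groupby(["report_year", "utility_id_ferc1"])["capacity_mw"]
    .sum()
)
```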

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_steam_plants_sched402.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part: energy generated over a time period / (nameplate capacity * length of the time period in hours).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_annual_addition

number

Annual capital addition into capex_total.

capex_annual_addition_rolling

number

Year-to-date capital addition into capex_total.

capex_annual_per_kw

number

Annual capital addition into capex_total per kW.

capex_annual_per_mw

number

Annual capital addition into capex_total per MW.

capex_annual_per_mw_rolling

number

Year-to-date capital addition into capex_total per MW.

capex_annual_per_mwh

number

Annual capital addition into capex_total per MWh.

capex_annual_per_mwh_rolling

number

Year-to-date capital addition into capex_total per MWh.

capex_equipment

number

Cost of plant: equipment (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

capex_wo_retirement_total

number

Total cost of plant (USD) without retirements.

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

not_water_limited_capacity_mw

number

Plant capacity in MW when not limited by condenser water.

opex_allowances

number

Allowances.

opex_boiler

number

Maintenance of boiler (or reactor) plant.

opex_coolants

number

Cost of coolants and water (nuclear plants only).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_fuel

number

Production expenses: fuel (USD).

opex_fuel_per_mwh

number

Production expenses: fuel (USD) per megawatt-hour (MWh).

opex_misc_power

number

Miscellaneous steam (or nuclear) expenses.

opex_misc_steam

number

Maintenance of miscellaneous steam (or nuclear) plant.

opex_nonfuel_per_mwh

number

Investments in non-fuel production expenses per MWh.

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plants

number

Maintenance of electrical plant.

opex_production_total

number

Total operating expenses.

opex_rents

number

Production expenses: rents (USD).

opex_steam

number

Steam expenses.

opex_steam_other

number

Steam from other sources.

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

opex_transfer

number

Steam transferred (Credit).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_id_ferc1

integer

Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

water_limited_capacity_mw

number

Plant capacity in MW when limited by condenser water.
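As a worked example of the capacity_factor definition in the column list above, the sketch below divides annual net generation by the energy the plant would produce running at nameplate capacity for every hour of the year. Using calendar-year hours (8,760, or 8,784 in a leap year) is an assumption about the time period; the function name and numbers are invented.

```python
import calendar


def capacity_factor(net_generation_mwh: float, capacity_mw: float, report_year: int) -> float:
    """Net generation divided by generation at full nameplate output all year."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    hours_in_year = 8784 if calendar.isleap(report_year) else 8760
    return net_generation_mwh / (capacity_mw * hours_in_year)


# Invented plant: 500 MW nameplate, 2,190,000 MWh net generation in 2023.
cf = capacity_factor(2_190_000, 500, 2023)
```

Here 500 MW over 8,760 hours gives 4,380,000 MWh of potential generation, so the capacity factor is 0.5. Values well above 1 in real records usually point to a unit or reporting problem rather than to an exceptional plant.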

out_ferc1__yearly_transmission_lines_sched422#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about transmission lines.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)

Primary key:

This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.
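The sketch below shows one way to see why no primary key can be declared: it checks a candidate set of identifying columns for duplicates and nulls. The choice of candidate columns is an assumption made for illustration, and the rows are invented.

```python
import pandas as pd

# Invented rows; None stands in for the nulls common in the real table.
lines = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020],
        "utility_id_ferc1": [1, 1, 1],
        "start_point": ["substation a", "substation a", None],
        "end_point": ["substation b", "substation b", None],
        "designed_voltage_kv": [230.0, 230.0, 345.0],
    }
)

# Hypothetical candidate key, chosen for illustration.
candidate_key = [
    "report_year",
    "utility_id_ferc1",
    "start_point",
    "end_point",
    "designed_voltage_kv",
]

# Rows sharing the same candidate-key values as at least one other row.
is_duplicate = lines.duplicated(subset=candidate_key, keep=False)
# Rows where any candidate-key field is null.
has_null_key = lines[candidate_key].isna().any(axis=1)

print(f"{is_duplicate.sum()} duplicated rows, {has_null_key.sum()} rows with null key fields")
```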

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines with a nominal voltage of 132 kilovolts or greater. Transmission lines below this voltage are required to be reported in group totals only for each voltage.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_transmission_lines_sched422.parquet"
)

Columns#

Field Name

Type

Description

capex_land

number

Cost of plant: land and land rights (USD).

capex_other

number

Other costs associated with the plant (USD).

capex_total

number

Total cost of plant (USD).

conductor_size_and_material

string

Size of transmission conductor and material of the transmission line.

designed_voltage_kv

number

Manufactured (designed) voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.

end_point

string

The end point of a transmission line.

num_transmission_circuits

integer

Number of circuits in a transmission line.

operating_voltage_kv

number

The operating voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.

opex_maintenance

number

Production expenses: Maintenance (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_rents

number

Production expenses: rents (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

start_point

string

The starting point of a transmission line.

supporting_structure_type

string

Supporting structure of the transmission line.

transmission_line_and_structures_length_miles

number

Length (in pole miles, or circuit miles if the transmission lines are underground) of lines that are aggregated with other lines or structures (whose costs are combined with those of other structures).

transmission_line_length_miles

number

Length (in pole miles, or circuit miles if the transmission lines are underground) of lines that are stand-alone structures (whose costs are reported on a stand-alone basis).

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

out_ferc1__yearly_utility_plant_summary_sched200#

Noticed any irregularities with this table? File a report.

Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)

Primary key:

utility_id_ferc1, report_year, utility_type, utility_plant_asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc1__yearly_utility_plant_summary_sched200.parquet"
)

Columns#

Field Name

Type

Description

ending_balance

number

Account balance at end of year.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

utility_plant_asset_type

string

Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

utility_type_other

string

Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

out_ferc714__georeferenced_respondents#

Noticed any irregularities with this table? File a report.

Annual summary of demand and other information about FERC-714 respondents.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report

Primary key:

respondent_id_ferc714, report_date

Usage Warnings#

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table differs from out_ferc714__summarized_demand in that it also includes a geometry column describing the respondent’s service territory in each year. These service territories are based on the counties that the corresponding EIA-861 respondent reported serving in that year. There is sometimes ambiguity as to whether a FERC-714 respondent should be interpreted as an individual utility or a balancing authority. The respondent_type column indicates which type of entity has been assumed in determining the service territory from EIA-861 data.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__georeferenced_respondents.parquet"
)

Columns#

Field Name

Type

Description

area_km2

number

County area in km2.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

balancing_authority_name_eia

string

Name of the balancing authority.

demand_annual_mwh

number

Annual electricity demand in a given report year.

eia_code

integer

EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.

geometry

geometry

Geospatial representation of the feature.

population

number

County population, sourced from Census DP1 data.

report_date

date

Date reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_name_ferc714

string

Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.

respondent_type

string

Whether a respondent to FERC Form 714 is a utility or a balancing authority.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

out_ferc714__hourly_estimated_state_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of estimated electricity demand for each state, scaled such that it matches the total electricity sales by state reported in EIA 861.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report

Primary key:

state_id_fips, datetime_utc

Usage Warnings#

  • Contains estimated values.

  • Contains information from multiple raw inputs.

  • Large table; do not attempt to open with Excel.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table uses hourly electricity demand and information about service territories to estimate the total hourly electricity demand for each US state. Hourly demand is drawn from the FERC 714, which reports at the balancing authority and utility level. Service territories for utilities and balancing authorities are inferred from information in the EIA 861, including the counties served by each utility and the utilities that make up each balancing authority.

This table uses the total electricity sales by state reported in the EIA 861 as a scaling factor to ensure that the magnitude of electricity sales is roughly correct, and obtains the shape of the demand curve from the hourly planning area demand reported in the FERC 714.

This table includes the state FIPS codes - see out_censusdp1tract__states for additional information about FIPS state codes.
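The scaling step described above can be sketched as follows: an hourly profile is multiplied by a single factor so that it sums to the reported sales total while keeping its shape. This shows only that final step under simplifying assumptions (one state, one year, one factor). The allocation of planning-area demand to states is not shown, and the function name and numbers are invented.

```python
def scale_to_annual_sales(hourly_demand_mwh: list[float], annual_sales_mwh: float) -> list[float]:
    """Rescale an hourly profile so it sums to the annual sales total, preserving its shape."""
    total = sum(hourly_demand_mwh)
    if total <= 0:
        raise ValueError("hourly demand must sum to a positive value")
    factor = annual_sales_mwh / total
    return [value * factor for value in hourly_demand_mwh]


# Invented four-hour "year" for one state.
demand_mwh = [100.0, 200.0, 300.0, 400.0]
scaled_demand_mwh = scale_to_annual_sales(demand_mwh, annual_sales_mwh=900.0)
```

In this toy case the unscaled profile sums to 1,000 MWh, so every hour is multiplied by 0.9. The ratio between any two hours is unchanged, which is what "obtains the shape of the demand curve" refers to.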

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet"
)

Columns#

Field Name

Type

Description

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_mwh

number

Electricity demand (energy) within a given timeframe.

scaled_demand_mwh

number

Estimated electricity demand scaled by the total sales within a state.

state_id_fips

string

Two digit state FIPS code.

out_ferc714__hourly_planning_area_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of electricity demand by planning area.

Most-recent data:

2024

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)

Primary key:

respondent_id_ferc714, datetime_utc

Usage Warnings#

  • Contains rows where missing values were imputed.

  • The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See core_ferc714__hourly_planning_area_demand for additional details.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is based on core_ferc714__hourly_planning_area_demand, but adds imputed demand values where the original data was missing or anomalous. Codes explaining why values have been imputed can be found in the core_pudl__codes_imputation_reasons table.
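A short sketch of how the reported and imputed columns might be used together. It assumes that the imputation code column is null for values that were not imputed; that is a guess about the convention, not something stated here. The rows and the EXAMPLE_CODE value are invented placeholders, not real codes.

```python
import pandas as pd

# Invented rows; the imputation code value is a placeholder, not a real code.
demand = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 101, 101],
        "demand_reported_mwh": [500.0, None, 9_999_999.0],
        "demand_imputed_pudl_mwh": [500.0, 510.0, 520.0],
        "demand_imputed_pudl_mwh_imputation_code": [None, "EXAMPLE_CODE", "EXAMPLE_CODE"],
    }
)

# Assumption: a row was imputed if and only if it carries an imputation code.
was_imputed = demand["demand_imputed_pudl_mwh_imputation_code"].notna()
share_imputed = was_imputed.mean()

# Keep only values that were reported and left unchanged.
reported_only = demand.loc[~was_imputed, "demand_reported_mwh"]
```

Checking the share of imputed hours per respondent before an analysis gives a sense of how much of a series rests on reported data.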

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet"
)

Columns#

Field Name

Type

Description

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_imputed_pudl_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed in PUDL.

demand_imputed_pudl_mwh_imputation_code

string

Code describing why a demand value was flagged for imputation.

demand_reported_mwh

number

Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.

report_date

date

Date reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_id_ferc714_csv

integer

FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

respondent_id_ferc714_xbrl

string

FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

timezone

string

IANA timezone name.

out_ferc714__respondents_with_fips#

Noticed any irregularities with this table? File a report.

Annual respondents with the county FIPS IDs for their service territories.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report

Primary key:

This table has no primary key. Note that the state and county FIPS columns can be null. Otherwise the natural primary key would be: [‘respondent_id_ferc714’, ‘report_date’, ‘county_id_fips’]. The state FIPS code is not part of the primary key because the first two digits of the county FIPS code indicate the state.
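Because the first two digits of a county FIPS code identify the state, the state code can be recovered from county_id_fips when needed. A minimal sketch, with a hypothetical helper name, that passes nulls through unchanged since the column can be null:

```python
from typing import Optional


def state_fips_from_county(county_id_fips: Optional[str]) -> Optional[str]:
    """Return the two-digit state FIPS prefix of a five-digit county FIPS code."""
    if county_id_fips is None:
        return None
    if len(county_id_fips) != 5 or not county_id_fips.isdigit():
        raise ValueError(f"expected a five-digit county FIPS string, got {county_id_fips!r}")
    return county_id_fips[:2]


# 08013 is Boulder County, Colorado; 08 is Colorado's state FIPS code.
state = state_fips_from_county("08013")
```

The codes are handled as strings throughout so that leading zeros are preserved.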

Usage Warnings#

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__respondents_with_fips.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

balancing_authority_name_eia

string

Name of the balancing authority.

county

string

County name.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

eia_code

integer

EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.

report_date

date

Date reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_name_ferc714

string

Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.

respondent_type

string

Whether a respondent to the FERC form 714 is a utility or a balancing authority.

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

out_ferc714__summarized_demand#

Noticed any irregularities with this table? File a report.

Summarized demand statistics and FERC-714 respondent attributes by respondent-year.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report

Primary key:

respondent_id_ferc714, report_date

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_ferc714__summarized_demand.parquet"
)

Columns#

Field Name

Type

Description

area_km2

number

County area in km2.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

balancing_authority_name_eia

string

Name of the balancing authority.

demand_annual_mwh

number

Annual electricity demand in a given report year.

demand_annual_per_capita_mwh

number

Per-capita annual demand, averaged using Census county-level population estimates.

demand_density_mwh_km2

number

Annual demand per km2 of a given service territory.

eia_code

integer

EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.

population

number

County population, sourced from Census DP1 data.

population_density_km2

number

Average population per km2 of a service territory.

report_date

date

Date reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_name_ferc714

string

Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.

respondent_type

string

Whether a respondent to the FERC form 714 is a utility or a balancing authority.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.
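The three ratio columns in this table (demand_annual_per_capita_mwh, demand_density_mwh_km2, population_density_km2) read as quotients of the demand, population, and area columns. A minimal sketch, assuming the plain ratios implied by the column descriptions; PUDL's actual calculation may differ, and the input numbers and the function name summarize are made up:

```python
def summarize(demand_annual_mwh, population, area_km2):
    """Compute the ratio columns from made-up totals for one respondent-year.

    Assumes each ratio is a plain quotient; a missing or zero denominator yields None.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "demand_annual_per_capita_mwh": ratio(demand_annual_mwh, population),
        "demand_density_mwh_km2": ratio(demand_annual_mwh, area_km2),
        "population_density_km2": ratio(population, area_km2),
    }


# Hypothetical service territory: 5 TWh of demand, 400,000 people, 10,000 km2.
example = summarize(
    demand_annual_mwh=5_000_000.0, population=400_000.0, area_km2=10_000.0
)
```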

out_gridpathratoolkit__hourly_available_capacity_factor#

Noticed any irregularities with this table? File a report.

Hourly time series of capacity factors defining the capacity available from an aggregated group of generators.

Most-recent data:

2021

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

GridPath Resource Adequacy Toolkit Data

Primary key:

datetime_utc, aggregation_group

Usage Warnings#

  • Contains estimated values.

  • The original data is no longer being collected or reported in this way.

Additional Details#

This table contains a mix of profiles representing whole regions and individual plants, where the individual plants are hybrid wind or solar plus electricity storage facilities. Capacity factors are stated as a fraction of the aggregate nameplate capacity of the group.

For the hybrid facilities, the capacity factor represents the available output of only the renewable generators. Estimates of the generation profiles have been extended across a longer range of dates than the original data. For wind, the years 2015-2020 are synthesized; for solar, only 2020 is synthesized. See the Appendix of the GridPath Resource Adequacy Toolkit report for details.
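Since capacity factors are stated as a fraction of the aggregate nameplate capacity of the group, an hourly available capacity in MW follows by multiplication. A sketch, assuming a hypothetical 250 MW aggregation group and made-up hourly values (neither comes from the table, which does not itself carry the nameplate capacity):

```python
from datetime import datetime, timezone

# Hypothetical aggregate nameplate capacity for one aggregation group, in MW.
NAMEPLATE_MW = 250.0

# Made-up (datetime_utc, capacity_factor) pairs for illustration.
profile = [
    (datetime(2020, 7, 1, 18, tzinfo=timezone.utc), 0.00),
    (datetime(2020, 7, 1, 19, tzinfo=timezone.utc), 0.42),
    (datetime(2020, 7, 1, 20, tzinfo=timezone.utc), 0.80),
]

# Available capacity in each hour = capacity factor x nameplate capacity.
available_mw = [(ts, cf * NAMEPLATE_MW) for ts, cf in profile]

# With one-hour steps, MW available for an hour equals MWh of potential energy.
potential_mwh = sum(mw for _, mw in available_mw)
```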

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet"
)

Columns#

Field Name

Type

Description

aggregation_group

string

A label identifying a group of aggregated generator capacity factors.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period).

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

out_pudl__yearly_assn_eia_ferc1_plant_parts#

Noticed any irregularities with this table? File a report.

Annual association table providing connections between power plant data reported in FERC Form 1 and any available EIA data related to the plant parts covered by each FERC Form 1 record.

Most-recent data:

2024

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

record_id_ferc1

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table answers the question “What EIA data reported about plants or generators should be associated with a given plant record found in the FERC Form 1?”

Each record in this table corresponds to a single FERC Form 1 record reported in one of several tables describing power plants (large steam, hydro, small, etc.). These FERC records can correspond to an entire plant, individual generators within a plant, all generators in a plant with the same prime mover type, or just the respondent’s ownership share of any of those categories (or other categories). Furthermore, the same utility may report the same plant in different ways in different years.

The EIA data associated with each FERC plant record comes from our Plant Parts EIA table, out_eia__yearly_plant_parts. The EIA data in each record represents an aggregation of several slices of an EIA plant, across both physical characteristics and utility ownership.

This table inherits canonicalized values for generators and plants. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.
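The selection rule described above can be sketched in a few lines. This illustrates the stated rule only (take the most common value, keep it if it makes up at least 70% of the reported instances, otherwise return null); it is not PUDL's actual harvesting code, and the function name, the handling of missing values, and the sample inputs are assumptions:

```python
from collections import Counter


def canonical_value(reported, threshold=0.7):
    """Pick the most consistently reported value, or None if it is too inconsistent.

    Assumption: missing reports (None) are ignored when counting instances.
    """
    observed = [v for v in reported if v is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= threshold else None


# 3 of 4 reports agree (75%), so the majority value is kept.
consistent = canonical_value(["Comanche", "Comanche", "Comanche", "Comanche (CO)"])

# Only 2 of 4 reports agree (50%), so the attribute comes out null.
inconsistent = canonical_value(["NG", "NG", "DFO", "DFO"])
```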

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_pudl__yearly_assn_eia_ferc1_plant_parts.parquet"
)

Columns#

Field Name

Type

Description

appro_part_label

string

Plant part of the associated true granularity record.

appro_record_id_eia

string

EIA record ID of the associated true granularity record.

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_factor_eia

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period).

capacity_factor_ferc1

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of that time period).

capacity_mw_eia

number

Total installed (nameplate) capacity, in megawatts.

capacity_mw_ferc1

number

Total installed (nameplate) capacity, in megawatts.

capex_annual_addition

number

Annual capital addition into capex_total.

capex_annual_addition_rolling

number

Year-to-date capital addition into capex_total.

capex_annual_per_kw

number

Annual capital addition into capex_total per kW.

capex_annual_per_mw

number

Annual capital addition into capex_total per MW.

capex_annual_per_mw_rolling

number

Year-to-date capital addition into capex_total per MW.

capex_annual_per_mwh

number

Annual capital addition into capex_total per MWh.

capex_annual_per_mwh_rolling

number

Year-to-date capital addition into capex_total per MWh.

capex_equipment

number

Cost of plant: equipment (USD).

capex_equipment_electric

number

Cost of plant: accessory electric equipment (USD).

capex_equipment_misc

number

Cost of plant: miscellaneous power plant equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

capex_wheels_turbines_generators

number

Cost of plant: water wheels, turbines, and generators (USD).

capex_wo_retirement_total

number

Total cost of plant (USD) without retirements.

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year_eia

integer

Year the plant’s oldest still operational unit was built.

construction_year_ferc1

integer

Year the plant’s oldest still operational unit was built.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_used_for_pumping_mwh

number

Energy used for pumping, in megawatt-hours.

ferc_acct_name

string

Name of FERC account, derived from technology description and prime mover code.

ferc_license_id

string

The FERC license ID of a project.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

fuel_cost_per_mmbtu_eia

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mmbtu_ferc1

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_mwh_eia

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.

fuel_cost_per_mwh_ferc1

number

Derived from MCOE, a unit level value. Average fuel cost per MWh of heat content in nominal USD.

fuel_type

string

Type of fuel.

fuel_type_code_pudl_eia

string

Simplified fuel type code used in PUDL

fuel_type_code_pudl_ferc1

string

Simplified fuel type code used in PUDL

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_year

integer

Year a generator went into service.

installation_year_eia

integer

Year the plant’s most recently built unit was installed.

installation_year_ferc1

integer

Year the plant’s most recently built unit was installed.

license_id_ferc1

integer

FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.

match_type

string

Indicates the source and validation of the match between EIA and FERC. Match types include matches generated by the model, verified by the training data, overridden by the training data, etc.

net_capacity_adverse_conditions_mw

number

Net plant capability under the least favorable operating conditions, in megawatts.

net_capacity_favorable_conditions_mw

number

Net plant capability under the most favorable operating conditions, in megawatts.

net_generation_mwh_eia

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_generation_mwh_ferc1

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_load_mwh

number

Net output for load (net generation - energy used for pumping) in megawatt-hours.

not_water_limited_capacity_mw

number

Plant capacity in MW when not limited by condenser water.

operational_status

string

The operating status of the asset. For generators this is based on the EIA 860 tab in which the generator was listed.

operational_status_pudl

string

The operating status of the asset using PUDL categories.

opex_allowances

number

Allowances.

opex_boiler

number

Maintenance of boiler (or reactor) plant.

opex_coolants

number

Cost of coolants and water (nuclear plants only)

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_fuel

number

Production expenses: fuel (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_hydraulic

number

Production expenses: hydraulic expenses (USD).

opex_maintenance

number

Production expenses: Maintenance (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_misc_power

number

Miscellaneous steam (or nuclear) expenses.

opex_misc_steam

number

Maintenance of miscellaneous steam (or nuclear) plant.

opex_nonfuel_per_mwh

number

Investments in non-fuel production expenses per MWh.

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_production_before_pumping

number

Total production expenses before pumping (USD).

opex_production_total

number

Total operating expenses.

opex_pumped_storage

number

Production expenses: pumped storage (USD).

opex_pumping

number

Production expenses: pumping (USD).

opex_rents

number

Production expenses: rents (USD).

opex_steam

number

Steam expenses.

opex_steam_other

number

Steam from other sources.

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_total_nonfuel

number

Total production expenses, excluding fuel (USD).

opex_transfer

number

Steam transferred (Credit).

opex_water_for_power

number

Production expenses: water for power (USD).

ownership_dupe

boolean

Whether a plant part record has a duplicate record with different ownership status.

ownership_record_type

string

Whether each generator record is for one owner or represents a total of all ownerships.

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_ferc1

integer

Algorithmically assigned PUDL FERC Plant ID. WARNING: NOT STABLE BETWEEN PUDL DB INITIALIZATIONS.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_name_ppe

string

Derived plant name that includes EIA plant name and other strings associated with ID and PK columns of the plant part.

plant_part

string

The part of the plant a record corresponds to.

plant_part_id_eia

string

Contains EIA plant ID, plant part, ownership, and EIA utility id

plant_type

string

Type of plant.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

record_count

integer

Number of distinct generator IDs that participated in the aggregation for a plant part list record.

record_id_eia

string

Identifier for EIA plant parts analysis records.

record_id_ferc1

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_date

date

Date reported.

report_year

integer

Four-digit year in which the data was reported.

technology_description

string

High level description of the technology used by the generator to produce electricity.

total_fuel_cost_eia

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_fuel_cost_ferc1

number

Total annual reported fuel costs for the plant part. Includes costs from all fuels.

total_mmbtu_eia

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

total_mmbtu_ferc1

number

Total annual heat content of fuel consumed by a plant part record in the plant parts list.

true_gran

boolean

Indicates whether a plant part list record is associated with the highest priority plant part for all identical records.

unit_heat_rate_mmbtu_per_mwh_eia

number

Fuel content per unit of electricity generated. Coming from MCOE calculation.

unit_heat_rate_mmbtu_per_mwh_ferc1

number

Fuel content per unit of electricity generated. Calculated from FERC reported fuel consumption and net generation.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

water_limited_capacity_mw

number

Plant capacity in MW when limited by condenser water.
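The record_id_ferc1 description above gives the ID format as {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. A sketch of splitting such an ID into its parts, assuming the five trailing fields are always integers; the class name, the function name, and the example ID are made up for illustration, and IDs that do not follow the documented pattern raise an error:

```python
from typing import NamedTuple


class Ferc1RecordId(NamedTuple):
    table_name: str
    report_year: int
    report_prd: int
    respondent_id: int
    spplmnt_num: int
    row_number: int


def parse_record_id_ferc1(record_id: str) -> Ferc1RecordId:
    """Split a record ID of the documented form into its six components.

    The table name may itself contain underscores, so split from the right.
    """
    parts = record_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"Unexpected record_id_ferc1 format: {record_id!r}")
    table_name, *numbers = parts
    # int() raises ValueError if a trailing field is not numeric.
    return Ferc1RecordId(table_name, *(int(n) for n in numbers))


# Made-up ID for illustration; not taken from the real table.
parsed = parse_record_id_ferc1("f1_steam_2015_12_45_0_1")
```

Splitting from the right is the design choice that matters here: a left-to-right split would cut the table name at its first underscore.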

out_rus12__monthly_demand_and_energy_at_delivery_points#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy delivered to distribution utilities.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)

Primary key:

report_date, borrower_id_rus, delivery_recipient

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

delivered_demand_mw

number

The amount of demand delivered in MW.

delivered_energy_mwh

number

The amount of energy delivered in MWh.

delivery_recipient

string

The recipient of the delivered energy or demand.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
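The borrower_id_rus structure described in the columns above (a two-character state abbreviation followed by four digits) lends itself to a quick format check. A sketch, assuming the letters are uppercase ASCII and that there are no separators; the function names and the example IDs in the usage note are made up:

```python
import re

# Assumed pattern: two uppercase ASCII letters, then exactly four ASCII digits.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id: str) -> bool:
    """Check whether an ID matches the assumed borrower_id_rus structure."""
    return BORROWER_ID_PATTERN.fullmatch(borrower_id) is not None


def borrower_state(borrower_id: str) -> str:
    """Return the two-character state prefix of a well-formed ID."""
    if not is_valid_borrower_id(borrower_id):
        raise ValueError(f"Unexpected borrower_id_rus format: {borrower_id!r}")
    return borrower_id[:2]
```

For example, is_valid_borrower_id("CO0047") passes this check, while "CO47" and "co0047" do not.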

out_rus12__monthly_demand_and_energy_at_power_sources#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy at point of generation.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

There are some peak demand dates that have an hour component and others that don’t. This is because there is inconsistent reporting of peak_demand_hour in the raw data. If there is no hour reported, we record the hour as 0 and only the date will show in the data.
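Going by the description above, a peak_demand_date at exactly midnight may mean that no hour was reported, and it cannot be told apart from a peak genuinely reported at hour 0. A sketch of flagging such rows, using made-up timestamps and a hypothetical helper name:

```python
from datetime import datetime, time

# Made-up peak_demand_date values for illustration.
peak_demand_dates = [
    datetime(2020, 1, 15, 18),  # hour reported
    datetime(2020, 2, 3),       # no hour reported, stored as hour 0
    datetime(2020, 3, 9, 7),    # hour reported
]


def hour_may_be_missing(ts: datetime) -> bool:
    """True when the timestamp is exactly midnight, so the hour may not have been reported."""
    return ts.time() == time(0, 0)


flags = [hour_may_be_missing(ts) for ts in peak_demand_dates]
```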

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

energy_output_mwh

number

The amount of energy output in MWh.

is_peak_coincident

boolean

Whether the reported peak (peak_demand_mw) is a coincident peak, as opposed to a non-coincident peak.

peak_demand_date

datetime

The date of the peak demand.

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)

Primary key:

report_date, borrower_id_rus, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
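The usage warning about subtotals matters whenever ending_balance is summed: adding detail rows and total rows together would double count. A sketch of excluding total rows first, assuming is_total marks every total or subtotal row; the asset_type labels and amounts are made up:

```python
# Made-up rows for one borrower-year; not real RUS Form 12 data.
rows = [
    {"asset_type": "cash_general_funds", "ending_balance": 120.0, "is_total": False},
    {"asset_type": "notes_receivable", "ending_balance": 30.0, "is_total": False},
    {"asset_type": "total_current_assets", "ending_balance": 150.0, "is_total": True},
]

# Sum only the detail rows so totals are not counted twice.
detail_sum = sum(r["ending_balance"] for r in rows if not r["is_total"])

# Cross-check the detail sum against the reported total row.
reported_total = next(r["ending_balance"] for r in rows if r["is_total"])
matches = abs(detail_sum - reported_total) < 1e-9
```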

out_rus12__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)

Primary key:

report_date, borrower_id_rus, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

liability_type

string

Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_depreciation_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated depreciation.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)

Primary key:

report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_changes.parquet"
)

Columns#

Field Name

Type

Description

accruals

number

Value of additions to an asset class, a.k.a. accruals.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

composite_depreciation_rate

number

The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100.

depreciation_and_amortization_group

string

High-level category of depreciation and amortization items.

depreciation_and_amortization_item

string

Category of depreciation and amortization items.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements_less_net_salvage

number

Cost of retirements minus any net salvage value.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_depreciation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of accumulated depreciation.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)

Primary key:

report_date, borrower_id_rus, depreciation_and_amortization_item

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_depreciation_misc.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

depreciation_and_amortization_item

string

Category of depreciation and amortization items.

ending_balance

number

Account balance at end of year.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

external_financial_risk_ratio

number

(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
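As a worked illustration of `external_financial_risk_ratio`, the sketch below reads the formula as the sum of investments and loan guarantee balances divided by total utility plant assets. That grouping is an assumption based on the table description, and the function name, argument names, and dollar amounts are made up.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """Assumed reading: (investments + loan guarantees) / total utility plant."""
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Made-up amounts: 300 in investments, 100 in guarantees, 2000 in utility plant.
ratio = external_financial_risk_ratio(300.0, 100.0, 2000.0)
```

Under these made-up inputs the ratio is 400 / 2000, so outside exposure equals one fifth of the utility plant value.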

out_rus12__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F, Sub-section I)

Primary key:

This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.
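Because a borrower can report the same `investment_description` more than once in a year, one option is to sum repeated rows before comparing borrowers. The sketch below uses made-up rows and hypothetical borrower IDs. Whether summing is appropriate depends on the analysis, and it is not something this table's documentation prescribes.

```python
import pandas as pd

# Made-up rows: one borrower reports the same description twice in a year.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["XX0001", "XX0001", "XX0002"],
        "investment_description": ["example fund", "example fund", "example fund"],
        "included_investments": [10.0, 5.0, 7.0],
    }
)

group_cols = ["report_date", "borrower_id_rus", "investment_description"]
# Collapse repeated records into one row per borrower, year, and description.
totals = df.groupby(group_cols, as_index=False)["included_investments"].sum()
```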

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part A Section B in table core_rus12__yearly_balance_sheet_assets.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_investments.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

excluded_investments

number

Excluded investment.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

included_investments

number

Included investment.

income_or_loss

number

Income or loss from investment.

investment_description

string

Description of investment. This is a free-form text field and thus contains a wide variety of values.

investment_type_code

integer

Investment type code.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_lines_stations_labor_materials_cost#

Noticed any irregularities with this table? File a report.

Annual time series of labor and material cost for lines and stations operated by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)

Primary key:

report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

cost

number

Cost value.

labor_or_material

string

Indicates whether the cost reported is for labor or material.

lines_or_stations

string

Indicates whether the cost reported pertains to lines or stations.

operation_or_maintenance

string

Indicates whether the expenditure is for operation or maintenance.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)

Primary key:

This table has no primary key because some borrowers report multiple loan values from the same entity in a given year.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
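Since loans and loan guarantees share this table, it may help to split them with `is_loan_guarantee` before summing balances. This is a minimal sketch using made-up rows and hypothetical recipient names.

```python
import pandas as pd

# Made-up rows: one direct loan and one guaranteed third-party loan.
df = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0001"],
        "loan_recipient": ["example recipient a", "example recipient b"],
        "is_loan_guarantee": [False, True],
        "loan_balance": [1000.0, 250.0],
    }
)

# Guarantees are contingent obligations, so keep them apart from direct loans.
guarantees = df[df["is_loan_guarantee"]]
direct_loans = df[~df["is_loan_guarantee"]]
guaranteed_total = guarantees["loan_balance"].sum()
direct_total = direct_loans["loan_balance"].sum()
```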

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_loans.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

is_loan_guarantee

boolean

Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.

loan_balance

number

The amount of money still owed on a loan at the end of the reporting year.

loan_maturity_date

date

The date on which a loan is scheduled to be fully paid.

loan_original_amount

number

The original amount of a loan.

loan_recipient

string

The organization that received a loan.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long-term debt and debt service requirements for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)

Primary key:

This table has no primary key because some borrowers report multiple debt values from the same entity in a given year.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_long_term_debt.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

debt_description

string

Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.

debt_ending_balance

number

The amount of principal still owed on the debt at the end of the report year.

debt_interest

number

The interest expense on the debt for the report year.

debt_principal

number

The principal paid on the debt during the report year.

debt_total

number

The total amount of debt.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of annual meeting and board information for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_meeting_and_board.parquet"
)

Columns#

Field Name

Type

Description

board_members_num

integer

The total number of board members.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

does_manager_have_written_contract

boolean

Whether or not the RUS borrower’s manager has a written contract.

fees_and_expenses_for_board_members

integer

The total amount of fees and expenses for board members.

last_annual_meeting_date

datetime

The date of the last annual meeting.

members_num

integer

The total number of members.

members_present_at_meeting_num

integer

The number of members present at the last annual meeting.

members_voting_by_proxy_or_mail_num

integer

The number of members voting by mail or by proxy.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

was_quorum_present

boolean

Whether or not quorum was met.

out_rus12__yearly_non_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in non-utility plant.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)

Primary key:

report_date, borrower_id_rus, non_utility_plant_item

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_non_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

non_utility_plant_item

string

Category describing non-utility plant asset items.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_plant_costs#

Noticed any irregularities with this table? File a report.

Annual time series of costs of net energy generated by plant.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)

Primary key:

This table has no primary key because there is one plant (Walter Scott) that has duplicate records every year. Based on other RUS-12 tables, it is likely that these duplicate records are the borrowers’ ownership portion of the plant as well as the total plant, but there is no clear indication in this table.

The primary key of this table otherwise would be: [‘report_date’, ‘borrower_id_rus’, ‘plant_name_rus’, ‘cost_group’, ‘cost_type’].
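To see which records break the would-be key listed above, rows that repeat on those columns can be flagged. The sketch below runs on made-up rows with hypothetical plant names and cost categories. It shows the general check, not how PUDL itself validates the table.

```python
import pandas as pd

# Made-up rows: "plant a" appears twice for the same key columns.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["XX0001"] * 3,
        "plant_name_rus": ["plant a", "plant a", "plant b"],
        "cost_group": ["group_x"] * 3,
        "cost_type": ["type_y"] * 3,
        "cost": [40.0, 100.0, 10.0],
    }
)

would_be_key = ["report_date", "borrower_id_rus", "plant_name_rus", "cost_group", "cost_type"]
# keep=False marks every member of a repeated group, not just the later copies.
is_repeated = df.duplicated(subset=would_be_key, keep=False)
repeated_plants = sorted(df.loc[is_repeated, "plant_name_rus"].unique())
```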

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns only apply to some cost_type values, and only to some plant_type values, and thus are expected to contain many nulls.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_costs.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

cost

number

Cost value.

cost_group

string

High-level category of cost type.

cost_per_mmbtu

number

Unit cost of energy production in cost per MMBTU

cost_per_mwh

number

Unit cost of energy production in cost per MWh

cost_type

string

Detailed category of cost type.

is_total

boolean

Whether or not this record represents a total.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_plant_factors_and_maximum_demand#

Noticed any irregularities with this table? File a report.

Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)

Primary key:

This table has no primary key. The primary key for this table would be report_date, borrower_id_rus, plant_name_rus, and plant_type, but the plant_type column is not reported before 2009, which causes many duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill the plant_type column in instances where there are different plant_types reported for the same plant in later years.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The plant_type field was only included in the data starting in 2009. We backfill records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with null plant_type because there is no reliable way to determine which plant_type is associated with which record back in time. Plants whose names are reported differently before and after 2009 will also miss being backfilled.
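The backfill rule described above can be illustrated with a small sketch. It is not PUDL's actual implementation. It assumes the rule means "fill a null `plant_type` only when a borrower and plant name pair has exactly one distinct reported type", and the rows, IDs, and type labels are made up.

```python
import pandas as pd

# Made-up rows: "plant a" has one reported type, "plant b" has two.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2008-01-01", "2009-01-01", "2008-01-01", "2009-01-01", "2009-01-01"]
        ),
        "borrower_id_rus": ["XX0001", "XX0001", "XX0002", "XX0002", "XX0002"],
        "plant_name_rus": ["plant a", "plant a", "plant b", "plant b", "plant b"],
        "plant_type": [None, "steam", None, "steam", "hydro"],
    }
)

plant_cols = ["borrower_id_rus", "plant_name_rus"]
# Number of distinct non-null plant types reported for each borrower and plant.
n_types = df.groupby(plant_cols)["plant_type"].transform("nunique")
# First non-null plant type in each group, broadcast back to every row.
only_type = df.groupby(plant_cols)["plant_type"].transform("first")

# Fill only where the type is missing and the group is unambiguous.
fill_mask = df["plant_type"].isna() & (n_types == 1)
df.loc[fill_mask, "plant_type"] = only_type[fill_mask]
```

In this sketch the 2008 row for "plant a" is filled, while the 2008 row for "plant b" stays null because two types were reported for it later.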

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period (hours/years/etc.)).

capacity_factor_running

number

Fraction of potential generation over the time period a plant was in operation. Energy generated over time period / (nameplate capacity * time period (hours/years/etc.)).

load_factor

number

Fraction of consumption vs demand reported for a plant over a given timeframe. Energy consumed over time period / (peak demand * time period (hours/years/etc.)).

peak_gross_demand_mw

number

The highest average power output recorded over any single 15 minute interval during the reporting period.

peak_gross_demand_nameplate_mw

number

The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
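The factor definitions in the column list above can be worked through with made-up numbers. The sketch assumes energy in MWh, capacity and demand in MW, and a period measured in hours. The function names and inputs are hypothetical and do not come from PUDL.

```python
HOURS_PER_YEAR = 8760  # a non-leap year


def capacity_factor(generation_mwh: float, nameplate_mw: float, hours: float) -> float:
    """Energy generated divided by what the nameplate capacity could produce."""
    return generation_mwh / (nameplate_mw * hours)


def load_factor(energy_mwh: float, peak_demand_mw: float, hours: float) -> float:
    """Energy consumed divided by what constant peak demand would consume."""
    return energy_mwh / (peak_demand_mw * hours)


# Made-up plant: 100 MW nameplate, 438,000 MWh generated over one year.
cf = capacity_factor(438_000.0, 100.0, HOURS_PER_YEAR)
# Made-up load: 262,800 MWh consumed against a 60 MW peak over one year.
lf = load_factor(262_800.0, 60.0, HOURS_PER_YEAR)
```

For `capacity_factor_running`, the same formula would presumably use the hours the plant was actually in operation rather than all hours in the period.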

out_rus12__yearly_plant_labor#

Noticed any irregularities with this table? File a report.

Annual time series of labor and payroll information for plants owned by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)

Primary key:

This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Note the lack of plant_type pre-2009 leading to a lack of reliable primary keys.

Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to data from this table, but you must join using the borrower_id_rus column as well as the report_date and plant_name. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
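The join described above can be sketched with two small made-up frames. The `operations` frame stands in for core_rus12__yearly_plant_operations_by_plant. Its `plant_name_rus` and `example_value` columns are assumptions for illustration, as are the borrower IDs.

```python
import pandas as pd

# Made-up rows: two borrowers (hypothetical IDs) report the same shared plant.
labor = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01", "2021-01-01"]),
        "borrower_id_rus": ["XX0001", "XX0002"],
        "plant_name_rus": ["shared plant", "shared plant"],
        "employees_full_time_num": [10, 12],
    }
)
# Stand-in for the plant operations table; example_value is a hypothetical column.
operations = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01", "2021-01-01"]),
        "borrower_id_rus": ["XX0001", "XX0002"],
        "plant_name_rus": ["shared plant", "shared plant"],
        "example_value": [1.0, 2.0],
    }
)

# Joining on date and plant name alone pairs every borrower's labor row
# with every borrower's plant record, which multiplies the rows.
naive = labor.merge(operations, on=["report_date", "plant_name_rus"])

# Including borrower_id_rus matches each borrower's rows to its own report.
matched = labor.merge(
    operations, on=["report_date", "borrower_id_rus", "plant_name_rus"], how="left"
)
```

With two borrowers sharing the plant, the naive join yields four rows while the three-column join yields two.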

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_labor.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

employee_hours_worked_total

number

Total number of hours worked by employees.

employees_full_time_num

integer

Number of employees hired full-time for normal operations of the system.

employees_part_time_num

integer

Number of employees regularly employed on a part-time basis. Excludes employees hired for short periods of time to complete special jobs.

payroll_maintenance

number

The amount of payroll spent on plant maintenance.

payroll_operations

number

The amount of payroll spent on plant operations.

payroll_other_accounts

number

The amount of plant payroll spent on accounts other than maintenance and operations.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
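The borrower_id_rus description above (a two-character state abbreviation followed by four digits) can be turned into a quick format check. This is only a sketch: the sample IDs are invented, and the assumption that the state letters are always upper case is not stated in the source.

```python
import pandas as pd

# Hypothetical IDs, including some that should fail the check.
ids = pd.Series(["KY0059", "nd0045", "TX12", None], dtype="string")

# Assumed pattern: two upper-case letters followed by exactly four digits.
is_valid = ids.str.fullmatch(r"[A-Z]{2}\d{4}").fillna(False)
```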

out_rus12__yearly_plant_operations_by_borrower#

Noticed any irregularities with this table? File a report.

Annual time series of borrower portion of plant operational data including fuel consumption and operational hours.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)

Primary key:

This table has no primary key because there are a handful of plants that have duplicate records. Otherwise, the primary key would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].
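One way to find the records that prevent a primary key is to look for rows that repeat across the candidate key columns. The sketch below uses a small made-up DataFrame rather than the real table.

```python
import pandas as pd

candidate_key = [
    "report_date",
    "borrower_id_rus",
    "plant_name_rus",
    "unit_id_rus",
    "plant_type",
    "is_full_ownership_portion",
    "is_partly_owned_by_borrower",
]

# Hypothetical records; the first two rows share every key value.
records = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2012-01-01"] * 3),
        "borrower_id_rus": ["AL0042"] * 3,
        "plant_name_rus": ["Plant A", "Plant A", "Plant B"],
        "unit_id_rus": [1, 1, 1],
        "plant_type": ["steam"] * 3,
        "is_full_ownership_portion": [True] * 3,
        "is_partly_owned_by_borrower": [False] * 3,
        "gross_generation_mwh": [100.0, 100.0, 250.0],
    }
)

# keep=False flags every member of a duplicated group, not just the repeats.
dupes = records[records.duplicated(subset=candidate_key, keep=False)]
```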

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables, core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

Two boolean columns delineate which records are associated with the borrower's share and which with the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.
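The pre-2009 assumption described above can be sketched in pandas as follows. The raw DataFrame is hypothetical, and this is not PUDL's actual transform code, only an illustration of the stated rule.

```python
import pandas as pd

# Hypothetical pre-2009 raw records; is_partly_owned_by_borrower
# was not reported in these years.
raw = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2005-01-01"] * 3),
        "borrower_id_rus": ["ND0045"] * 3,
        "plant_name_rus": ["Plant A", "Plant B", "Plant B"],
        "is_full_ownership_portion": [True, True, False],
    }
)

pre_2009 = raw[raw["report_date"].dt.year < 2009]

# Records flagged as the full plant go to the by-plant table ...
by_plant = pre_2009[pre_2009["is_full_ownership_portion"]]
# ... while every pre-2009 record goes to the by-borrower table.
by_borrower = pre_2009.copy()
```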

Also there are two cleaning steps that we performed that alter the original data slightly:

  • Dropped one record with an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Luckily, this plant has exactly the same records as the other Wisdom steam plant that year, which has the expected ownership labels.

  • Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to use an integer type for the unit_id_rus column.
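The unit_id_rus replacement can be illustrated with a short pandas sketch that keeps only the trailing digits of each value. The Series below is made up apart from the two strings quoted above, and the regular expression is one possible approach rather than the one PUDL necessarily uses.

```python
import pandas as pd

# Hypothetical raw unit IDs, including the two string values noted above.
unit_ids = pd.Series(["1", "2", "WSL GT 12", "WSL ST 10"])

# Keep the trailing run of digits, then convert to a nullable integer type.
digits = unit_ids.str.extract(r"(\d+)\s*$", expand=False)
cleaned = pd.to_numeric(digits).astype("Int64")
```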

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_borrower.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

fuel_consumption_coal_lbs

number

Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.

fuel_consumption_gas_cubic_feet

number

Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

fuel_consumption_oil_gallons

number

Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

fuel_consumption_other

number

Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

is_full_ownership_portion

boolean

Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower.

is_partly_owned_by_borrower

boolean

Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.

operating_hours_in_service

number

Number of operating hours in service.

operating_hours_on_standby

number

Number of operating hours on standby.

operating_hours_out_of_service_scheduled

number

Number of operating hours out of service which were scheduled.

operating_hours_out_of_service_unscheduled

number

Number of operating hours out of service which were unscheduled.

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

times_started

number

Number of times the plant was started. This field is only reported for plant_type values steam and nuclear.

unit_id_rus

integer

RUS-assigned unit identification code.

out_rus12__yearly_plant_operations_by_plant#

Noticed any irregularities with this table? File a report.

Annual time series of whole plant operational data including fuel consumption and operational hours.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)

Primary key:

This table has no primary key because there are a handful of plants that have duplicate records. Otherwise, the primary key would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables, core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

Two boolean columns delineate which records are associated with the borrower's share and which with the whole plant. They are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years of the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means some records from the original tables end up in both of these PUDL tables.

Also there are two cleaning steps that we performed that alter the original data slightly:

  • Dropped one record with an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rtf documentation file in the RUS 12 archive. Luckily, this plant has exactly the same records as the other Wisdom steam plant that year, which has the expected ownership labels.

  • Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to use an integer type for the unit_id_rus column.

Plant-level data can be matched to core_rus12__yearly_plant_labor and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand, which report Section B and C data for all plants. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
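To see where the duplicate plant totals mentioned in the RUS instructions might occur, one option is to flag units that more than one borrower reports. The sketch below runs on a made-up DataFrame and assumes, hypothetically, that co-owners report the same plant under an identical plant_name_rus, which may not hold in the real data.

```python
import pandas as pd

# Hypothetical by-plant records: two borrowers report totals for the same unit.
by_plant = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2015-01-01"] * 3),
        "borrower_id_rus": ["KY0059", "KY0062", "KY0059"],
        "plant_name_rus": ["Shared Plant", "Shared Plant", "Solo Plant"],
        "plant_type": ["steam"] * 3,
        "unit_id_rus": [1, 1, 1],
        "gross_generation_mwh": [1000.0, 990.0, 500.0],
    }
)

unit_keys = ["report_date", "plant_name_rus", "plant_type", "unit_id_rus"]

# Count distinct borrowers reporting each unit and keep units with more than one.
n_borrowers = by_plant.groupby(unit_keys)["borrower_id_rus"].transform("nunique")
shared = by_plant[n_borrowers > 1]
```

Summing gross_generation_mwh over the rows in shared would count the same unit once per reporting borrower, and the two reported totals need not agree.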

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_plant_operations_by_plant.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

fuel_consumption_coal_lbs

number

Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.

fuel_consumption_gas_cubic_feet

number

Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

fuel_consumption_oil_gallons

number

Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

fuel_consumption_other

number

Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle, combined_cycle and steam.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

is_partly_owned_by_borrower

boolean

Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.

operating_hours_in_service

number

Number of operating hours in service.

operating_hours_on_standby

number

Number of operating hours on standby.

operating_hours_out_of_service_scheduled

number

Number of operating hours out of service which were scheduled.

operating_hours_out_of_service_unscheduled

number

Number of operating hours out of service which were unscheduled.

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

times_started

number

Number of times the plant was started. This field is only reported for plant_type values steam and nuclear.

unit_id_rus

integer

RUS-assigned unit identification code.

out_rus12__yearly_renewable_plants#

Noticed any irregularities with this table? File a report.

Annual time series of renewable energy plant generation information for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)

Primary key:

report_date, borrower_id_rus, plant_name_rus, prime_mover_id

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_renewable_plants.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period (hours/years/etc.)).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

comments

string

General comments field.

employees_num

integer

Number of employees.

invested

number

The amount of money invested.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

power_cost_per_mwh

number

The cost of power per MWh.

primary_renewable_fuel_type

string

Primary renewable fuel type used by the plant.

primary_renewable_fuel_type_id

integer

Unique numeric identifier for each renewable fuel type.

prime_mover_id

integer

Unique numeric identifier for each prime mover type used by RUS borrowers.

prime_mover_type

string

Type of prime mover (e.g. Hydro, Internal Combustion).

renewable_fuel_pct

number

Percentage of renewable fuel used.

report_date

date

Date reported.

rus_funding

number

Amount of funding received from the Rural Utilities Service (RUS).

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
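The capacity_factor definition in the column list above (energy generated divided by nameplate capacity times the length of the period) can be reproduced as a sanity check. The sketch below assumes, without confirmation from the source, that the relevant energy column is net_generation_mwh and that each report_date covers one calendar year. All values are made up.

```python
import pandas as pd

# Hypothetical plant records for a leap year and a non-leap year.
plants = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01", "2021-01-01"]),
        "capacity_mw": [10.0, 10.0],
        "net_generation_mwh": [26352.0, 21900.0],
    }
)

# Hours in the reporting year: 8784 in leap years, 8760 otherwise.
hours_in_year = plants["report_date"].dt.is_leap_year.map({True: 8784, False: 8760})

# Generation divided by the maximum possible generation at nameplate capacity.
plants["capacity_factor_check"] = plants["net_generation_mwh"] / (
    plants["capacity_mw"] * hours_in_year
)
```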

out_rus12__yearly_sources_and_distribution#

Noticed any irregularities with this table? File a report.

Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)

Primary key:

report_date, borrower_id_rus, source_of_energy

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

See the out_rus12__yearly_sources_and_distribution_by_plant_type table for a breakdown of plant-type-specific cost, capacity_mw, plant_num, and net_energy_received_mwh values. Also note that several source_of_energy values don't have a corresponding cost value.
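To list which source_of_energy values never carry a cost, one option is to check whether cost is null in every row for each source. The category names and values below are placeholders, not real values from the table.

```python
import pandas as pd

# Hypothetical rows; source names are placeholders.
sources = pd.DataFrame(
    {
        "source_of_energy": [
            "hypothetical_source_a",
            "hypothetical_source_a",
            "hypothetical_source_b",
        ],
        "cost": [10.0, None, None],
    }
)

# True for a source only if cost is missing in all of its rows.
never_has_cost = sources["cost"].isna().groupby(sources["source_of_energy"]).all()
sources_without_cost = never_has_cost[never_has_cost].index.tolist()
```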

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

cost

number

Cost value.

net_energy_received_mwh

number

The net amount of energy received into the system.

report_date

date

Date reported.

source_of_energy

string

The source of energy (not plant type).

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_sources_and_distribution_by_plant_type#

Noticed any irregularities with this table? File a report.

Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)

Primary key:

report_date, borrower_id_rus, plant_type

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

cost

number

Cost value.

net_energy_received_mwh

number

The net amount of energy received into the system.

plant_num

integer

Number of plants.

plant_type

string

Type of plant.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)

Primary key:

report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.
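The subtotal warning above matters most when summing expense columns. A minimal sketch, using made-up rows and placeholder category names, shows how the is_total column can be used to avoid counting a subtotal alongside its components.

```python
import pandas as pd

# Hypothetical rows: two detail items and the total that already includes them.
ops = pd.DataFrame(
    {
        "borrower_id_rus": ["MN0001"] * 3,
        "opex_group": ["hypothetical_group"] * 3,
        "opex_type": ["item_a", "item_b", "total"],
        "opex_ytd": [100.0, 50.0, 150.0],
        "is_total": [False, False, True],
    }
)

group_cols = ["borrower_id_rus", "opex_group"]

# Summing everything counts the total row on top of its components.
naive_sum = ops.groupby(group_cols)["opex_ytd"].sum()

# Dropping total rows first leaves only the detail items.
detail_sum = ops[~ops["is_total"]].groupby(group_cols)["opex_ytd"].sum()
```

Whether the remaining detail rows are themselves free of nested subtotals depends on the opex_group, so a check against the reported total is still worthwhile.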

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_statement_of_operations.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

opex_group

string

High level section from the statement of operations table. Most of these types have subcomponents broken out in the opex_type column.

opex_report_month

number

Amount of operational expense, cost or income during the report month.

opex_type

string

Type of item from the statement of operations.

opex_ytd

number

The year-to-date amount of operational expense, cost or income.

opex_ytd_budget

number

The year-to-date budget for amount of operational expense, cost or income.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus12__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)

Primary key:

report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:

  • intangible_plant: only has one total utility_plant_item

  • production_plant: has several sub-component utility_plant_item and a standard total

  • transmission_plant: has several sub-component utility_plant_item and a standard total

  • distribution_plant: has several sub-component utility_plant_item and a standard total

  • electric_plant_in_service: has two sub-components and a total which includes those sub-components and the totals of the above categories

  • utility_plant_in_service: has several sub-components and the total is the sum of those sub-components as well as the total of electric_plant_in_service

  • total_utility_plant: has one sub-component and the total is the sum of the sub-component and utility_plant_in_service
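For the groups described above as having several sub-components and a standard total, the nesting can be checked by comparing the sum of the sub-components with the reported total. The sketch below uses invented rows and placeholder utility_plant_item names. It does not cover the higher-level groups, whose totals also include the totals of other groups.

```python
import pandas as pd

# Hypothetical rows for one borrower and year; item names are placeholders.
upc = pd.DataFrame(
    {
        "utility_plant_group": ["production_plant"] * 3 + ["intangible_plant"],
        "utility_plant_item": [
            "hypothetical_item_a",
            "hypothetical_item_b",
            "total",
            "total",
        ],
        "ending_balance": [70.0, 30.0, 100.0, 5.0],
        "is_total": [False, False, True, True],
    }
)

# Groups whose total is a plain sum of their own sub-components.
simple_groups = ["production_plant", "transmission_plant", "distribution_plant"]
subset = upc[upc["utility_plant_group"].isin(simple_groups)]

components = (
    subset[~subset["is_total"]].groupby("utility_plant_group")["ending_balance"].sum()
)
totals = (
    subset[subset["is_total"]].groupby("utility_plant_group")["ending_balance"].sum()
)

# Absolute difference between reported totals and summed components.
mismatch = (components - totals).abs()
```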

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus12__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

utility_plant_group

string

High-level category of utility plant asset type.

utility_plant_item

string

Sub-category of utility_plant_group describing utility plant asset item.

out_rus7__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)

Primary key:

report_date, borrower_id_rus, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)

Primary key:

report_date, borrower_id_rus, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

liability_type

string

Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
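
Because this table mixes detail rows with total rows, one way to avoid double counting is to drop rows where is_total is true before aggregating. The sketch below uses a small in-memory frame rather than the published Parquet file; the borrower ID, liability_type labels, and balances are invented for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like out_rus7__yearly_balance_sheet_liabilities.
liabilities = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01"] * 3),
        "borrower_id_rus": ["XX0001"] * 3,
        "liability_type": ["example_item_a", "example_item_b", "example_total"],
        "ending_balance": [100.0, 250.0, 350.0],
        "is_total": [False, False, True],
    }
)

# Keep only detail rows so the total row is not added on top of its parts.
detail = liabilities[~liabilities["is_total"]]
balance_by_borrower = detail.groupby(["report_date", "borrower_id_rus"])[
    "ending_balance"
].sum()
print(balance_by_borrower)
```

Summing all three rows would report 700.0, twice the actual balance in this made-up example.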

out_rus7__yearly_customer_energy_efficiency_and_conservation_loans#

Noticed any irregularities with this table? File a report.

Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)

Primary key:

report_date, borrower_id_rus, loan_status

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats, and attic ventilation fans.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)

Columns#

Field Name

Type

Description

actual_pct

number

Observed percentage.

anticipated_pct

number

Expected percentage.

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

loan_status

string

The repayment status of a loan.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

ytd_dollars

number

Balance this current year, in U.S. dollars.

out_rus7__yearly_distribution_services#

Noticed any irregularities with this table? File a report.

Annual time series of distribution services.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)

Primary key:

report_date, borrower_id_rus, service_status

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_distribution_services.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

service_status

string

Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals.

services

number

Number of services in system at end of year.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
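
The declared primary key (report_date, borrower_id_rus, service_status) can be verified after loading the table. This is a minimal sketch using invented rows in place of the real data; the borrower ID and service counts are hypothetical, and the service_status values are taken from the examples in the column description.

```python
import pandas as pd

PRIMARY_KEY = ["report_date", "borrower_id_rus", "service_status"]

# Hypothetical rows shaped like out_rus7__yearly_distribution_services.
services = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2021-01-01", "2021-01-01", "2020-01-01"]),
        "borrower_id_rus": ["XX0001", "XX0001", "XX0001"],
        "service_status": ["idle", "retired", "idle"],
        "services": [12.0, 3.0, 10.0],
    }
)

# keep=False marks every row that shares its key with another row.
duplicate_rows = services[services.duplicated(subset=PRIMARY_KEY, keep=False)]
key_is_unique = duplicate_rows.empty
print(key_is_unique)
```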

out_rus7__yearly_employee_statistics#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about employment and payroll.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_employee_statistics.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

employee_hours_worked_over_time

integer

The number of overtime hours worked by employees.

employee_hours_worked_regular_time

integer

The number of regular (non-overtime) hours worked by employees.

employees_fte_num

integer

The number of full time employees.

payroll_capitalized

integer

The amount of payroll spent that was capitalized.

payroll_expensed

integer

The amount of payroll spent that was expensed.

payroll_other

integer

The amount of payroll spent that was funded by other means (neither capitalized nor expensed).

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
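
The payroll and hours columns lend themselves to a couple of derived values. The sketch below assumes that the three payroll columns are mutually exclusive, so that their sum is total payroll, and computes an overtime share of hours worked. The derived column names and all figures are invented for illustration and are not part of the table.

```python
import pandas as pd

# Hypothetical row shaped like out_rus7__yearly_employee_statistics.
employees = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001"],
        "employee_hours_worked_regular_time": [80000],
        "employee_hours_worked_over_time": [4000],
        "payroll_expensed": [2000000],
        "payroll_capitalized": [700000],
        "payroll_other": [300000],
    }
)

# Assumed: expensed, capitalized, and other payroll do not overlap.
payroll_cols = ["payroll_expensed", "payroll_capitalized", "payroll_other"]
employees["payroll_total"] = employees[payroll_cols].sum(axis=1)

# Overtime hours as a fraction of all hours worked.
total_hours = (
    employees["employee_hours_worked_regular_time"]
    + employees["employee_hours_worked_over_time"]
)
employees["overtime_share"] = employees["employee_hours_worked_over_time"] / total_hours
print(employees[["payroll_total", "overtime_share"]])
```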

out_rus7__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of investments in and impacts of energy efficiency programs.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)

Primary key:

report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

  • The savings_mmbtu column likely contains values with incorrect units.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_efficiency.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

customer_class

string

High level categorization of customer type.

customers_num

number

Number of customers.

invested

number

The amount of money invested.

observation_period

string

The date range that a given record pertains to. For example, ‘december’ indicates that the record covers the month of December only, while ‘avg’ indicates that the record pertains to the average of the reporting period.

report_date

date

Date reported.

savings_mmbtu

number

The estimated amount of energy savings from energy efficiency programs. Warning: some values are much larger than expected and have not yet been cleaned; this is likely a reporting unit error.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
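
Since savings_mmbtu has not been cleaned, it may be worth flagging implausibly large values before using the column. The following is one possible heuristic, not a PUDL rule: it marks values far above the median, using an arbitrary cutoff factor. The rows and the cutoff are invented for illustration.

```python
import pandas as pd

OUTLIER_FACTOR = 1000  # arbitrary illustrative cutoff, not a PUDL rule

# Hypothetical rows shaped like out_rus7__yearly_energy_efficiency.
efficiency = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0002", "XX0003", "XX0004"],
        "savings_mmbtu": [95.0, 120.0, 150.0, 250000.0],
    }
)

# Flag values that exceed the median by more than the chosen factor.
median_savings = efficiency["savings_mmbtu"].median()
efficiency["is_suspect"] = efficiency["savings_mmbtu"] > OUTLIER_FACTOR * median_savings
suspect_rows = efficiency[efficiency["is_suspect"]]
print(suspect_rows)
```

Flagged rows should be inspected rather than rescaled automatically, because the correct unit for any given value is not known.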

out_rus7__yearly_energy_purchased#

Noticed any irregularities with this table? File a report.

Annual time series of energy purchased by RUS borrowers.

Most-recent data:

2020

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)

Primary key:

This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an asterisk (*), including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_energy_purchased.parquet"
)

Columns#

Field Name

Type

Description

average_energy_cost_dollars_per_mwh

number

The average cost of energy per MWh.

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

comments

string

General comments field.

fuel_cost_adjustment

number

The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost.

fuel_type_code_rus

integer

Unique numeric identifier for RUS fuel types.

is_supplier_eia_respondent

boolean

Whether the utility supplying energy to a RUS borrower is an EIA respondent.

purchased_energy_cost_total

number

The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges.

purchased_mwh

number

The total electricity purchased.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

supplier_code_rus

string

Unique numeric identifier for the utility supplying energy to a RUS borrower.

utility_name_eia

string

The name of the utility.

wheeling_and_other_charges

number

The cost of wheeling and other charges or credits related to fuel. Included in the total cost.
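
Two properties of this table noted above can be checked directly: purchases from unspecified entities have utility names beginning with an asterisk, and rows for the Miscellaneous supplier code can repeat within a borrower and year. The sketch below uses invented rows and only a subset of the candidate key columns; the borrower ID, the first supplier code, the named supplier, and all quantities are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like out_rus7__yearly_energy_purchased.
purchases = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "borrower_id_rus": ["XX0001"] * 3,
        "supplier_code_rus": ["123456", "700000", "700000"],
        "utility_name_eia": [
            "Example Power Cooperative",
            "*Miscellaneous",
            "*Miscellaneous",
        ],
        "purchased_mwh": [5000.0, 40.0, 25.0],
    }
)

# Names starting with "*" mark purchases from an unspecified entity.
purchases["is_unspecified_supplier"] = purchases["utility_name_eia"].str.startswith("*")
unspecified_mwh = purchases.loc[
    purchases["is_unspecified_supplier"], "purchased_mwh"
].sum()

# Repeated Miscellaneous rows are one reason the candidate key is not unique.
candidate_key = ["report_date", "borrower_id_rus", "supplier_code_rus"]
repeated = purchases.duplicated(subset=candidate_key, keep=False)
print(unspecified_mwh, int(repeated.sum()))
```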

out_rus7__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

external_financial_risk_ratio

number

(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how financially exposed a utility is to outside entities relative to its own assets.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
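
The table is described as the ratio of investments and loan guarantee balances to total utility plant assets, which implies that the two balances are summed before dividing. A worked example under that reading is sketched below; the function name and the dollar amounts are hypothetical.

```python
def external_financial_risk_ratio(
    total_investments: float,
    loan_guarantee_balances: float,
    total_utility_plant: float,
) -> float:
    """Return (investments + loan guarantees) divided by total utility plant."""
    if total_utility_plant == 0:
        raise ValueError("total_utility_plant must be non-zero")
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Invented figures: $1.5M invested, $0.5M guaranteed, $40M of utility plant.
ratio = external_financial_risk_ratio(1_500_000.0, 500_000.0, 40_000_000.0)
print(f"{ratio:.3f}")
```

With these figures the outside exposure is $2M against $40M of plant, a ratio of 0.05.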

out_rus7__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section I)

Primary key:

This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_investments.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

excluded_investments

number

Excluded investment.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

included_investments

number

Included investment.

income_or_loss

number

Income or loss from investment.

investment_description

string

Description of investment. This is a free-form text field and thus contains a wide variety of values.

investment_type_code

integer

Investment type code.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

out_rus7__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)

Primary key:

This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason we’ve nulled the date.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_loans.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

is_loan_guarantee

boolean

Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.

loan_balance

number

The amount of money still owed on a loan at the end of the reporting year.

loan_maturity_date

date

The date on which a loan is scheduled to be fully paid.

loan_original_amount

number

The original amount of a loan.

loan_recipient

string

The organization that received a loan.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.
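
Because this table combines direct loans with loan guarantees, and some maturity dates are null, it can help to separate the two kinds of record and count missing dates before analysis. The sketch below uses invented rows; the borrower ID, recipients, balances, and dates are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like out_rus7__yearly_loans.
loans = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0001", "XX0001"],
        "loan_recipient": ["Example Recipient A", "Example Recipient B", "Example Recipient C"],
        "is_loan_guarantee": [False, True, False],
        "loan_balance": [10000.0, 50000.0, 2500.0],
        # None becomes NaT, standing in for a nulled maturity date.
        "loan_maturity_date": pd.to_datetime(["2030-06-30", "2035-12-31", None]),
    }
)

# Split outstanding balances into direct loans and guaranteed loans.
direct_balance = loans.loc[~loans["is_loan_guarantee"], "loan_balance"].sum()
guaranteed_balance = loans.loc[loans["is_loan_guarantee"], "loan_balance"].sum()

# Count records whose maturity date is missing.
missing_maturity_count = int(loans["loan_maturity_date"].isna().sum())
print(direct_balance, guaranteed_balance, missing_maturity_count)
```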

out_rus7__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long term debt and debt service requirements.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)

Primary key:

This table has no primary key. It is a list of all debts in each year, and borrowers can have multiple records with the same debt_description.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_debt.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

debt_description

string

Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.

debt_ending_balance

number

The amount of principal still owed on the debt at the end of the report year.

debt_interest

number

The interest expense on the debt for the report year.

debt_principal

number

The principal paid on the debt during the report year.

debt_total

number

The total amount of debt.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

out_rus7__yearly_long_term_leases#

Noticed any irregularities with this table? File a report.

Annual time series of long term leases by property type.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)

Primary key:

This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_long_term_leases.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

lending_organization

string

The organization that provided a lease or loan.

property_type

string

The type of property leased.

rental_cost_ytd

number

Year-to-date rental cost for leased property.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

out_rus7__yearly_materials_and_supplies#

Noticed any irregularities with this table? File a report.

Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)

Primary key:

report_date, borrower_id_rus, electric_or_other_materials

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_materials_and_supplies.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

electric_or_other_materials

string

Whether the cost is for electric materials or other materials.

materials_adjustment

number

An adjustment value for the cost of materials and supplies.

materials_ending_balance

number

The balance at the end of the report year for materials and supplies.

materials_purchased

number

The cost of materials and supplies purchased.

materials_salvaged

number

The cost of materials and supplies salvaged.

materials_sold

number

The cost of materials and supplies sold.

materials_used

number

The cost of materials and supplies used.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

out_rus7__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_meeting_and_board.parquet"
)

Columns#

Field Name

Type

Description

board_members_num

integer

The total number of board members.

borrower_id_rus

string

Unique identifier of an RUS (Rural Utilities Service) borrower. IDs consist of a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

does_manager_have_written_contract

boolean

Whether or not the RUS borrower’s manager has a written contract.

fees_and_expenses_for_board_members

integer

The total amount of fees and expenses for board members.

last_annual_meeting_date

datetime

The date of the last annual meeting.

members_num

integer

The total number of members.

members_present_at_meeting_num

integer

The number of members present at the last annual meeting.

members_voting_by_proxy_or_mail_num

integer

The number of members voting by mail or by proxy.

report_date

date

Date reported.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

was_quorum_present

boolean

Whether or not a quorum was present.
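
The membership and attendance columns can be combined into a rough participation rate for the annual meeting. The sketch below assumes that members present and members voting by proxy or mail do not overlap, which the column descriptions do not state. The derived column name and all figures are invented for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like out_rus7__yearly_meeting_and_board.
meetings = pd.DataFrame(
    {
        "borrower_id_rus": ["XX0001", "XX0002"],
        "members_num": [10000, 4000],
        "members_present_at_meeting_num": [300, 100],
        "members_voting_by_proxy_or_mail_num": [200, 0],
        "was_quorum_present": [True, False],
    }
)

# Assumed: in-person attendees and proxy or mail voters are distinct members.
participants = (
    meetings["members_present_at_meeting_num"]
    + meetings["members_voting_by_proxy_or_mail_num"]
)
meetings["participation_rate"] = participants / meetings["members_num"]
print(meetings[["borrower_id_rus", "participation_rate", "was_quorum_present"]])
```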

out_rus7__yearly_owed_by_customers#

Noticed any irregularities with this table? File a report.

Annual time series of debt owed by customers.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_owed_by_customers.parquet"
)

Columns#

Field Name

Type

Description

amount_due_over_60_days

number

Reported amount of dollars due over 60 days from consumers for electric service. Includes both connected and disconnected customers.

amount_written_off_ytd

number

Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
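
The borrower_id_rus layout described above (two state characters followed by four digits) can be checked with a regular expression. This is a minimal sketch: the assumption that the letters are uppercase ASCII is ours, and the example IDs are hypothetical.

```python
import re

# Assumed layout: two uppercase letters followed by exactly four digits.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def is_valid_borrower_id(borrower_id: str) -> bool:
    """Return True if the ID matches the two-letters-plus-four-digits layout."""
    return BORROWER_ID_PATTERN.fullmatch(borrower_id) is not None


# Hypothetical IDs, for illustration only.
print(is_valid_borrower_id("TX0123"))  # True
print(is_valid_borrower_id("TX123"))   # False: only three digits
```

A check like this only confirms the shape of an ID; it does not confirm that the ID belongs to a real borrower.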

out_rus7__yearly_patronage_capital#

Noticed any irregularities with this table? File a report.

Annual time series of patronage capital distributed and received.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)

Primary key:

report_date, borrower_id_rus, patronage_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_patronage_capital.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

patronage_cumulative

number

Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null.

patronage_report_year

number

Amount of patronage distributed or received within report year.

patronage_type

string

Type of patronage capital distributed or received.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
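
The subtotal warning for this table can be made concrete with the is_total column. The sketch below uses made-up rows and hypothetical patronage_type labels to show how summing every row double counts, and how excluding total rows avoids it.

```python
import pandas as pd

# Toy rows standing in for out_rus7__yearly_patronage_capital.
# The patronage_type labels and amounts are hypothetical.
patronage = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001"] * 3,
        "patronage_type": ["type_a", "type_b", "total"],
        "is_total": [False, False, True],
        "patronage_report_year": [100.0, 50.0, 150.0],
    }
)

# Summing every row double counts, because the total row repeats the detail rows.
naive_sum = patronage["patronage_report_year"].sum()

# Drop total rows before aggregating.
detail_sum = patronage.loc[~patronage["is_total"], "patronage_report_year"].sum()

print(naive_sum, detail_sum)
```

The same filter should carry over to the other tables in this section that have an is_total column, although which rows are flagged differs by table.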

out_rus7__yearly_power_requirements#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and generation summary.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table includes totals of electric sales and revenue, which also appear in core_rus7__yearly_power_requirements_electric_sales with a customer_class of total. Because this table holds all other power requirements that are not broken out by customer class, we include these totals here as well.
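
One way to confirm the overlap described above is to join the two tables and compare the totals. This is a sketch on toy data: the rows, the non-total customer_class labels, and the mwh_matches column are hypothetical, while electric_sales_mwh, sales_mwh and the customer_class value total come from the table descriptions.

```python
import pandas as pd

# Toy stand-in for this table (one row per report_date and borrower).
power_requirements = pd.DataFrame(
    {
        "report_date": ["2021-01-01"],
        "borrower_id_rus": ["AA0001"],
        "electric_sales_mwh": [500.0],
    }
)

# Toy stand-in for the electric sales table (one row per customer class).
electric_sales = pd.DataFrame(
    {
        "report_date": ["2021-01-01"] * 3,
        "borrower_id_rus": ["AA0001"] * 3,
        "customer_class": ["class_a", "class_b", "total"],
        "sales_mwh": [300.0, 200.0, 500.0],
    }
)

# Keep only the total rows, then line them up with the summary table.
totals = electric_sales[electric_sales["customer_class"] == "total"]
merged = power_requirements.merge(
    totals,
    on=["report_date", "borrower_id_rus"],
    how="left",
    validate="one_to_one",
)

# Hypothetical check column: do the two reported totals agree?
merged["mwh_matches"] = merged["electric_sales_mwh"] == merged["sales_mwh"]
```

On real data an exact equality test may be too strict; a tolerance-based comparison is a reasonable alternative.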

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

electric_sales_mwh

number

Total MWh of electricity sold. Total of lines 1b through 9b on the original form.

electric_sales_revenue

integer

Total Revenue Received From Sales of Electric Energy. Total of lines 1c thru 9c on the original form.

generated_mwh

number

The total electricity generated.

interchange_mwh

number

The net interchange of electricity. The net amount of electricity exchanged in purchases and sales.

is_peak_coincident

boolean

Whether the peak_mw value is a coincident peak (true) or a non-coincident peak (false).

other_electric_revenue

integer

Electric revenue other than electric_sales_revenue.

own_use_mwh

number

The electricity in MWh used for the borrower’s own internal use.

peak_mw

number

The peak system demand in MW - the sum of all MW.

purchased_mwh

number

The total electricity purchased.

purchases_and_generation_cost

integer

The cost of purchases and generation of electricity.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

transmission_revenue

integer

Transmission revenue.

out_rus7__yearly_power_requirements_electric_customers#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - number of customers served by customer type.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_customers.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

customer_class

string

High level categorization of customer type.

customers_num

number

Number of customers.

observation_period

string

The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
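
Because observation_period is part of the primary key, each borrower and customer class can appear once per period. The sketch below, on made-up rows with a hypothetical customer_class label, pivots the december and avg periods into columns so they can be compared side by side.

```python
import pandas as pd

# Toy rows standing in for out_rus7__yearly_power_requirements_electric_customers.
customers = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001", "AA0001"],
        "customer_class": ["class_a", "class_a"],
        "observation_period": ["december", "avg"],
        "customers_num": [1050.0, 1000.0],
    }
)

# One row per borrower and customer class, one column per observation period.
wide = customers.pivot(
    index=["borrower_id_rus", "customer_class"],
    columns="observation_period",
    values="customers_num",
)

# Hypothetical derived column: December count relative to the period average.
wide["december_minus_avg"] = wide["december"] - wide["avg"]
```

With more than one report_date in the data, report_date would need to be added to the pivot index as well.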

out_rus7__yearly_power_requirements_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and energy sold by customer type.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus, customer_class

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_power_requirements_electric_sales.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

customer_class

string

High level categorization of customer type.

report_date

date

Date reported.

revenue

number

Amount of revenue.

sales_mwh

number

Quantity of electricity sold in MWh.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
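
The primary key listed for this table implies that each combination of report_date, borrower_id_rus and customer_class appears at most once. A minimal sketch of how that could be checked after loading the data follows; the helper name find_duplicate_keys and the toy rows are hypothetical.

```python
import pandas as pd

# Primary key columns as listed for this table.
PRIMARY_KEY = ["report_date", "borrower_id_rus", "customer_class"]


def find_duplicate_keys(df: pd.DataFrame, key: list[str]) -> pd.DataFrame:
    """Return every row whose key columns are shared with another row."""
    return df[df.duplicated(subset=key, keep=False)]


# Toy rows with hypothetical customer_class labels.
sales = pd.DataFrame(
    {
        "report_date": ["2021-01-01", "2021-01-01"],
        "borrower_id_rus": ["AA0001", "AA0001"],
        "customer_class": ["class_a", "total"],
        "sales_mwh": [300.0, 500.0],
    }
)

duplicates = find_duplicate_keys(sales, PRIMARY_KEY)
print(len(duplicates))  # 0
```

The same helper can be pointed at any other table in this section by swapping in that table's primary key columns.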

out_rus7__yearly_service_interruptions#

Noticed any irregularities with this table? File a report.

Annual time series of service interruptions by cause.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)

Primary key:

report_date, borrower_id_rus, service_interruption_cause, observation_period

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_service_interruptions.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

observation_period

string

The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.

report_date

date

Date reported.

saidi_minutes

number

Cumulative duration (minutes) of interruption for the average customer during the report year.

service_interruption_cause

string

Source of service interruption.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)

Primary key:

report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_statement_of_operations.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

opex_group

string

High level section from the statement of operations table. Most of these types have subcomponents broken out in the opex_type column.

opex_report_month

number

Amount of operational expense, cost or income during the report month.

opex_type

string

Type of item from the statement of operations.

opex_ytd

number

The year-to-date amount of operational expense, cost or income.

opex_ytd_budget

number

The year-to-date budget for amount of operational expense, cost or income.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
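
The opex_ytd and opex_ytd_budget columns lend themselves to a simple budget comparison. The sketch below is illustrative only: the rows, the opex_group and opex_type labels, and the derived column ytd_budget_variance are hypothetical.

```python
import pandas as pd

# Toy rows standing in for out_rus7__yearly_statement_of_operations.
ops = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001", "AA0001"],
        "opex_group": ["group_a", "group_a"],
        "opex_type": ["item_1", "item_2"],
        "is_total": [False, False],
        "opex_ytd": [120.0, 80.0],
        "opex_ytd_budget": [100.0, 100.0],
    }
)

# Hypothetical derived column: positive means the reported amount exceeds budget.
ops["ytd_budget_variance"] = ops["opex_ytd"] - ops["opex_ytd_budget"]
```

Since the table mixes expenses, costs and income, the sign of the variance has to be interpreted per opex_type: exceeding budget is unfavorable for an expense but favorable for income.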

out_rus7__yearly_transmission_and_distribution_mileage#

Noticed any irregularities with this table? File a report.

Annual time series of miles of transmission and distribution infrastructure.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)

Primary key:

report_date, borrower_id_rus, line_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_transmission_and_distribution_mileage.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

is_total

boolean

Whether or not this record represents a total.

line_type

string

The type of line mileage reported (e.g., transmission, overhead distribution).

miles

number

Line length at the end of the reported period, in miles.

report_date

date

Date reported.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

out_rus7__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:

2021

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)

Primary key:

report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_rus7__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.

utility_plant_group

string

High-level category of utility plant asset type.

utility_plant_item

string

Sub-category of utility_plant_group describing utility plant asset item.
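
The additions, retirements and adjustments_and_transfers columns can be combined into a net change for each account. The sketch below rests on assumptions that this page does not confirm: that retirements are reported as positive amounts, and that the ending balance equals the beginning balance plus the net change. The rows, labels and derived column names are hypothetical.

```python
import pandas as pd

# Toy row standing in for out_rus7__yearly_utility_plant_changes.
plant = pd.DataFrame(
    {
        "borrower_id_rus": ["AA0001"],
        "utility_plant_group": ["group_a"],
        "utility_plant_item": ["item_1"],
        "is_total": [False],
        "additions": [100.0],
        "retirements": [30.0],
        "adjustments_and_transfers": [5.0],
        "ending_balance": [1075.0],
    }
)

# Assumption: retirements are positive amounts that reduce the balance.
plant["net_change"] = (
    plant["additions"] - plant["retirements"] + plant["adjustments_and_transfers"]
)

# Assumption: ending balance = beginning balance + net change.
plant["implied_beginning_balance"] = plant["ending_balance"] - plant["net_change"]
```

If the implied beginning balance does not line up with the prior year's ending_balance for the same account, one of these assumptions probably does not hold for that record.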

out_sec10k__changelog_company_name#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in the names each SEC 10-K filer has used.

Most-recent data:

2023

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

This table has no primary key.

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

We use the company name reported in association with each name change block in the company information table to fill in the most recent value of company_name_new. Roughly 1,000 reported “name changes” in which the old and new names were identical have been dropped.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__changelog_company_name.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

company_name_new

string

Name of company after name change.

company_name_old

string

Name of company prior to name change.

name_change_date

date

Date of last name change of the company.
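
Since this table has no primary key, a filer's name history has to be assembled by ordering its rows. The sketch below lists one company's names in chronological order. It assumes each row's company_name_old matches the previous row's company_name_new, which may not always hold in the real data; the rows and the helper name name_history are hypothetical.

```python
import pandas as pd

# Toy rows standing in for out_sec10k__changelog_company_name.
changes = pd.DataFrame(
    {
        "central_index_key": ["0000000001", "0000000001"],
        "company_name_old": ["Beta Corp", "Alpha Corp"],
        "company_name_new": ["Gamma Corp", "Beta Corp"],
        "name_change_date": pd.to_datetime(["2015-06-01", "2010-01-15"]),
    }
)


def name_history(changelog: pd.DataFrame, cik: str) -> list[str]:
    """Return the names used by one filer, oldest first."""
    rows = changelog[changelog["central_index_key"] == cik].sort_values(
        "name_change_date"
    )
    if rows.empty:
        return []
    # Start from the earliest old name, then follow each recorded new name.
    return [rows["company_name_old"].iloc[0], *rows["company_name_new"]]


print(name_history(changes, "0000000001"))
```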

out_sec10k__parents_and_subsidiaries#

Noticed any irregularities with this table? File a report.

A denormalized table containing information about parent companies that file SEC Form 10-K and their subsidiaries, which may or may not file Form 10-K.

Most-recent data:

2023

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k, subsidiary_company_id_sec10k

Usage Warnings#

  • This data has been extracted probabilistically using a machine learning model, so it is incomplete in places and contains some errors. It should not be treated as ground truth data.

  • PUDL does not currently update its copy of this data.

Additional Details#

Company ownership fractions are extracted from SEC 10-K Exhibit 21. Information about the companies is extracted primarily from the headers of the SEC 10-K filing. Subsidiaries that file Form 10-K will have much more information available than those that only appear as subsidiaries in Exhibit 21.

SEC 10-K filers and EIA utilities are matched using probabilistic record linkage. Exhibit 21 subsidiaries that don’t file a Form 10-K are matched to EIA utilities using the company name.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__parents_and_subsidiaries.parquet"
)

Columns#

Field Name

Type

Description

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

filing_date

date

Date on which the filing was submitted.

fraction_owned

number

Fraction of the subsidiary company owned by the parent company, as extracted from SEC 10-K Exhibit 21.

parent_company_business_city

string

City where the parent company’s place of business is located.

parent_company_business_state

string

State where the parent company’s place of business is located.

parent_company_business_street_address

string

Street address of the parent company’s place of business.

parent_company_business_street_address_2

string

Second line of the street address of the parent company’s place of business.

parent_company_business_zip_code

string

Zip code of the parent company’s place of business.

parent_company_business_zip_code_4

string

Zip code suffix of the parent company’s place of business.

parent_company_central_index_key

string

Central index key (CIK) of the parent company.

parent_company_incorporation_state

string

Two letter state code where parent company is incorporated.

parent_company_industry_id_sic

string

Four-digit Standard Industrial Classification (SIC) code identifying the parent company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.

parent_company_industry_name_sic

string

Text description of the parent company’s Standard Industrial Classification (SIC).

parent_company_mail_city

string

City of the parent company’s mailing address.

parent_company_mail_state

string

State of the parent company’s mailing address.

parent_company_mail_street_address

string

Street portion of the parent company’s mailing address.

parent_company_mail_street_address_2

string

Second line of the street portion of the parent company’s mailing address.

parent_company_mail_zip_code

string

Zip code of the parent company’s mailing address.

parent_company_mail_zip_code_4

string

Zip code suffix of the parent company’s mailing address.

parent_company_name

string

Name of the parent company.

parent_company_phone_number

string

Phone number of the parent company.

parent_company_taxpayer_id_irs

string

Taxpayer ID of the parent company with the IRS.

parent_company_utility_id_eia

integer

The EIA utility ID of the parent company.

parent_company_utility_name_eia

string

The EIA reported utility name of the parent company.

report_date

date

Date reported.

subsidiary_company_business_city

string

City where the subsidiary company’s place of business is located.

subsidiary_company_business_state

string

State where the subsidiary company’s place of business is located.

subsidiary_company_business_street_address

string

Street address of the subsidiary company’s place of business.

subsidiary_company_business_street_address_2

string

Second line of the street address of the subsidiary company’s place of business.

subsidiary_company_business_zip_code

string

Zip code of the subsidiary company’s place of business.

subsidiary_company_business_zip_code_4

string

Zip code suffix of the subsidiary company’s place of business.

subsidiary_company_central_index_key

string

Central index key (CIK) of the subsidiary company.

subsidiary_company_id_sec10k

string

PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.

subsidiary_company_incorporation_state

string

Two letter state code where subsidiary company is incorporated.

subsidiary_company_industry_id_sic

string

Four-digit Standard Industrial Classification (SIC) code identifying the subsidiary company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.

subsidiary_company_industry_name_sic

string

Text description of the subsidiary company’s Standard Industrial Classification (SIC).

subsidiary_company_location

string

Location of subsidiary company. This is the full US state name or country name and occasionally a two digit code that was not mapped to a full name during cleaning.

subsidiary_company_mail_city

string

City of the subsidiary company’s mailing address.

subsidiary_company_mail_state

string

State of the subsidiary company’s mailing address.

subsidiary_company_mail_street_address

string

Street portion of the subsidiary company’s mailing address.

subsidiary_company_mail_street_address_2

string

Second line of the street portion of the subsidiary company’s mailing address.

subsidiary_company_mail_zip_code

string

Zip code of the subsidiary company’s mailing address.

subsidiary_company_mail_zip_code_4

string

Zip code suffix of the subsidiary company’s mailing address.

subsidiary_company_name

string

Name of subsidiary company.

subsidiary_company_phone_number

string

Phone number of the subsidiary company.

subsidiary_company_taxpayer_id_irs

string

Taxpayer ID of the subsidiary company with the IRS.

subsidiary_company_utility_id_eia

integer

The EIA utility ID of the subsidiary company.

subsidiary_company_utility_name_eia

string

The EIA reported utility name of the subsidiary company.
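
The filename_sec10k description above explains how to rebuild the source URL of a filing: prepend the EDGAR archive prefix and append the .txt extension. A minimal sketch of that rule follows; the function name sec10k_source_url and the example filename are hypothetical.

```python
# Prefix given in the filename_sec10k column description.
EDGAR_ARCHIVE_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def sec10k_source_url(filename_sec10k: str) -> str:
    """Rebuild the filing URL by adding the archive prefix and .txt extension."""
    return f"{EDGAR_ARCHIVE_PREFIX}{filename_sec10k}.txt"


# Hypothetical filename value, for illustration only.
example_url = sec10k_source_url("1234567/0001234567-20-000001")
print(example_url)
```

Applied to a DataFrame column, the same rule can be written as a string concatenation on df["filename_sec10k"].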

out_sec10k__quarterly_company_information#

Noticed any irregularities with this table? File a report.

Quarterly time series of company and filing data extracted from SEC 10-K filings.

Most-recent data:

2023

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k, central_index_key

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

In addition to the information provided by the core_sec10k__quarterly_company_information table, this output table merges in the associated utility_id_eia and utility name, where available, along with the report and filing dates of the filing each record was extracted from and a link to the source URL for the filing.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_company_information.parquet"
)

Columns#

Field Name

Type

Description

business_city

string

City where the company’s place of business is located.

business_postal_code

string

Non-US postal code of the company’s place of business.

business_state

string

State where the company’s place of business is located.

business_street_address

string

Street address of the company’s place of business.

business_street_address_2

string

Second line of the street address of the company’s place of business.

business_zip_code

string

Zip code of the company’s place of business.

business_zip_code_4

string

Zip code suffix of the company’s place of business.

central_index_key

string

Identifier of the company in SEC database.

company_name

string

Name of the reporting company.

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

filer_count

integer

A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to.

filing_date

date

Date on which the filing was submitted.

filing_number_sec

string

Filing number used internally by the SEC to track the filing.

film_number

string

Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room.

fiscal_year_end

string

The end date of an SEC filing company’s fiscal year, in MMDD format.

incorporation_state

string

Two letter state code where company is incorporated.

industry_group_sic

string

A higher level industry category defined within the Standard Industrial Classification (SIC) system.

industry_id_sic

string

Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.

industry_name_sic

string

Text description of the Standard Industrial Classification (SIC) code.

mail_city

string

City of the company’s mailing address.

mail_postal_code

string

Non-US postal code of the company’s mailing address.

mail_state

string

State of the company’s mailing address.

mail_street_address

string

Street portion of the company’s mailing address.

mail_street_address_2

string

Second line of the street portion of the company’s mailing address.

mail_zip_code

string

Zip code of the company’s mailing address.

mail_zip_code_4

string

Zip code suffix of the company’s mailing address.

phone_number

string

Phone number for utility contact 1.

report_date

date

Date reported.

sec10k_type

string

Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.

sec_act

string

SEC Act through which the form was enacted, e.g. 1934 act.

source_url

string

URL pointing to the original source of the data in the record.

taxpayer_id_irs

string

Taxpayer ID of the company with the IRS.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.
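The fiscal_year_end column in the table above stores the end of the fiscal year as a four-character MMDD string. A minimal sketch of splitting such a value into a month and a day, assuming well-formed input; the helper name parse_fiscal_year_end is hypothetical and not part of PUDL:

```python
def parse_fiscal_year_end(mmdd: str) -> tuple[int, int]:
    """Split an MMDD string such as "1231" into (month, day)."""
    if len(mmdd) != 4 or not mmdd.isdigit():
        raise ValueError(f"Expected a four-digit MMDD string, got {mmdd!r}")
    month, day = int(mmdd[:2]), int(mmdd[2:])
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"Month or day out of range in {mmdd!r}")
    return month, day


print(parse_fiscal_year_end("1231"))  # (12, 31)
```

Keeping the column as a string preserves leading zeros, so a value like "0630" still parses to month 6, day 30.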

out_sec10k__quarterly_filings#

Noticed any irregularities with this table? File a report.

Quarterly time series of metadata describing SEC 10-K filings.

Most-recent data:

2023

Processing:

Data has been expanded into a wide/denormalized format, with IDs and codes accompanied by human-readable names and descriptions.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename, but ~1% of all filings are one company submitting the same form multiple times on the same day, so the filename is the only available natural primary key. This output table adds a link to the source URL for the filing, which is constructed from the filename.
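The URL construction mentioned above follows the rule given in the filename_sec10k column description: prepend the EDGAR archive prefix and append the .txt extension. A minimal sketch of that rule; the function name build_source_url and the example filename are hypothetical and used only for illustration:

```python
EDGAR_ARCHIVE_PREFIX = "https://www.sec.gov/Archives/edgar/data/"


def build_source_url(filename_sec10k: str) -> str:
    """Rebuild the EDGAR source URL from the unique portion of a filename."""
    return f"{EDGAR_ARCHIVE_PREFIX}{filename_sec10k}.txt"


# Hypothetical filename value; real values come from the filename_sec10k column.
example_filename = "0000000/0000000000-00-000000"
print(build_source_url(example_filename))
```

The result should match the source_url column for the same record, which makes for a quick consistency check.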

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/out_sec10k__quarterly_filings.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

company_name

string

Name of the reporting company.

exhibit_21_version

string

Version of exhibit 21 submitted (if applicable).

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

filing_date

date

Date on which the filing was submitted.

report_date

date

Date reported.

sec10k_type

string

Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.

source_url

string

URL pointing to the original source of the data in the record.

out_vcerare__hourly_available_capacity_factor#

Noticed any irregularities with this table? File a report.

Hourly time series of estimated county-averaged capacity factors for wind and solar generating facilities across the contiguous United States (US), to be used as a tool and input for resource adequacy modeling and planning.

Most-recent data:

2023

Processing:

Data is ready for use in analyses, but for practical reasons has not been denormalized and remains in narrow format.

Source:

Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset

Primary key:

state, place_name, datetime_utc

Usage Warnings#

  • The hourly capacity factors are normalized to unity for maximal power output. To convert to units of power, the user must multiply by the installed capacity within the county.

  • Hourly capacity factors are spatially averaged across each county over the contiguous USA. There are a handful of counties that are too small to pick up representation on the HRRR operational forecast grid. As such, these counties will have no wind or solar power production curves.

  • Due to power production performance being correlated with panel temperatures, during cold sunny periods, some solar capacity factor values are greater than 1 (but less than 1.1).
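The conversion described in the first warning above is a single multiplication. A minimal sketch, assuming the installed capacity for the county is known from another source (it is not a column in this table); the function name and the sample values are made up for illustration:

```python
def capacity_factor_to_power_mw(capacity_factor: float, installed_capacity_mw: float) -> float:
    """Convert a normalized capacity factor into power output in MW."""
    return capacity_factor * installed_capacity_mw


# Made-up hourly capacity factors; the last one exceeds 1, as can happen
# for solar during cold sunny periods.
hourly_capacity_factors = [0.0, 0.25, 0.5, 1.0625]
installed_capacity_mw = 200.0  # assumed installed capacity in the county

hourly_power_mw = [
    capacity_factor_to_power_mw(cf, installed_capacity_mw)
    for cf in hourly_capacity_factors
]
print(hourly_power_mw)
```

Values above 1 are passed through unchanged here; whether to clip them depends on the analysis.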

Additional Details#

The data in this table were produced by Vibrant Clean Energy, and are licensed to the public under the Creative Commons Attribution 4.0 International license (CC-BY-4.0).

The technologies provided are:

  1. Onshore wind assuming a 100m hub height and 120m rotor diameter;

  2. Offshore wind assuming a 140m hub height and 120m rotor diameter;

  3. Utility solar assuming a fixed axis panel tilted at latitude.

The foundation of the capacity factors provided here is the NOAA HRRR operational numerical weather prediction model. The HRRR covers the entire contiguous US at a horizontal resolution of 3 km. Forecasts are initialized each hour of the year. Forecast hour two (2) is used as the input data for the power algorithms. This forecast hour is chosen to trade off the impact of the measurement and data assimilation procedure of the HRRR against the physics of the model to derive the most complete picture of the atmosphere at the forecast time horizon.

For wind capacity factors: vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.

For solar capacity factors: pertinent surface weather variables are pulled, such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR from 2016 forward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors.

Access methods#

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.

import polars as pl
df = (
    # Use scan_parquet (lazy evaluation) to plan the query before execution
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    # Select only a subset of columns to reduce network and memory usage
    .select(
        [
            "report_year",
            "datetime_utc",
            "hour_of_year",
            "county_id_fips",
            "state",
            "place_name",
            "capacity_factor_solar_pv",
        ]
    )
    .filter(
        # Select data for only a single year and state to reduce network and memory usage
        (pl.col("report_year") == 2022)
        & (pl.col("state") == "CO")
    )
    .collect()
)
SELECT
    -- Select only a subset of columns to reduce network and memory usage
    report_year,
    datetime_utc,
    hour_of_year,
    county_id_fips,
    state,
    place_name,
    capacity_factor_solar_pv
FROM 's3://pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet'
WHERE
    -- Select data for only a single year and state to reduce network and memory usage
    report_year = 2022
    AND state = 'CO';

Columns#

Field Name

Type

Description

capacity_factor_offshore_wind

number

Estimated capacity factor (0-1) calculated for offshore wind assuming a 140m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.

capacity_factor_onshore_wind

number

Estimated capacity factor (0-1) calculated for onshore wind assuming a 100m hub height and 120m rotor diameter. Based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Vertical slices of the atmosphere are considered across the defined rotor swept area. Bringing together wind speed, density, temperature and icing information, a power capacity is estimated using a representative power coefficient (Cp) curve to determine the power from a given wind speed, atmospheric density and temperature. There is no wake modeling included in the dataset.

capacity_factor_solar_pv

number

Estimated capacity factor (0-1) calculated for solar PV assuming a fixed axis panel tilted at latitude and DC power outputs. Due to power production performance being correlated with panel temperatures, during cold sunny periods, some solar capacity factor values are greater than 1 (but less than 1.1). All values are based on outputs from the NOAA HRRR operational numerical weather prediction model. Capacity factors are normalized to unity for maximal power output. Pertinent surface weather variables are pulled, such as incoming short wave radiation, direct normal irradiance (calculated in the HRRR from 2016 forward), surface temperature and other parameters. These are used in a non-linear I-V curve translation to power capacity factors.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

hour_of_year

integer

Integer between 1 and 8760 representing the hour in a given year.

latitude

number

Latitude of the place centroid (e.g., county centroid).

longitude

number

Longitude of the place centroid (e.g., county centroid).

place_name

string

County or lake name, sourced from the latest Census PEP vintage based on county FIPS ID. Lake names originate from VCE RARE directly, and a lake may appear several times, once for each state it touches. FIPS ID values for lakes have been nulled.

report_year

integer

Four-digit year in which the data was reported.

state

string

Two letter US state abbreviation.

core_eia176__yearly_gas_disposition#

Noticed any irregularities with this table? File a report.

Annual time series of a company’s natural and supplemental gas disposition for the report state.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 9, 12-20)

Primary key:

operator_id_eia, report_year

Additional Details#

The disposition_out_of_state_mcf field (Line 14.0) is reported as one aggregated volume, calculated by summing the original granular data. Similarly, other_disposition_all_other_mcf (Line 18.4) is summed from the original granular data and reported as one aggregate field.

The EIA expects delivered_gas_heat_content_mmbtu_per_mcf to fall between 0.8 and 1.2. We find that less than 0.5 percent of the data falls outside these bounds.
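The bounds check described above can be sketched as follows. This is an illustrative check on a plain list of values rather than the procedure used in PUDL; treating the bounds as inclusive and skipping nulls are both assumptions, and the sample values are made up:

```python
EXPECTED_MIN_MMBTU_PER_MCF = 0.8
EXPECTED_MAX_MMBTU_PER_MCF = 1.2


def share_outside_bounds(values, lower=EXPECTED_MIN_MMBTU_PER_MCF, upper=EXPECTED_MAX_MMBTU_PER_MCF):
    """Return the fraction of non-null values that fall outside [lower, upper]."""
    reported = [v for v in values if v is not None]
    if not reported:
        return 0.0
    outside = sum(1 for v in reported if v < lower or v > upper)
    return outside / len(reported)


# Made-up heat content values; None stands in for a missing report.
sample_heat_content = [1.03, 1.05, None, 0.75, 1.10]
print(share_outside_bounds(sample_heat_content))
```

The same filter translates directly to a pandas or polars expression on the delivered_gas_heat_content_mmbtu_per_mcf column.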

The total_disposition_mcf field includes disposition to consumers, which is reported in core_eia176__yearly_gas_disposition_by_consumer. Note that the reported total disposition often does not match the sum of the values in this table and core_eia176__yearly_gas_disposition_by_consumer, although one would expect it to. Thus, we have preserved both the total field and the sub-components in these two tables.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition.parquet"
)

Columns#

Field Name

Type

Description

delivered_gas_heat_content_mmbtu_per_mcf

number

The average annual heat content of gas delivered directly to consumers.

disposition_distribution_companies_mcf

number

Disposition to distribution companies within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

disposition_other_pipelines_mcf

number

Disposition to other pipelines within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

disposition_out_of_state_mcf

number

Total volume of the operator’s deliveries across or to state lines or U.S. borders. This has been summed from the detailed data reported by each company on Line 14.0 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

disposition_storage_operators_mcf

number

Disposition to storage operators within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

losses_mcf

number

Known loss volumes as a result of leaks, damage, accidents, migration and blow down within the report state where the events took place, as well as estimated losses from leaks occurring during distribution activities. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operational_consumption_compressors_mcf

number

Volume of natural gas consumed as fuel for the operator’s own pipeline distribution or storage compressor use within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_consumption_facility_space_heat_mcf

number

Volume of natural gas consumed as fuel for space heat of the operator’s own facilities within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_consumption_lng_vaporization_liquefaction_mcf

number

Volume of natural gas consumed by the operator for vaporization, liquefaction, and LNG fuel within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_consumption_new_pipeline_fill_mcf

number

Volume of natural gas consumed as fuel for the operator’s new pipeline fill within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_consumption_other_detail

string

Free-text detail describing the operator’s specified ‘other purposes’ for operational natural-gas consumption, corresponding to the volume reported in operational_consumption_other_mcf.

operational_consumption_other_mcf

number

Volume of natural gas consumed as fuel by the operator for other purposes within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_consumption_vehicle_fuel_mcf

number

Volume of natural gas used in the operator’s company-owned fleet within the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_lng_storage_injections_mcf

number

The total volume added to LNG storage operated by the company within the report state, regardless of ownership of the gas. For LNG import and export marine terminals, this should exclude injections of LNG during the course of routine operations for handling imports, as well as LNG held in storage for future use. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operational_storage_underground_mcf

number

The total volume added to underground storage operated by the company within the report state, regardless of ownership of the gas. This includes new fields. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

operator_id_eia

string

The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state.

other_disposition_all_other_mcf

number

Other disposition within the report state that does not fall into one of the other reported categories in lines 10.1-17.0. This has been summed from the detailed data reported by each company on Line 18.4 of the original form in order to preserve the primary key of the table. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

producer_lease_use_mcf

number

The total volume of gas used in the company’s well, field and lease operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

producer_returned_for_repressuring_reinjection_mcf

number

The volume of gas delivered directly from the operator’s system to oil or gas fields located within the report state for repressuring, pressure maintenance, and cycling operations. This field should only be reported by producers. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

report_year

integer

Four-digit year in which the data was reported.

total_disposition_mcf

number

Total disposition within the report state, as reported by the operator. This includes disposition to consumers, which is reported in core_eia176__yearly_gas_disposition_by_consumer. Note that the reported total disposition often does not match the sum of the values in this table and core_eia176__yearly_gas_disposition_by_consumer, although one would expect it to. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.

unaccounted_for_mcf

number

The difference between gas supply and disposition. A positive entry indicates supply in excess of accounted-for disposition. A negative entry indicates accounted-for disposition exceeds reported supply. This is calculated as the difference between Part 4 Line 7.0 and Part 6 Line 19.0, and is reported as Line 20.0 in the original form.

core_eia176__yearly_gas_disposition_by_consumer#

Noticed any irregularities with this table? File a report.

Annual time series of a company’s natural gas deliveries to end-use consumers within the report state.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 176 – Annual Report of Natural and Supplemental Gas Supply and Disposition (Part 6, Lines 10.0-11.6)

Primary key:

report_year, operator_id_eia, customer_class, revenue_class

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia176__yearly_gas_disposition_by_consumer.parquet"
)

Columns#

Field Name

Type

Description

consumers

integer

Number of end-use consumers within the report state.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

operating_state

string

State that the operator is reporting for.

operator_id_eia

string

The unique EIA identifier for an operator in a given state. The last two letters of the ID indicate the state.

report_year

integer

Four-digit year in which the data was reported.

revenue

number

Revenue including taxes, rounded to the nearest whole dollar.

revenue_class

string

Source of revenue: whether revenue originates from gas owned directly by the operator (sales) or gas transported by the operator (transport).

volume_mcf

number

Total volume of natural gas deliveries in the report state. Reference conditions for measurement are 14.73 psia and 60° Fahrenheit.
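The operator_id_eia description in the table above notes that the last two letters of the ID indicate the state. A minimal sketch of extracting that suffix, assuming IDs always end in two letters; the helper name and the example ID are hypothetical:

```python
def state_from_operator_id(operator_id_eia: str) -> str:
    """Return the trailing two-letter state code of an operator ID."""
    suffix = operator_id_eia[-2:]
    if len(suffix) != 2 or not suffix.isalpha():
        raise ValueError(f"ID does not end in two letters: {operator_id_eia!r}")
    return suffix.upper()


# Hypothetical ID, for illustration only.
print(state_from_operator_id("12345678CO"))  # CO
```

Comparing the extracted suffix with operating_state is one way to sanity-check a record.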

core_eia191__monthly_gas_storage#

Noticed any irregularities with this table? File a report.

Monthly time series of underground natural gas storage activity reported by operators.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 191 – Monthly Underground Natural Gas Storage Report

Primary key:

storage_field_id_eia, report_date

Usage Warnings#

  • total_field_capacity_mcf is not reliably equal to the sum of working_gas_capacity_mcf and base_gas_mcf: approximately 23% of records differ. This reflects loose EIA definitions and operator self-reporting practices, not data errors. Do not assume additivity among these three fields.
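One way to flag the non-additive records mentioned in the warning above is to compare the reported total with the sum of the other two fields. A minimal sketch using plain dictionaries in place of a DataFrame; the helper name is_additive, the tolerance, and the sample records are all assumptions made for illustration:

```python
import math


def is_additive(record, rel_tol=1e-9):
    """Check whether total capacity equals working gas capacity plus base gas.

    Returns None when any of the three fields is missing.
    """
    total = record["total_field_capacity_mcf"]
    working = record["working_gas_capacity_mcf"]
    base = record["base_gas_mcf"]
    if None in (total, working, base):
        return None
    return math.isclose(total, working + base, rel_tol=rel_tol)


# Made-up records, not taken from the actual table.
sample_records = [
    {"total_field_capacity_mcf": 1000.0, "working_gas_capacity_mcf": 600.0, "base_gas_mcf": 400.0},
    {"total_field_capacity_mcf": 1000.0, "working_gas_capacity_mcf": 700.0, "base_gas_mcf": 400.0},
    {"total_field_capacity_mcf": None, "working_gas_capacity_mcf": 700.0, "base_gas_mcf": 400.0},
]
additivity_flags = [is_additive(r) for r in sample_records]
print(additivity_flags)
```

Records flagged False are not necessarily wrong; per the warning, they reflect how operators report these fields.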

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia191__monthly_gas_storage.parquet"
)

Columns#

Field Name

Type

Description

base_gas_mcf

number

Volume of base gas (cushion gas) in the underground storage reservoir. Base gas is the volume of gas intended as permanent inventory in a reservoir to maintain adequate pressure and deliverability rates.

company_name

string

Name of the reporting company.

county

string

County name.

field_name

string

Name of the natural gas storage field.

field_type

string

Type of underground natural gas storage facility. One of: depleted field (a reservoir previously used for natural gas production), aquifer (a porous rock structure used to store gas), or salt dome (a cavern created within a salt formation).

gas_field_id_eia

integer

EIA-assigned numeric identifier for a natural gas storage field.

maximum_daily_delivery_mcf

number

Maximum daily quantity of gas that can be withdrawn from the underground storage reservoir under normal operating conditions.

operational_status

string

Operational status of the underground natural gas storage reservoir.

region

string

EIA storage region in which the underground natural gas storage field is located.

report_date

date

Date reported.

reservoir_id_eia

integer

EIA-assigned numeric identifier for the individual storage reservoir within a natural gas storage field.

reservoir_name

string

Name of the individual underground natural gas storage reservoir.

state

string

Two letter US state abbreviation.

storage_field_id_eia

string

EIA-assigned identifier for an underground natural gas storage reservoir. IDs are assigned per company and state, so the same physical reservoir may have different IDs if ownership changes.

total_field_capacity_mcf

number

Total design capacity of the underground natural gas storage field (base gas plus working gas).

working_gas_capacity_mcf

number

Volume of working gas (the portion of stored gas available for withdrawal) in the underground storage reservoir.

core_eia860__assn_boiler_cooling#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and EIA cooling system IDs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

report_date, plant_id_eia, boiler_id, cooling_id_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_cooling.parquet"
)

Columns#

Field Name

Type

Description

boiler_id

string

Alphanumeric boiler ID.

cooling_id_eia

string

EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

core_eia860__assn_boiler_generator#

Noticed any irregularities with this table? File a report.

Association table providing connections between boilers and generators.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 6, Part A)

Primary key:

plant_id_eia, report_date, generator_id, boiler_id

Usage Warnings#

  • Contains information from multiple raw inputs.

Additional Details#

Augmented with various heuristics within PUDL.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_generator.parquet"
)

Columns#

Field Name

Type

Description

bga_source

string

The source from which the unit_id_pudl is compiled. The unit_id_pudl comes either directly from EIA 860, from a string association (which looks at all the boilers and generators that are not associated with a unit and tries to find a matching string in the respective collection of boilers or generators), or from a unit connection (where the unit_id_eia is used to find additional boiler-generator connections).

boiler_generator_assn_type_code

string

Indicates whether boiler associations with generators during the year were actual or theoretical. Only available before 2013.

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

steam_plant_type_code

integer

Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.

unit_id_eia

string

EIA-assigned unit identification code.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.
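The generator_id description above asks that the ID be treated as a string. The short sketch below illustrates why; the ID values are invented for illustration (including the assumption that leading zeros can occur), and to_int_or_none is a hypothetical helper, not part of PUDL.

```python
# Invented generator IDs for illustration only.
generator_ids = ["1", "01", "GT1"]


def to_int_or_none(raw: str):
    """Hypothetical helper: try an integer cast, return None on failure."""
    try:
        return int(raw)
    except ValueError:
        return None


# Casting to int merges "1" with "01" and cannot represent "GT1" at all.
as_ints = [to_int_or_none(g) for g in generator_ids]

distinct_as_strings = len(set(generator_ids))
distinct_as_ints = len({i for i in as_ints if i is not None})
```

Kept as strings, the three IDs stay distinct, so joins on plant_id_eia and generator_id do not silently merge or drop generators.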

core_eia860__assn_boiler_stack_flue#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and EIA stack and/or flue system IDs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

report_date, plant_id_eia, boiler_id, stack_flue_id_pudl

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_boiler_stack_flue.parquet"
)

Columns#

Field Name

Type

Description

boiler_id

string

Alphanumeric boiler ID.

flue_id_eia

string

The flue identification value reported to EIA. The flue is a duct, pipe, or opening that transports exhaust gases through the stack. This field was reported in conjunction with stack_id_eia until 2013 when stack_flue_id_eia took their place.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

stack_flue_id_eia

string

The stack or flue identification value reported to EIA. This denotes the place where emissions from the combustion process are released into the atmosphere. Prior to 2013, this was reported as stack_id_eia and flue_id_eia.

stack_flue_id_pudl

string

A stack and/or flue identification value created by PUDL for use as part of the primary key for the stack flue equipment and boiler association tables. For 2013 and onward, this value is equal to the value for stack_flue_id_eia. Prior to 2013, this value is equal to the value for stack_id_eia and the value for flue_id_eia separated by an underscore or just the stack_flue_eia in cases where flue_id_eia is NA.

stack_id_eia

string

The stack identification value reported to EIA. Stacks or chimneys are the place where emissions from the combustion process are released into the atmosphere. This field was reported in conjunction with flue_id_eia until 2013 when stack_flue_id_eia took their place.
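The rule given above for stack_flue_id_pudl can be sketched as a small function. This is an illustration only, not PUDL's implementation: make_stack_flue_id_pudl is a hypothetical name, the report year is passed as a plain integer, and the sketch assumes that when flue_id_eia is missing the fallback is the stack ID alone.

```python
from typing import Optional


def make_stack_flue_id_pudl(
    report_year: int,
    stack_flue_id_eia: Optional[str],
    stack_id_eia: Optional[str],
    flue_id_eia: Optional[str],
) -> Optional[str]:
    """Hypothetical helper mirroring the rule in the column description."""
    if report_year >= 2013:
        # 2013 onward: reuse the combined ID reported to EIA.
        return stack_flue_id_eia
    if flue_id_eia is None:
        # Assumption: with no flue ID, fall back to the stack ID alone.
        return stack_id_eia
    # Before 2013: join the separately reported IDs with an underscore.
    return f"{stack_id_eia}_{flue_id_eia}"
```

For example, a 2010 record with stack "1" and flue "A" would get the ID "1_A" under these assumptions, while a 2015 record simply keeps its reported stack_flue_id_eia.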

core_eia860__assn_yearly_boiler_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA boiler IDs and emissions control IDs for NOx, SO2, mercury, and particulate monitoring.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

report_date, plant_id_eia, boiler_id, emission_control_id_type, emission_control_id_eia

Usage Warnings#

  • The relationship between the IDs is sometimes many to many.
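One way to see what a many-to-many relationship means in practice is to count partners on each side of the association. The sketch below uses invented rows for a single plant and report date; the ID values and the lowercase type labels are assumptions for illustration, not values taken from the table.

```python
from collections import defaultdict

# Toy rows: (boiler_id, emission_control_id_type, emission_control_id_eia).
rows = [
    ("B1", "so2", "EC1"),
    ("B2", "so2", "EC1"),  # one control serving two boilers
    ("B1", "nox", "EC1"),  # same ID string, different control type
    ("B1", "so2", "EC2"),  # one boiler with two SO2 controls
]

boilers_per_control = defaultdict(set)
controls_per_boiler = defaultdict(set)
for boiler_id, id_type, control_id in rows:
    # The control ID is only meaningful together with its type.
    control_key = (id_type, control_id)
    boilers_per_control[control_key].add(boiler_id)
    controls_per_boiler[boiler_id].add(control_key)

# Controls attached to more than one boiler, and boilers with more than one control.
shared_controls = {k for k, v in boilers_per_control.items() if len(v) > 1}
multi_control_boilers = {k for k, v in controls_per_boiler.items() if len(v) > 1}
```

When both sets are non-empty, joining this table to boiler-level or control-level data can multiply rows, so aggregates computed after such a join deserve a second look.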

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__assn_yearly_boiler_emissions_control_equipment.parquet"
)

Columns#

Field Name

Type

Description

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

emission_control_id_eia

string

The emission control ID used to collect SO2, NOx, particulate, and mercury emissions data. This column should be used in conjunction with emission_control_id_type, as it is not guaranteed to be unique on its own.

emission_control_id_type

string

The type of emissions control id: SO2, NOx, particulate, or mercury.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

core_eia860__scd_boilers#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of boilers which may vary from year to year. Compiled from across all EIA-860 data.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

plant_id_eia, boiler_id, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for boilers are set. It contains values which are expected to vary slowly, while core_eia__entity_boilers contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
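The 70% rule described above can be sketched in a few lines. This is a simplified illustration, not PUDL's harvesting code: pick_canonical is a hypothetical name, and the sketch assumes that null reports are excluded before the share is computed.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def pick_canonical(
    values: Iterable[Optional[Hashable]], threshold: float = 0.7
) -> Optional[Hashable]:
    """Return the most common reported value if it meets the threshold, else None."""
    # Assumption: missing reports do not count toward the total.
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # Keep the value only if it is reported consistently enough.
    return value if count / len(reported) >= threshold else None
```

With three reports of "A" and one of "B", the share for "A" is 75% and it is kept; with two of "A" and one of "B", the share is about 67% and the result is null.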

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_boilers.parquet"
)

Columns#

Field Name

Type

Description

air_flow_100pct_load_cubic_feet_per_minute

number

Total air flow including excess air at 100 percent load, reported at standard temperature and pressure (i.e. 68 F and one atmosphere pressure).

boiler_fuel_code_1

string

The code representing the most predominant type of energy that fuels the boiler.

boiler_fuel_code_2

string

The code representing the second most predominant type of energy that fuels the boiler.

boiler_fuel_code_3

string

The code representing the third most predominant type of energy that fuels the boiler.

boiler_fuel_code_4

string

The code representing the fourth most predominant type of energy that fuels the boiler.

boiler_id

string

Alphanumeric boiler ID.

boiler_operating_date

date

Date the boiler began or is planned to begin commercial operation.

boiler_retirement_date

date

Date of the scheduled or effected retirement of the boiler.

boiler_status

string

EIA short code identifying boiler operational status.

boiler_type

string

EIA short code indicating the standards under which the boiler is operating as described in the U.S. EPA regulation under 40 CFR.

compliance_year_mercury

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for mercury emissions.

compliance_year_nox

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for nitrogen oxide emissions.

compliance_year_particulate

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for particulate matter emissions.

compliance_year_so2

integer

Year boiler was or is expected to be in compliance with federal, state and/or local regulations for sulfur dioxide emissions.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

efficiency_100pct_load

number

Boiler efficiency percentage when burning at 100 percent load to the nearest 0.1 percent.

efficiency_50pct_load

number

Boiler efficiency percentage when burning at 50 percent load to the nearest 0.1 percent.

firing_rate_using_coal_tons_per_hour

number

Design firing rate at maximum continuous steam flow for coal to the nearest 0.1 ton per hour.

firing_rate_using_gas_mcf_per_hour

number

Design firing rate at maximum continuous steam flow for gas to the nearest 0.1 thousand cubic feet (Mcf) per hour.

firing_rate_using_oil_bbls_per_hour

number

Design firing rate at maximum continuous steam flow for oil to the nearest 0.1 barrels per hour.

firing_rate_using_other_fuels

number

Design firing rate at maximum continuous steam flow for energy sources other than coal, petroleum, or natural gas.

firing_type_1

string

EIA short code indicating the type of firing used by this boiler.

firing_type_2

string

EIA short code indicating the type of firing used by this boiler.

firing_type_3

string

EIA short code indicating the type of firing used by this boiler.

fly_ash_reinjection

boolean

Indicates whether the boiler is capable of re-injecting fly ash.

hrsg

boolean

Indicates whether the boiler is a heat recovery steam generator (HRSG).

max_steam_flow_lbs_per_hour

number

Maximum continuous steam flow at 100 percent load.

mercury_control_existing_strategy_1

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_2

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_3

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_4

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_5

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_existing_strategy_6

string

Existing strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent mercury regulation.

mercury_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent mercury regulation.

new_source_review

boolean

Indicates whether the boiler is subject to New Source Review requirements.

new_source_review_date

date

Month of issued New Source Review permit.

new_source_review_permit

string

New Source Review permit number.

nox_control_existing_caaa_compliance_strategy_1

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_caaa_compliance_strategy_2

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_caaa_compliance_strategy_3

string

Existing strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_existing_strategy_1

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_existing_strategy_2

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_existing_strategy_3

string

Existing strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_manufacturer

string

Name of nitrogen oxide control manufacturer.

nox_control_manufacturer_code

string

Code indicating the nitrogen oxide control burner manufacturer.

nox_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with nitrogen oxide regulations, strategy for compliance.

nox_control_planned_caaa_compliance_strategy_1

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_planned_caaa_compliance_strategy_2

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_planned_caaa_compliance_strategy_3

string

Planned strategies to meet the nitrogen oxide requirements of Title IV of the Clean Air Act Amendment of 1990.

nox_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent nitrogen oxide regulation.

nox_control_status_code

string

Nitrogen oxide control status code.

particulate_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

particulate_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

particulate_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with particulate matter regulations, strategy for compliance.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

regulation_mercury

string

Most stringent type of statute or regulation code under which the boiler is operating for mercury control standards.

regulation_nox

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for nitrogen oxide control standards.

regulation_particulate

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for particulate matter control standards.

regulation_so2

string

EIA short code for most stringent type of statute or regulation code under which the boiler is operating for sulfur dioxide control standards.

report_date

date

Date reported.

so2_control_existing_caaa_compliance_strategy_1

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_caaa_compliance_strategy_2

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_caaa_compliance_strategy_3

string

Existing strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_existing_strategy_1

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_existing_strategy_2

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_existing_strategy_3

string

Existing strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_out_of_compliance_strategy_1

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_out_of_compliance_strategy_2

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_out_of_compliance_strategy_3

string

If boiler is not in compliance with sulfur dioxide regulations, strategy for compliance.

so2_control_planned_caaa_compliance_strategy_1

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_planned_caaa_compliance_strategy_2

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_planned_caaa_compliance_strategy_3

string

Planned strategies to meet the sulfur dioxide requirements of Title IV of the Clean Air Act Amendment of 1990.

so2_control_proposed_strategy_1

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_proposed_strategy_2

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

so2_control_proposed_strategy_3

string

Proposed strategy to comply with the most stringent sulfur dioxide regulation.

standard_nox_rate

number

Numeric value for the unit of measurement specified for nitrogen oxide.

standard_particulate_rate

number

Numeric value for the unit of measurement specified for particulate matter.

standard_so2_percent_scrubbed

number

The percent of sulfur dioxide to be scrubbed specified by the most stringent sulfur dioxide regulation.

standard_so2_rate

number

Numeric value for the unit of measurement specified for sulfur dioxide.

turndown_ratio

number

The turndown ratio for the boiler.

unit_nox

string

Unit of measurement specified for nitrogen oxide.

unit_particulate

string

Unit of measurement specified for particulate matter.

unit_so2

string

Unit of measurement specified for sulfur dioxide.

waste_heat_input_mmbtu_per_hour

number

Design waste-heat input rate at maximum continuous steam flow (MMBTU per hour). A waste-heat boiler is a boiler that receives all or a substantial portion of its energy input from the noncombustible exhaust gases of a separate fuel-burning process.

wet_dry_bottom

string

Wet or Dry Bottom where Wet Bottom is defined as slag tanks that are installed at furnace throat to contain and remove molten ash from the furnace, and Dry Bottom is defined as having no slag tanks at furnace throat area, throat area is clear, and bottom ash drops through throat to bottom ash water hoppers.

core_eia860__scd_emissions_control_equipment#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of emissions control equipment reported to EIA, including cost, type, operating status, retirement date, and install year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

report_date, plant_id_eia, emission_control_id_pudl

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

Additional Details#

Includes control ids for sulfur dioxide (SO2), particulate matter, mercury, nitrogen oxide (NOX), and acid (HCl) gas monitoring.

This table inherits canonicalized values for plants and utilities. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_emissions_control_equipment.parquet"
)

Columns#

Field Name

Type

Description

acid_gas_control

boolean

Indicates whether the emissions control equipment controls acid (HCl) gas.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

emission_control_equipment_cost

number

The total cost to install a piece of emission control equipment.

emission_control_equipment_type_code

string

Short code indicating the type of emission control equipment installed.

emission_control_id_pudl

number

A PUDL-generated ID used to distinguish emission control units in the same report year and plant id. This ID should not be used to track units over time or between plants.

emission_control_operating_date

date

The date a piece of emissions control equipment began operating. Derived from month and year columns in the raw data.

emission_control_retirement_date

date

The expected or actual retirement date for a piece of emissions control equipment. Derived from month and year columns in the raw data.

mercury_control_id_eia

string

Mercury control identification number. This ID is not a unique identifier.

nox_control_id_eia

string

Nitrogen oxide control identification number. This ID is not a unique identifier.

operational_status_code

string

The operating status of the asset.

particulate_control_id_eia

string

Particulate matter control identification number. This ID is not a unique identifier.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

so2_control_id_eia

string

Sulfur dioxide control identification number. This ID is not a unique identifier.

core_eia860__scd_generators#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of generators which may vary from year to year. Compiled from across EIA-860 and EIA-923 data.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

plant_id_eia, generator_id, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for generators are set. It contains values which are expected to vary slowly, while core_eia__entity_generators contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
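Because the primary key includes report_date, each generator can appear once per reporting period. A common first step is to keep only the most recent record per generator, which might look like the sketch below. The rows and values are invented for illustration, and only one attribute (capacity_mw) is carried along.

```python
from datetime import date

# Toy rows keyed like the table: (plant_id_eia, generator_id, report_date, capacity_mw).
rows = [
    (1, "1", date(2022, 1, 1), 50.0),
    (1, "1", date(2023, 1, 1), 55.0),
    (1, "GT1", date(2023, 1, 1), 20.0),
]

latest = {}
for plant_id, generator_id, report_date, capacity_mw in rows:
    key = (plant_id, generator_id)  # generator_id stays a string
    # Keep the row with the newest report_date for each generator.
    if key not in latest or report_date > latest[key][0]:
        latest[key] = (report_date, capacity_mw)
```

The same pattern applies to the other slowly changing dimension tables, using their own primary key columns in place of plant_id_eia and generator_id.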

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators.parquet"
)

Columns#

Field Name

Type

Description

can_burn_multiple_fuels

boolean

Whether the generator can burn multiple fuels.

can_cofire_fuels

boolean

Whether the generator can co-fire fuels.

can_switch_oil_gas

boolean

Whether the generator can switch between oil and natural gas.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

carbon_capture

boolean

Indicates whether the generator uses carbon capture technology.

current_planned_generator_operating_date

date

The most recently updated effective date on which the generator is scheduled to start operation.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

deliver_power_transgrid

boolean

Indicates whether the generator can deliver power to the transmission grid.

distributed_generation

boolean

Whether the generator is considered distributed generation.

energy_source_1_transport_1

string

Primary mode of transport for energy source 1.

energy_source_1_transport_2

string

Secondary mode of transport for energy source 1.

energy_source_1_transport_3

string

Tertiary mode of transport for energy source 1.

energy_source_2_transport_1

string

Primary mode of transport for energy source 2.

energy_source_2_transport_2

string

Secondary mode of transport for energy source 2.

energy_source_2_transport_3

string

Tertiary mode of transport for energy source 2.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_source_code_2

string

The code representing the second most predominant type of energy that fuels the generator.

energy_source_code_3

string

The code representing the third most predominant type of energy that fuels the generator.

energy_source_code_4

string

The code representing the fourth most predominant type of energy that fuels the generator.

energy_source_code_5

string

The code representing the fifth most predominant type of energy that fuels the generator.

energy_source_code_6

string

The code representing the sixth most predominant type of energy that fuels the generator.

energy_storage_capacity_mwh

number

Energy storage capacity in MWh (e.g. for batteries).

ferc_qualifying_facility

boolean

Indicates whether or not a generator is a qualifying FERC cogeneration facility.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

minimum_load_mw

number

The minimum load at which the generator can operate continuously.

nameplate_power_factor

number

The nameplate power factor of the generator.

net_capacity_mwdc

number

Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.

operational_status

string

The operating status of the asset. For generators this is based on which tab the generator was listed in within EIA 860.

operational_status_code

string

The operating status of the asset.

other_modifications_date

date

Planned effective date that the generator is scheduled to enter commercial operation after any other planned modification is complete.

other_planned_modifications

boolean

Indicates whether there are other modifications planned for the generator.

owned_by_non_utility

boolean

Whether any part of the generator is owned by a non-utility.

ownership_code

string

Identifies the ownership for each generator.

planned_derate_date

date

Planned effective month that the generator is scheduled to enter operation after the derate modification.

planned_energy_source_code_1

string

New energy source code for the planned repowered generator.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

planned_modifications

boolean

Indicates whether there are any planned capacity uprates/derates, repowering, other modifications, or generator retirements scheduled for the next 5 years.

planned_net_summer_capacity_derate_mw

number

Decrease in summer capacity expected to be realized from the derate modification to the equipment.

planned_net_summer_capacity_uprate_mw

number

Increase in summer capacity expected to be realized from the uprate modification to the equipment.

planned_net_winter_capacity_derate_mw

number

Decrease in winter capacity expected to be realized from the derate modification to the equipment.

planned_net_winter_capacity_uprate_mw

number

Increase in winter capacity expected to be realized from the uprate modification to the equipment.

planned_new_capacity_mw

number

The expected new nameplate capacity for the generator.

planned_new_prime_mover_code

string

New prime mover for the planned repowered generator.

planned_repower_date

date

Planned effective date that the generator is scheduled to enter operation after the repowering is complete.

planned_uprate_date

date

Planned effective date that the generator is scheduled to enter operation after the uprate modification.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

reactive_power_output_mvar

number

Reactive Power Output (MVAr)

report_date

date

Date reported.

startup_source_code_1

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_2

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_3

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

startup_source_code_4

string

The code representing the first, second, third or fourth start-up and flame stabilization energy source used by the combustion unit(s) associated with this generator.

summer_capacity_estimate

boolean

Whether the summer capacity value was an estimate

summer_capacity_mw

number

The net summer capacity.

summer_estimated_capability_mw

number

EIA estimated summer capacity (in MW).

synchronized_transmission_grid

boolean

Indicates whether standby generators (SB status) can be synchronized to the grid.

technology_description

string

High level description of the technology used by the generator to produce electricity.

time_cold_shutdown_full_load_code

string

The minimum amount of time required to bring the unit to full load from shutdown.

turbines_inverters_hydrokinetics

integer

Number of wind turbines, or hydrokinetic buoys.

turbines_num

integer

Number of wind turbines, or hydrokinetic buoys.

uprate_derate_completed_date

date

The date when the uprate or derate was completed.

uprate_derate_during_year

boolean

Was an uprate or derate completed on this generator during the reporting year?

utility_id_eia

integer

The EIA Utility Identification number.

winter_capacity_estimate

boolean

Whether the winter capacity value was an estimate

winter_capacity_mw

number

The net winter capacity.

winter_estimated_capability_mw

number

EIA estimated winter capacity (in MW).
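
The generator_id description above asks that the column be treated as a string. The following is a minimal sketch of what can go wrong otherwise, using a hypothetical CSV extract. The two IDs are invented, and the idea that real IDs may differ only by leading zeros is an assumption made for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV extract; the IDs are invented to show the failure mode.
csv_text = "plant_id_eia,generator_id\n1,0001\n1,1\n"

# With type inference, both IDs are parsed as the integer 1 and collide.
inferred = pd.read_csv(io.StringIO(csv_text))

# Reading the column as a string keeps the two generators distinct.
as_string = pd.read_csv(io.StringIO(csv_text), dtype={"generator_id": "string"})

n_inferred = inferred["generator_id"].nunique()
n_string = as_string["generator_id"].nunique()
```

The Parquet files already store generator_id as a string, so this mainly matters when the data passes through CSV or spreadsheet exports.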

core_eia860__scd_generators_energy_storage#

Slowly changing dimension (SCD) table describing attributes of energy storage which may vary from year to year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 3)

Primary key:

plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to energy storage. The rest of the columns that are reported in the EIA-860 Energy Storage tabs are included in core_eia860__scd_generators and core_eia__entity_generators.
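
Because this table carries only the storage-specific columns, most analyses will need to join it back to core_eia860__scd_generators. The sketch below shows one way to do that in pandas, assuming both tables are keyed on plant_id_eia, generator_id, and report_date, and that energy_storage_capacity_mwh is read from the generators table. The small frames stand in for the real Parquet files and all values are invented.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are invented for illustration.
storage = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "generator_id": ["BA1", "BA2"],
        "report_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
        "max_discharge_rate_mw": [10.0, 5.0],
    }
)
generators = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["BA1", "BA2", "GT1"],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "energy_storage_capacity_mwh": [40.0, 10.0, None],
    }
)

# Assumed shared key; validate= raises if the key is not unique on either side.
key = ["plant_id_eia", "generator_id", "report_date"]
merged = storage.merge(generators, on=key, how="left", validate="one_to_one")

# Storage duration in hours: energy capacity (MWh) divided by discharge rate (MW).
merged["duration_hours"] = (
    merged["energy_storage_capacity_mwh"] / merged["max_discharge_rate_mw"]
)
```

The left join keeps one row per storage record, and the derived duration_hours column is an illustration rather than a field PUDL publishes.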

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_energy_storage.parquet"
)

Columns#

Field Name

Type

Description

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_id_direct_support_1

string

The EIA Generator ID of the primary unit whose generation this energy storage device is intended to firm or store.

generator_id_direct_support_2

string

The EIA Generator ID of the secondary unit whose generation this energy storage device is intended to firm or store.

generator_id_direct_support_3

string

The EIA Generator ID of the tertiary unit whose generation this energy storage device is intended to firm or store.

is_ac_coupled

boolean

Indicates if this energy storage device is AC-coupled (means the energy storage device and the PV system are not installed on the same side of an inverter).

is_dc_coupled

boolean

Indicates if this energy storage device is DC-coupled (means the energy storage device and the PV system are on the same side of an inverter and the battery can still charge from the grid).

is_dc_coupled_tightly

boolean

Indicates if this energy storage device is DC tightly coupled (means the energy storage device and the PV system are on the same side of an inverter and the battery cannot charge from the grid).

is_direct_support

boolean

Indicates if this energy storage device is intended for dedicated generator firming or storing excess generation of other units.

is_independent

boolean

Indicates if this energy storage device is independent (not coupled with another generator).

is_transmission_and_distribution_asset_support

boolean

Indicates if the energy storage system is intended to support a specific substation, transmission or distribution asset.

max_charge_rate_mw

number

Maximum charge rate in MW.

max_discharge_rate_mw

number

Maximum discharge rate in MW.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_eia_direct_support_1

integer

The EIA Plant ID of the primary unit whose generation this energy storage device is intended to firm or store.

plant_id_eia_direct_support_2

integer

The EIA Plant ID of the secondary unit whose generation this energy storage device is intended to firm or store.

plant_id_eia_direct_support_3

integer

The EIA Plant ID of the tertiary unit whose generation this energy storage device is intended to firm or store.

report_date

date

Date reported.

served_arbitrage

boolean

Whether the energy storage device served arbitrage applications during the reporting year

served_backup_power

boolean

Whether the energy storage device served backup power applications during the reporting year.

served_co_located_renewable_firming

boolean

Whether the energy storage device served renewable firming applications during the reporting year.

served_frequency_regulation

boolean

Whether the energy storage device served frequency regulation applications during the reporting year.

served_load_following

boolean

Whether the energy storage device served load following applications during the reporting year.

served_load_management

boolean

Whether the energy storage device served load management applications during the reporting year.

served_ramping_spinning_reserve

boolean

Whether the energy storage device served ramping / spinning reserve applications during the reporting year.

served_system_peak_shaving

boolean

Whether the energy storage device served system peak shaving applications during the reporting year.

served_transmission_and_distribution_deferral

boolean

Whether the energy storage device served transmission and distribution deferral applications during the reporting year.

served_voltage_or_reactive_power_support

boolean

Whether the energy storage device served voltage or reactive power support applications during the reporting year.

storage_enclosure_code

string

A code representing the enclosure type that best describes where the generator is located.

storage_technology_code_1

string

The electro-chemical storage technology used for this battery application.

storage_technology_code_2

string

The electro-chemical storage technology used for this battery application.

storage_technology_code_3

string

The electro-chemical storage technology used for this battery application.

storage_technology_code_4

string

The electro-chemical storage technology used for this battery application.

stored_excess_wind_and_solar_generation

boolean

Whether the energy storage device was used to store excess wind/solar generation during the reporting year.

core_eia860__scd_generators_multifuel#

Slowly changing dimension (SCD) table describing attributes of generators that may vary from year to year, pertaining to fuel-switching and the use of multiple fuels.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 3)

Primary key:

report_date, utility_id_eia, generator_id, plant_id_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_multifuel.parquet"
)

Columns#

Field Name

Type

Description

can_burn_multiple_fuels

boolean

Whether the generator can burn multiple fuels.

can_cofire_100_oil

boolean

Whether the generator can co-fire 100 percent oil.

can_cofire_fuels

boolean

Whether the generator can co-fire fuels.

can_cofire_oil_and_gas

boolean

Whether the generator can co-fire oil and gas.

can_fuel_switch

boolean

Whether a unit is able to switch fuels.

can_switch_oil_gas

boolean

Whether the generator can switch between oil and natural gas.

can_switch_when_operating

boolean

Indicates whether a fuel switching generator can switch fuels while operating.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

cofire_energy_source_1

string

The codes representing the type of fuel that will be co-fired.

cofire_energy_source_2

string

The codes representing the type of fuel that will be co-fired.

cofire_energy_source_3

string

The codes representing the type of fuel that will be co-fired.

cofire_energy_source_4

string

The codes representing the type of fuel that will be co-fired.

cofire_energy_source_5

string

The codes representing the type of fuel that will be co-fired.

cofire_energy_source_6

string

The codes representing the type of fuel that will be co-fired.

county

string

County name.

current_planned_generator_operating_date

date

The most recently updated effective date on which the generator is scheduled to start operation

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_source_code_2

string

The code representing the second most predominant type of energy that fuels the generator

fuel_switch_energy_source_1

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

fuel_switch_energy_source_2

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

fuel_switch_energy_source_3

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

fuel_switch_energy_source_4

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

fuel_switch_energy_source_5

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

fuel_switch_energy_source_6

string

The codes representing the type of fuel that will be able to be used as a sole source of fuel for this unit.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

has_air_permit_limits

boolean

Whether air permit limits are a factor that limits the generator’s ability to switch between oil and natural gas.

has_factors_that_limit_switching

boolean

Whether there are factors that limit the generator’s ability to switch between oil and natural gas.

has_other_factors_that_limit_switching

boolean

Whether there are factors other than air permit limits and storage that limit the generator’s ability to switch between oil and natural gas.

has_regulatory_limits

boolean

Whether there are factors that limit the operation of the generator when running on 100 percent oil

has_storage_limits

boolean

Whether limited on-site fuel storage is a factor that limits the generator’s ability to switch between oil and natural gas.

max_oil_heat_input

number

The maximum oil heat input (percent of MMBTUs) expected for proposed unit when co-firing with natural gas

max_oil_output_mw

number

The maximum output (net MW) expected for proposed unit, when making the maximum use of oil and co-firing natural gas.

net_summer_capacity_natural_gas_mw

number

The maximum net summer output achievable when running on natural gas.

net_summer_capacity_oil_mw

number

The maximum net summer output achievable when running on oil.

net_winter_capacity_natural_gas_mw

number

The maximum net winter output achievable when running on natural gas.

net_winter_capacity_oil_mw

number

The maximum net winter output achievable when running on oil.

operational_status_code

string

The operating status of the asset.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sector_id_eia

integer

EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

sector_name_eia

string

EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

state

string

Two letter US state abbreviation.

summer_capacity_mw

number

The net summer capacity.

technology_description

string

High level description of the technology used by the generator to produce electricity.

time_to_switch_gas_to_oil

string

The time required to switch the generator from running 100 percent natural gas to running 100 percent oil.

time_to_switch_oil_to_gas

string

The time required to switch the generator from running 100 percent oil to running 100 percent natural gas.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

winter_capacity_mw

number

The net winter capacity.

core_eia860__scd_generators_solar#

Slowly changing dimension (SCD) table describing attributes of solar generators that may vary from year to year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 3.3)

Primary key:

plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to solar generators. The rest of the columns that are reported in the EIA-860 Solar tabs are included in core_eia860__scd_generators and core_eia__entity_generators.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_solar.parquet"
)

Columns#

Field Name

Type

Description

azimuth_angle_deg

number

Indicates the azimuth angle of the unit for fixed tilt or single-axis technologies.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_metering_capacity_mwdc

number

The DC megawatt capacity that is part of a net metering agreement.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

standard_testing_conditions_capacity_mwdc

number

The net capacity of this photovoltaic generator in direct current under standard test conditions (STC) of 1000 W/m^2 solar irradiance and 25 degrees Celsius PV module temperature. This was only reported in 2013 and 2014.

tilt_angle_deg

number

Indicates the tilt angle of the unit for fixed tilt or single-axis technologies.

uses_bifacial_panels

boolean

Indicates whether bifacial solar panels are used at this solar generating unit.

uses_material_crystalline_silicon

boolean

Indicates whether any solar photovoltaic panels at this generator are made of crystalline silicon.

uses_material_other

boolean

Indicates whether any solar photovoltaic panels at this generator are made of other materials.

uses_material_thin_film_a_si

boolean

Indicates whether any solar photovoltaic panels at this generator are made of thin-film amorphous silicon (A-Si).

uses_material_thin_film_cdte

boolean

Indicates whether any solar photovoltaic panels at this generator are made of thin-film cadmium telluride (CdTe).

uses_material_thin_film_cigs

boolean

Indicates whether any solar photovoltaic panels at this generator are made of thin-film copper indium gallium diselenide (CIGS).

uses_material_thin_film_other

boolean

Indicates whether any solar photovoltaic panels at this generator are made of other thin-film material.

uses_net_metering_agreement

boolean

Indicates if the output from this generator is part of a net metering agreement.

uses_technology_dish_engine

boolean

Indicates whether dish engines are used at this solar generating unit.

uses_technology_dual_axis_tracking

boolean

Indicates whether dual-axis tracking technologies are used at this solar generating unit.

uses_technology_east_west_fixed_tilt

boolean

Indicates whether east west fixed tilt technologies are used at this solar generating unit.

uses_technology_fixed_tilt

boolean

Indicates whether fixed tilt technologies are used at this solar generating unit.

uses_technology_lenses_mirrors

boolean

Indicates whether lenses or mirrors are used at this solar generating unit.

uses_technology_linear_fresnel

boolean

Indicates whether linear fresnel technologies are used at this solar generating unit.

uses_technology_other

boolean

Indicates whether other solar technologies are used at this solar generating unit.

uses_technology_parabolic_trough

boolean

Indicates whether parabolic trough technologies are used at this solar generating unit.

uses_technology_power_tower

boolean

Indicates whether power towers are used at this solar generating unit.

uses_technology_single_axis_tracking

boolean

Indicates whether single-axis tracking technologies are used at this solar generating unit.

uses_virtual_net_metering_agreement

boolean

Indicates if the output from this generator is part of a virtual net metering agreement.

virtual_net_metering_capacity_mwdc

number

The DC capacity in MW that is part of a virtual net metering agreement.

core_eia860__scd_generators_wind#

Slowly changing dimension (SCD) table describing attributes of wind generators that may vary from year to year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 3.2)

Primary key:

plant_id_eia, generator_id, report_date

Additional Details#

This table includes only those values that are unique to wind generators. The rest of the columns that are reported in the EIA-860 Wind tabs are included in core_eia860__scd_generators and core_eia__entity_generators.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_generators_wind.parquet"
)

Columns#

Field Name

Type

Description

design_wind_speed_mph

number

Average annual wind speed that turbines at this wind site were designed for.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

obstacle_id_faa

string

The Federal Aviation Administration (FAA) obstacle number assigned to this generator. If more than one obstacle number exists, the one that best represents the turbines. References the obstacle numbers reported in the FAA’s Digital Obstacle File: https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dof/ This field was only reported from 2013 through 2015.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

predominant_turbine_manufacturer

string

Name of predominant manufacturer of turbines at this generator.

predominant_turbine_model

string

Predominant model number of turbines at this generator.

report_date

date

Date reported.

turbine_hub_height_feet

number

The hub height of turbines at this generator. If more than one value exists, the one that best represents the turbines.

wind_quality_class

integer

The wind quality class for turbines at this generator. See table core_eia__codes_wind_quality_class for specifications about each class.
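
The wind columns above are reported in US customary units (mph and feet). For work in SI units, a conversion along these lines may be convenient. The input values and the derived column names are invented for illustration, and the conversion factors are the standard exact definitions.

```python
import pandas as pd

MPH_TO_M_PER_S = 0.44704  # exact: 1609.344 m per mile / 3600 s per hour
FEET_TO_M = 0.3048  # exact by definition of the international foot

# Toy rows with invented values, using the column names from this table.
wind = pd.DataFrame(
    {
        "design_wind_speed_mph": [15.0, 20.0],
        "turbine_hub_height_feet": [262.0, 328.0],
    }
)

# Hypothetical derived columns; PUDL does not publish these names.
wind["design_wind_speed_m_per_s"] = wind["design_wind_speed_mph"] * MPH_TO_M_PER_S
wind["turbine_hub_height_m"] = wind["turbine_hub_height_feet"] * FEET_TO_M
```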

core_eia860__scd_ownership#

Slowly changing dimension (SCD) table describing attributes of generator ownership. Includes only jointly or third-party owned generators.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 4)

Primary key:

report_date, plant_id_eia, generator_id, owner_utility_id_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_ownership.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fraction_owned

number

Proportion of generator ownership attributable to this utility.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

operator_utility_id_eia

integer

The EIA utility Identification number for the operator utility.

owner_city

string

City of owner.

owner_country

string

Three letter ISO-3166 country code.

owner_state

string

Two letter ISO-3166 political subdivision code.

owner_street_address

string

Street address of owner.

owner_utility_id_eia

integer

The EIA utility Identification number for the owner company that is responsible for the day-to-day operations of the generator, not the operator utility.

owner_utility_name_eia

string

The name of the EIA owner utility.

owner_zip_code

string

Zip code of owner.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.
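
Since each row gives one owner's share of one generator in one year, a common sanity check is to sum fraction_owned per generator and flag cases that do not add up to one. The source does not state that reported fractions always sum to one, so treat the sketch below as an exploratory check. The rows and the 0.01 tolerance are invented.

```python
import pandas as pd

# Toy ownership records; all IDs and fractions are invented for illustration.
ownership = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "plant_id_eia": [10, 10, 20],
        "generator_id": ["1", "1", "A"],
        "owner_utility_id_eia": [100, 200, 100],
        "fraction_owned": [0.6, 0.4, 0.5],
    }
)

# Group by the generator-year part of the primary key (everything but the owner).
gen_key = ["report_date", "plant_id_eia", "generator_id"]
totals = ownership.groupby(gen_key)["fraction_owned"].sum()

# Flag generator-years whose shares differ from 1.0 by more than an assumed tolerance.
incomplete = totals[(totals - 1.0).abs() > 0.01]
```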

core_eia860__scd_plants#

Slowly changing dimension (SCD) table describing attributes of plants which may vary from year to year. Compiled from across all EIA-860 and EIA-923 data.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

plant_id_eia, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for plants are set. It contains values which are expected to vary slowly, while core_eia__entity_plants contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
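
The 70% rule described above can be illustrated with a small sketch. This is not PUDL's implementation: the function name canonical_value is hypothetical, and ignoring nulls when computing the share is an assumption made here.

```python
from collections import Counter
from typing import Optional


def canonical_value(
    reported: list[Optional[str]], threshold: float = 0.7
) -> Optional[str]:
    """Return the most common non-null value if its share meets the threshold."""
    # Assumption: nulls are dropped before computing each value's share.
    observed = [v for v in reported if v is not None]
    if not observed:
        return None
    value, count = Counter(observed).most_common(1)[0]
    return value if count / len(observed) >= threshold else None


# 3 of 4 reports agree (75%), so the value is kept.
consistent = canonical_value(["WECC", "WECC", "WECC", "SERC"])
# The top value covers only 2 of 5 reports (40%), so the result is null.
inconsistent = canonical_value(["WECC", "SERC", "WECC", "SERC", "MRO"])
```

In the second case the attribute would show up as null in this table, which matches the behavior described for inconsistently reported values.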

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_plants.parquet"
)

Columns#

Field Name

Type

Description

ash_impoundment

boolean

Is there an ash impoundment (e.g. pond, reservoir) at the plant?

ash_impoundment_lined

boolean

If there is an ash impoundment at the plant, is the impoundment lined?

ash_impoundment_status

string

If there is an ash impoundment at the plant, the ash impoundment status as of December 31 of the reporting year.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_name_eia

string

Name of the balancing authority.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

datum

string

Geodetic coordinate system identifier (e.g. NAD27, NAD83, or WGS84).

energy_storage

boolean

Indicates if the facility has energy storage capabilities.

ferc_cogen_docket_no

string

The docket number relating to the FERC cogenerator status. See FERC Form 556.

ferc_cogen_status

boolean

Indicates whether the plant has FERC qualifying facility cogenerator status. See FERC Form 556.

ferc_exempt_wholesale_generator

boolean

Indicates whether the plant has FERC qualifying facility exempt wholesale generator status

ferc_exempt_wholesale_generator_docket_no

string

The docket number relating to the FERC qualifying facility exempt wholesale generator status.

ferc_qualifying_facility_docket_no

string

The docket number relating to the FERC qualifying facility cogenerator status. See FERC Form 556.

ferc_small_power_producer

boolean

Indicates whether the plant has FERC qualifying facility small power producer status. See FERC Form 556.

ferc_small_power_producer_docket_no

string

The docket number relating to the FERC qualifying facility small power producer status. See FERC Form 556.

grid_voltage_1_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

grid_voltage_2_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

grid_voltage_3_kv

number

Plant’s grid voltage at point of interconnection to transmission or distribution facilities

has_net_metering

boolean

Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source.) This field was only reported up until 2015.

iso_rto_code

string

The code of the plant’s ISO or RTO. NA if not reported in that year.

liquefied_natural_gas_storage

boolean

Indicates whether the facility has the capability to store natural gas in the form of liquefied natural gas.

natural_gas_local_distribution_company

string

Name of the Local Distribution Company (LDC) connected to the natural gas burning power plant.

natural_gas_pipeline_name_1

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_pipeline_name_2

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_pipeline_name_3

string

The name of the owner or operator of natural gas pipeline that connects directly to this facility or that connects to a lateral pipeline owned by this facility.

natural_gas_storage

boolean

Indicates whether the facility has on-site storage of natural gas.

nerc_region

string

NERC region in which the plant is located

pipeline_notes

string

Additional owner or operator of natural gas pipeline.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

primary_purpose_id_naics

integer

North American Industry Classification System (NAICS) code that best describes the primary purpose of the reporting plant

regulatory_status_code

string

Indicates whether the plant is regulated or non-regulated.

report_date

date

Date reported.

reporting_frequency_code

string

Code that specifies what time period data has to be reported (i.e. monthly data or annual totals) and how often the power plant reports this data to EIA. See reporting_frequencies_eia for more details.

sector_id_eia

integer

EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

sector_name_eia

string

EIA assigned sector name, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

service_area

string

Service area in which plant is located; for unregulated companies, it’s the electric utility with which plant is interconnected

transmission_distribution_owner_id

integer

EIA-assigned code for owner of transmission/distribution system to which the plant is interconnected.

transmission_distribution_owner_name

string

Name of the owner of the transmission or distribution system to which the plant is interconnected.

transmission_distribution_owner_state

string

State location for owner of transmission/distribution system to which the plant is interconnected.

utility_id_eia

integer

The EIA Utility Identification number.

water_source

string

Name of water source associated with the plant.

core_eia860__scd_utilities#

Noticed any irregularities with this table? File a report.

Slowly changing dimension (SCD) table describing attributes of utilities which may vary from year to year. Compiled from all EIA data.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860 – Annual Electric Generator Report

Primary key:

utility_id_eia, report_date

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for utilities are set. It contains values which are expected to vary slowly, while core_eia__entity_utilities contains those expected to remain fixed. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as it accounts for at least 70% of the reported instances. If an attribute was reported inconsistently across the original EIA tables, then it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
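
The 70% consistency rule described above can be illustrated with a minimal sketch. This is not PUDL's actual harvesting code: the function name choose_canonical, the handling of nulls (ignored before computing the share), and the example values are all assumptions made for illustration.

```python
from collections import Counter


def choose_canonical(values, threshold=0.7):
    """Pick the most common non-null value if it is consistent enough.

    Hypothetical helper: returns the most frequently reported value when it
    makes up at least `threshold` of the non-null reports, otherwise None.
    """
    # Assumption: null reports are dropped before measuring consistency.
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= threshold else None


# Three of four reports agree (75%), so a canonical value is chosen.
consistent = choose_canonical(["Denver", "Denver", "Denver", "Dnver"])
# Reports are split evenly (50%), so the attribute becomes null.
inconsistent = choose_canonical(["Denver", "Boulder", "Denver", "Boulder"])
print(consistent, inconsistent)
```

In practice this means a null in this table can indicate either that a value was never reported or that the reports disagreed too much to pick one.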

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860__scd_utilities.parquet"
)

Columns#

Field Name

Type

Description

address_2

string

Second line of the address.

attention_line

string

Mail attention name of the operator/owner.

city

string

Name of the city.

contact_firstname

string

First name of utility contact 1.

contact_firstname_2

string

First name of utility contact 2.

contact_lastname

string

Last name of utility contact 1.

contact_lastname_2

string

Last name of utility contact 2.

contact_title

string

Title of utility contact 1.

contact_title_2

string

Title of utility contact 2.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

entity_type

string

Entity type of principal owner.

phone_extension

string

Phone extension for utility contact 1

phone_extension_2

string

Phone extension for utility contact 2

phone_number

string

Phone number for utility contact 1.

phone_number_2

string

Phone number for utility contact 2.

plants_reported_asset_manager

boolean

Is the reporting entity an asset manager of power plants reported on Schedule 2 of the form?

plants_reported_operator

boolean

Is the reporting entity an operator of power plants reported on Schedule 2 of the form?

plants_reported_other_relationship

boolean

Does the reporting entity have any other relationship to the power plants reported on Schedule 2 of the form?

plants_reported_owner

boolean

Is the reporting entity an owner of power plants reported on Schedule 2 of the form?

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

utility_id_eia

integer

The EIA Utility Identification number.

zip_code

string

Five digit US Zip Code.

zip_code_4

string

Four digit US Zip Code suffix.

core_eia860m__changelog_generators#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in generator characteristics reported in EIA-860M.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 860M – Monthly Update to the Annual Electric Generator Report

Primary key:

plant_id_eia, generator_id, report_date

Additional Details#

EIA-860M includes generator tables with the most up-to-date catalog of EIA generators and their operational status and other generator characteristics. EIA-860M is reported monthly, although for the vast majority of the generators nothing changes month-to-month. This table is a changelog of that monthly reported generator data. Each record has a report_date column and a valid_until_date column. The first record for a generator captures its characteristics as first reported. Whenever any of the reported EIA-860M data changes for a generator, there will be a new changelog record with a new report_date.
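
To recover the state of the generators on a particular date, one can select the changelog records whose validity interval contains that date. The sketch below uses a small made-up DataFrame rather than the real table. The helper name records_as_of is hypothetical, and treating a null valid_until_date as "still valid" is an assumption about how open-ended records might appear, not something stated in this documentation.

```python
import pandas as pd

# Made-up changelog rows using the column names documented for this table.
changelog = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["A", "A", "B1"],
        "report_date": pd.to_datetime(["2023-01-01", "2023-06-01", "2023-03-01"]),
        "valid_until_date": pd.to_datetime(["2023-06-01", "2024-01-01", "2024-01-01"]),
        "capacity_mw": [50.0, 55.0, 10.0],
    }
)


def records_as_of(df, date):
    """Return rows valid on `date`: report_date <= date < valid_until_date."""
    ts = pd.Timestamp(date)
    started = df["report_date"] <= ts
    # Assumption: a missing valid_until_date means the record is still current.
    not_ended = df["valid_until_date"].isna() | (ts < df["valid_until_date"])
    return df.loc[started & not_ended]


snapshot = records_as_of(changelog, "2023-04-01")
boundary = records_as_of(changelog, "2023-06-01")
print(snapshot[["plant_id_eia", "generator_id", "capacity_mw"]])
```

Because valid_until_date is exclusive, a query on the exact date of a change returns the newer record, not the one it replaced.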

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia860m__changelog_generators.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

county

string

County name.

current_planned_generator_operating_date

date

The most recently updated effective date on which the generator is scheduled to start operation

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code_1

string

The code representing the most predominant type of energy that fuels the generator.

energy_storage_capacity_mwh

number

Energy storage capacity in MWh (e.g. for batteries).

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_date

date

Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.

generator_retirement_date

date

Date of the scheduled or effected retirement of the generator.

latitude

number

Latitude of the plant’s location, in degrees.

longitude

number

Longitude of the plant’s location, in degrees.

net_capacity_mwdc

number

Generation capacity in megawatts of direct current that is subject to a net metering agreement. Typically used for behind-the-meter solar PV.

operational_status

string

The operating status of the asset. For generators this is based on which tab the generator was listed in in the EIA-860 spreadsheet.

operational_status_code

string

The operating status of the asset.

planned_derate_date

date

Planned effective month that the generator is scheduled to enter operation after the derate modification.

planned_generator_retirement_date

date

Planned effective date of the scheduled retirement of the generator.

planned_net_summer_capacity_derate_mw

number

Decrease in summer capacity expected to be realized from the derate modification to the equipment.

planned_net_summer_capacity_uprate_mw

number

Increase in summer capacity expected to be realized from the modification to the equipment.

planned_repower_date

date

Planned effective date that the generator is scheduled to enter operation after the repowering is complete.

planned_uprate_date

date

Planned effective date that the generator is scheduled to enter operation after the uprate modification.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_name_eia

string

Plant name.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sector_id_eia

integer

EIA assigned sector ID, corresponding to high level NAICS sector, designated by the primary purpose, regulatory status and plant-level combined heat and power status

state

string

Two letter US state abbreviation.

summer_capacity_mw

number

The net summer capacity.

technology_description

string

High level description of the technology used by the generator to produce electricity.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

valid_until_date

date

The record in the changelog is valid until this date. The record is valid from the report_date up until but not including the valid_until_date.

winter_capacity_mw

number

The net winter capacity.

core_eia861__assn_balancing_authority#

Noticed any irregularities with this table? File a report.

Association table providing connections between state, balancing authority, and utility in a given year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

report_date, balancing_authority_id_eia, utility_id_eia, state

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_balancing_authority.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__assn_utility#

Noticed any irregularities with this table? File a report.

Association table providing connections between utility and state in a given year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

report_date, utility_id_eia, state

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__assn_utility.parquet"
)

Columns#

Field Name

Type

Description

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_advanced_metering_infrastructure#

Noticed any irregularities with this table? File a report.

Annual time series of advanced metering infrastructure (AMI) and automated meter reading (AMR) by state, sector, and balancing authority.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia

Additional Details#

The energy served (in MWh) for AMI systems is provided. Form EIA-861 respondents also report the number of standard meters (non-AMR/AMI) in their system. Historical Changes: We started collecting the number of standard meters in 2013. The monthly survey collected these data from January 2011 to January 2017.
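
As one example of how the meter counts might be used, the sketch below computes the share of AMI meters per utility from made-up rows. It assumes that the advanced_metering_infrastructure, automated_meter_reading, and non_amr_ami counts are mutually exclusive and together cover all of a utility's meters. That is an interpretation of the column descriptions, not something this documentation guarantees.

```python
import pandas as pd

# Made-up rows using the column names documented for this table.
meters = pd.DataFrame(
    {
        "utility_id_eia": [100, 100, 200],
        "customer_class": ["residential", "commercial", "residential"],
        "advanced_metering_infrastructure": [800, 50, 0],
        "automated_meter_reading": [100, 25, 300],
        "non_amr_ami": [100, 25, 100],
    }
)

count_cols = [
    "advanced_metering_infrastructure",
    "automated_meter_reading",
    "non_amr_ami",
]

# Sum meter counts across customer classes for each utility.
totals = meters.groupby("utility_id_eia")[count_cols].sum()

# Assumption: the three counts partition the utility's meters.
all_meters = totals[count_cols].sum(axis=1)
totals["ami_share"] = totals["advanced_metering_infrastructure"] / all_meters
print(totals["ami_share"])
```

With the real table, the same calculation would normally be done within a single report_date, since standard meter counts were not collected before 2013.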

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_advanced_metering_infrastructure.parquet"
)

Columns#

Field Name

Type

Description

advanced_metering_infrastructure

integer

Number of meters that measure and record usage data at a minimum in hourly intervals, provide usage data at least daily to energy companies, and may also provide data to consumers. Data are used for billing and other purposes. Advanced meters include basic hourly interval meters and extend to real-time meters with built-in two-way communication capable of recording and transmitting instantaneous data.

automated_meter_reading

integer

Number of meters that collect data for billing purposes only and transmit this data one way, usually from the customer to the distribution utility.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

daily_digital_access_customers

integer

Number of customers able to access daily energy usage through a web portal or other electronic means.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

direct_load_control_customers

integer

Number of customers with direct load control: a demand response activity by which the program sponsor remotely shuts down or cycles a customer’s electrical equipment (e.g. air conditioner, water heater) on short notice.

energy_served_ami_mwh

number

Amount of energy served through AMI meters. AMI meters can transmit data in both directions, between the delivery entity and the customer.

entity_type

string

Entity type of principal owner.

home_area_network

integer

Number of AMI meters with home area network (HAN) gateway enabled.

non_amr_ami

integer

Number of non-AMR/AMI meters. Usually electromechanical or solid state meters measuring aggregated kWh where data are manually retrieved over monthly billing cycles for billing purposes only. Standard meters may also include functions to measure time-of-use and/or demand with data manually retrieved over monthly billing cycles.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_balancing_authority#

Noticed any irregularities with this table? File a report.

Annual time series of balancing authorities.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

report_date, balancing_authority_id_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_balancing_authority.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_id_eia

integer

EIA balancing authority ID. This is often (but not always!) the same as the utility ID associated with the same legal entity.

balancing_authority_name_eia

string

Name of the balancing authority.

report_date

date

Date reported.

core_eia861__yearly_demand_response#

Noticed any irregularities with this table? File a report.

Annual time series of demand response programs by state, sector, and balancing authority.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

balancing_authority_code_eia, customer_class, report_date, state, utility_id_eia

Additional Details#

The EIA-861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response.parquet"
)

Columns#

Field Name

Type

Description

actual_peak_demand_savings_mw

number

Demand reduction actually achieved by demand response activities. Measured at the time of the company’s annual system peak hour.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customer_incentives_cost

number

Total cost of customer incentives in a given report year. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_savings_mwh

number

The energy savings incurred in a given reporting year by participation in demand response programs.

other_costs

number

Additional costs.

potential_peak_demand_savings_mw

number

The total demand savings that could occur at the time of the system peak hour assuming all demand response is called.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_demand_response_water_heater#

Noticed any irregularities with this table? File a report.

Annual time series of grid-connected water heaters enrolled in demand response programs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

balancing_authority_code_eia, report_date, state, utility_id_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_response_water_heater.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

num_water_heaters

integer

The number of grid-enabled water heaters added to the respondent’s program this year, if the respondent has a DSM program for grid-enabled water heaters (as defined by DOE’s Office of Energy Efficiency and Renewable Energy).

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_demand_side_management_ee_dr#

Noticed any irregularities with this table? File a report.

Annual time series of the impact of energy efficiency and load management programs on total energy sold (MWh) and peak demand (MW) by customer class.

Most-recent data:

2012

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

The raw EIA-861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_ee_dr.parquet"
)

Columns#

Field Name

Type

Description

annual_indirect_program_cost

number

Costs that have not been included in any program category, but could be meaningfully identified with operating the company’s DSM programs (e.g., Administrative, Marketing, Monitoring & Evaluation, Company-Earned Incentives, Other).

annual_total_cost

number

The sum of direct program costs, indirect program costs, and incentive payments associated with utility demand side management programs.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_efficiency_annual_actual_peak_reduction_mw

number

The peak reduction incurred in a given reporting year by all participants in efficiency programs.

energy_efficiency_annual_direct_cost

number

The sum of actual direct costs (excluding incentive payments) incurred from energy efficiency programs in a given reporting year.

energy_efficiency_annual_effects_mwh

number

The change in energy use incurred in a given reporting year by all participants in energy efficiency programs.

energy_efficiency_annual_incentive_cost

number

The cost of incentive payments incurred in a given reporting year from energy efficiency programs. Incentives are the total financial value provided to a customer for program participation, whether cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.

energy_efficiency_incremental_actual_peak_reduction_mw

number

The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.

energy_efficiency_incremental_effects_mwh

number

The change in energy use incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.

load_management_annual_actual_peak_reduction_mw

number

The peak reduction incurred in a given reporting year by all participants in load management programs.

load_management_annual_direct_cost

number

The sum of actual direct costs (excluding incentive payments) incurred from load management programs in a given reporting year.

load_management_annual_effects_mwh

number

The change in energy use incurred in a given reporting year by all participants in load management programs.

load_management_annual_incentive_cost

number

The cost of incentive payments incurred in a given reporting year from load management programs. Incentives are the total financial value provided to a customer for program participation, whether cash payment, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.

load_management_annual_potential_peak_reduction_mw

number

The potential amount of peak reduction that could be incurred in a given reporting year by all participants in load management programs.

load_management_incremental_actual_peak_reduction_mw

number

The peak reduction incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.

load_management_incremental_effects_mwh

number

The change in energy use incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.

load_management_incremental_potential_peak_reduction_mw

number

The potential amount of peak reduction that could be incurred in a given reporting year by new participants in existing load management programs and all participants in new load management programs.

nerc_region

string

NERC region in which the plant is located

price_responsiveness_customers

integer

The number of customers participating in the respondent’s incentive-based demand response programs.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

time_responsiveness_customers

integer

The number of customers participating in the respondent’s time-based rate programs.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_demand_side_management_misc#

Noticed any irregularities with this table? File a report.

Annual time series of demand-side management (DSM) program information.

Most-recent data:

2012

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

Includes boolean fields indicating whether the energy savings estimates/calculations were independently verified and whether the utility runs time- and/or price-responsive programs. Also contains information on whether any of the respondent’s DSM activities are reported under another company, and if so which one.

The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.
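
Because this table has no primary key, it can be worth checking whether a set of columns identifies rows uniquely before joining on them. The following is a minimal sketch on made-up rows, not real PUDL data. The choice of utility_id_eia, state, and report_date as a candidate key is an assumption for illustration, not a documented key.

```python
import pandas as pd

# Hypothetical rows standing in for the real table; values are invented.
dsm_misc = pd.DataFrame(
    {
        "utility_id_eia": [101, 101, 202, 202],
        "state": ["CO", "CO", "NM", "NM"],
        "report_date": pd.to_datetime(
            ["2011-01-01", "2012-01-01", "2012-01-01", "2012-01-01"]
        ),
        "nerc_region": ["WECC", "WECC", "WECC", "WECC"],
    }
)

# Candidate key columns: an assumption, since the table declares no primary key.
candidate_key = ["utility_id_eia", "state", "report_date"]

# keep=False flags every row in a duplicated group, not only the later repeats.
dupes = dsm_misc[dsm_misc.duplicated(subset=candidate_key, keep=False)]
n_dupes = len(dupes)
is_unique = n_dupes == 0
```

If is_unique is False, inspect dupes before merging on the candidate key, because a join on non-unique columns will multiply rows.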

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_misc.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_savings_estimates_independently_verified

boolean

Whether savings estimates are based on a forecast or the report of one or more independent evaluators.

energy_savings_independently_verified

boolean

Whether reported energy savings were verified through an independent evaluation.

entity_type

string

Entity type of principal owner.

has_time_responsive_programs

boolean

Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff).

major_program_changes

boolean

Whether there have been any major changes to the respondent’s demand-side management programs (e.g., terminated programs, new information or financing programs, or a shift to programs with dual load building objectives and energy efficiency objectives), program tracking procedures, or reporting methods that affect the comparison of demand-side management data reported on this schedule to data from previous years.

nerc_region

string

NERC region in which the plant is located

price_responsive_programs

boolean

Whether the respondent operates any incentive-based demand response programs (e.g., market incentives, financial incentives, direct load control, interruptible programs, demand bidding/buyback, emergency demand response, capacity market programs, and ancillary service market programs).

report_date

date

Date reported.

reported_as_another_company

string

The name of the company if a respondent’s demand-side management activities are reported on Schedule 6 of another company’s form.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_demand_side_management_sales#

Noticed any irregularities with this table? File a report.

Annual time series of electricity sales related to demand-side management (DSM).

Most-recent data:

2012

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

The raw EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_demand_side_management_sales.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

nerc_region

string

NERC region in which the plant is located

report_date

date

Date reported.

sales_for_resale_mwh

number

The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers.

sales_to_ultimate_consumers_mwh

number

The amount of electricity sold to customers purchasing electricity for their own use and not for resale.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_distributed_generation_fuel#

Noticed any irregularities with this table? File a report.

Annual time series of the energy sources used for utility or customer-owned distributed generation capacity.

Most-recent data:

2015

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, there’s no great match for the data found in this table. You may be able to compute an analog for fuel percents using the capacity and tech class figures in core_eia861__yearly_non_net_metering_customer_fuel_class, but no corresponding consumption or generation information is available in the new format.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
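
The capacity-based analog mentioned above could be sketched roughly as follows. This uses made-up rows rather than real data, and it assumes the non-net-metering table exposes capacity_mw and tech_class alongside utility_id_eia, state, and report_date. Those column names and the presence of an aggregate "total" row are assumptions to verify against that table's own column list. The result is a share of capacity, not a share of fuel consumed, so it is only loosely comparable to fuel_pct.

```python
import pandas as pd

# Hypothetical rows; column names for the 2016+ table are assumed, not confirmed.
nnm = pd.DataFrame(
    {
        "utility_id_eia": [101, 101, 101, 101, 202, 202],
        "state": ["CO", "CO", "CO", "CO", "NM", "NM"],
        "report_date": pd.to_datetime(["2016-01-01"] * 6),
        "tech_class": ["pv", "wind", "hydro", "total", "pv", "internal_combustion"],
        "capacity_mw": [6.0, 3.0, 1.0, 10.0, 2.0, 2.0],
    }
)

# Drop aggregate rows so capacity is not counted twice in the denominator.
by_tech = nnm[nnm["tech_class"] != "total"].copy()

# Total capacity per utility, state, and year, aligned back to each row.
group_cols = ["utility_id_eia", "state", "report_date"]
group_total = by_tech.groupby(group_cols)["capacity_mw"].transform("sum")

# Percent of the group's capacity attributable to each technology class.
by_tech["capacity_pct"] = 100 * by_tech["capacity_mw"] / group_total
```

Groups whose capacities sum to zero yield missing or infinite shares, so those rows need separate handling.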

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_fuel.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

estimated_or_actual_fuel_data

string

Whether the reported fuel data is estimated or actual.

fuel_class

string

Fuel types specific to EIA 861 distributed generation table: [‘gas’, ‘oil’, ‘other’, ‘renewable’, ‘water’, ‘wind’, ‘wood’]

fuel_pct

number

Percent of fuel

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_distributed_generation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of the capacity and quantity of utility or customer-owned distributed generation.

Most-recent data:

2015

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_misc.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_misc.parquet"
)

Columns#

Field Name

Type

Description

backup_capacity_mw

number

The total nameplate capacity of generators that are used only for emergency backup service.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

distributed_generation_owned_capacity_mw

number

Amount of distributed generation capacity owned by the respondent.

estimated_or_actual_capacity_data

string

Whether the reported capacity data is estimated or actual.

generators_num_less_1_mw

integer

Total number of generators less than 1 MW.

generators_number

integer

Total number of generators

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

total_capacity_less_1_mw

number

The total amount of capacity from generators with less than 1 MW of nameplate capacity.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_distributed_generation_tech#

Noticed any irregularities with this table? File a report.

Annual time series of the technology used for utility or customer-owned distributed generation.

Most-recent data:

2015

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

From 2016 on, data similar to that found in this table can be found in core_eia861__yearly_non_net_metering_customer_fuel_class.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distributed_generation_tech.parquet"
)

Columns#

Field Name

Type

Description

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

estimated_or_actual_tech_data

string

Whether the reported technology data is estimated or actual.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

tech_class

string

Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_distribution_systems#

Noticed any irregularities with this table? File a report.

Annual time series of distribution circuits and circuits with voltage optimization by state.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_distribution_systems.parquet"
)

Columns#

Field Name

Type

Description

circuits_with_voltage_optimization

integer

Number of distribution circuits that employ voltage/VAR optimization (VVO).

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

distribution_circuits

integer

Total number of distribution circuits.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_dynamic_pricing#

Noticed any irregularities with this table? File a report.

Annual time series of enrollment in dynamic pricing programs by state, sector, and balancing authority.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Additional Details#

Respondents check if one or more customers are enrolled in time-of-use pricing, real time pricing, variable peak pricing, critical peak pricing, and critical peak rebates.
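
Since each program type is recorded as a boolean flag, a summary of how many records report each one can be sketched as below. The rows are invented for illustration. Treating a missing flag as "not reported" rather than False is an assumption about how to interpret nulls.

```python
import pandas as pd

flag_cols = [
    "time_of_use_pricing",
    "real_time_pricing",
    "variable_peak_pricing",
    "critical_peak_pricing",
    "critical_peak_rebate",
]

# Hypothetical rows; utility 303 left every box blank.
dp = pd.DataFrame(
    {
        "utility_id_eia": [101, 101, 202, 303],
        "customer_class": ["residential", "commercial", "residential", "residential"],
        "time_of_use_pricing": [True, True, False, None],
        "real_time_pricing": [False, True, False, None],
        "variable_peak_pricing": [False, False, False, None],
        "critical_peak_pricing": [True, False, False, None],
        "critical_peak_rebate": [False, False, False, None],
    }
).astype({col: "boolean" for col in flag_cols})

# Number of records with each flag set; missing values are skipped by sum().
flag_counts = dp[flag_cols].sum()

# Rows with at least one program checked, counting missing flags as not checked.
any_program = dp[flag_cols].fillna(False).astype(bool).any(axis=1)
n_utilities_with_any = dp.loc[any_program, "utility_id_eia"].nunique()
```

The nullable "boolean" dtype keeps blank responses distinct from an explicit False until the point where they are deliberately filled.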

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_dynamic_pricing.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

critical_peak_pricing

boolean

Whether customers are participating in critical peak pricing, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by imposing a pre-specified high rate or price for a limited number of days or hours.

critical_peak_rebate

boolean

Whether customers are participating in critical peak rebates, a program in which rate and/or price structure is designed to encourage reduced consumption during periods of high wholesale market prices or system contingencies, by providing a rebate to the customer on a limited number of days and for a limited number of hours, at the request of the energy provider.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

real_time_pricing

boolean

Whether the respondent has customers participating in a real time pricing (RTP) program. RTP is a program of rate and price structure in which the retail price for electricity typically fluctuates hourly or more often, to reflect changes in the wholesale price of electricity on either a day-ahead or hour-ahead basis.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

time_of_use_pricing

boolean

Whether the respondent has customers participating in a time-of-use (TOU) pricing program. TOU is a program in which customers pay different prices at different times of the day. On-peak prices are higher and off-peak prices are lower than a “standard” rate. Price schedule is fixed and predefined, based on season, day of week, and time of day.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

variable_peak_pricing

boolean

Whether the respondent has customers participating in a variable peak pricing program (VPP). VPP is a program in which a form of TOU pricing allows customers to purchase their generation supply at prices set on a daily basis with varying on-peak and constant off-peak rates. Under the VPP program, the on-peak price for each weekday becomes available the previous day (typically late afternoon) and the customer is billed for actual consumption during the billing cycle at these prices.

core_eia861__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of incremental energy savings, peak demand savings, weighted average life cycle, and associated costs for the reporting year and life cycle of energy efficiency programs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Additional Details#

The EIA861 demand-side management (DSM) table (split into three normalized tables in PUDL) contains data through 2012. The form changed in 2013 and split the contents of the DSM table into energy efficiency and demand response tables. Though similar, the information collected before and after 2012 is not comparable enough to combine into a single, continuous table. We were discouraged from doing so after contacting a representative from EIA.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_energy_efficiency.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customer_incentives_incremental_cost

number

The cost of customer incentives resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.

customer_incentives_incremental_life_cycle_cost

number

All anticipated costs of the customer incentives including reporting year incremental costs and all future costs. Customer incentives are the total financial value provided to a customer for program participation, whether, for example, cash payment, or lowered tariff rates relative to non-participants, in-kind services (e.g. design work), or other benefits directly provided to the customer for their program participation.

customer_other_costs_incremental_life_cycle_cost

number

All anticipated costs other than customer incentives. Includes reporting year incremental costs and all future costs.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

incremental_energy_savings_mwh

number

Energy savings in the given report year resulting from new participants in existing demand response programs and all participants in new demand response programs.

incremental_life_cycle_energy_savings_mwh

number

The estimated total changes in energy use for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition.

incremental_life_cycle_peak_reduction_mw

number

The estimated total changes in peak load for incremental programs and participants over the life of the programs. DSM programs have a useful life, and the net effects of these programs will diminish over time. Considers the useful life of energy efficiency technology by accounting for building demolition, equipment degradation, and program attrition.

incremental_peak_reduction_mw

number

The peak reduction incurred in a given reporting year by new participants in existing energy efficiency programs and all participants in new energy efficiency programs.

other_costs_incremental_cost

number

Costs resulting from new participants in existing energy efficiency programs and all participants in new energy efficiency programs that aren’t directly associated with customer incentives.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

weighted_average_life_years

number

The weighted average life of the respondent’s portfolio of energy efficiency programs.

core_eia861__yearly_green_pricing#

Noticed any irregularities with this table? File a report.

Annual time series of green pricing program revenue, sales, and customer count by sector and state.

Most-recent data:

2012

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_green_pricing.parquet"
)

Columns#

Field Name

Type

Description

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

green_pricing_revenue

number

The money derived from the premium green pricing rate of the respondent’s program.

rec_revenue

number

Amount of revenue collected from Renewable Energy Certificates (RECs).

rec_sales_mwh

number

Amount of sales from Renewable Energy Certificates (RECs), in MWh.

report_date

date

Date reported.

sales_mwh

number

Quantity of electricity sold in MWh.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_mergers#

Noticed any irregularities with this table? File a report.

Annual time series of utility mergers and acquisitions.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_mergers.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

entity_type

string

Entity type of principal owner.

merge_address

string

Address of new parent company.

merge_city

string

City of new parent company.

merge_company

string

Name of the company merged with or acquired.

merge_date

date

Date of merger or acquisition.

merge_state

string

Two letter US state abbreviation, or three letter ISO-3166-1 country code, of the new parent company.

new_parent

string

Name of the new parent company post merger.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

zip_code

string

Five digit US Zip Code.

zip_code_4

string

Four digit US Zip Code suffix.

core_eia861__yearly_net_metering_customer_fuel_class#

Noticed any irregularities with this table? File a report.

Annual time series of net metering by customer and fuel class.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Usage Warnings#

  • Some years use a slightly different data definition.

Additional Details#

The amount of energy sold back to the grid. From 2007 to 2009 the data are reported as a lump sum of total energy dispatched by sector. After 2009, the data are broken down by sector and technology type.
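The change in reporting granularity described above matters when building a multi-year series. The following is a minimal sketch, not part of PUDL, of one way to get comparable sector totals across the break. It assumes that lump-sum rows carry tech_class "total" (the value appears in the tech_class list, but how it is used in each year should be checked against the data). The function name sector_totals and the toy values are hypothetical.

```python
import pandas as pd

KEYS = ["year", "customer_class"]


def sector_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum sold_to_utility_mwh by report year and customer_class."""
    df = df.assign(year=pd.to_datetime(df["report_date"]).dt.year)
    # ASSUMPTION: lump-sum rows are labeled tech_class == "total".
    is_total = df["tech_class"] == "total"
    totals = df[is_total].groupby(KEYS)["sold_to_utility_mwh"].sum()
    detail = df[~is_total].groupby(KEYS)["sold_to_utility_mwh"].sum()
    # Prefer a reported total; otherwise fall back to the sum over technologies.
    combined = totals.combine_first(detail).rename("sold_to_utility_mwh")
    return combined.reset_index()


# Toy rows standing in for the real table (values are made up).
toy = pd.DataFrame(
    {
        "report_date": ["2008-01-01", "2012-01-01", "2012-01-01"],
        "customer_class": ["residential"] * 3,
        "tech_class": ["total", "pv", "wind"],
        "sold_to_utility_mwh": [100.0, 60.0, 40.0],
    }
)
result = sector_totals(toy)
print(result)
```

Preferring a reported total over a sum of technology rows avoids double counting in years where both kinds of rows are present.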

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_customer_fuel_class.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_capacity_mwh

number

The total amount of energy that the system can supply before recharging is necessary, in megawatt-hours.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

sold_to_utility_mwh

number

The amount of electric energy sold back to the utility through the net metering application.

state

string

Two letter US state abbreviation.

tech_class

string

Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_net_metering_misc#

Noticed any irregularities with this table? File a report.

Annual time series of PV current flow type for net metered capacity.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_net_metering_misc.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

pv_current_flow_type

string

Current flow type for photovoltaics: AC or DC

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_non_net_metering_customer_fuel_class#

Noticed any irregularities with this table? File a report.

Annual time series of non-net metered distributed generation by sector and technology type.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Additional Details#

This table’s data starts in 2016. For 2015 and earlier, data similar to that found in this table can be found in core_eia861__yearly_distributed_generation_tech.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.
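Because the series is split across two tables at 2016, code that spans both periods has to pick the table by report year. A minimal sketch follows. The helper names are hypothetical, and it assumes the earlier table is published under the same nightly path pattern shown in the access methods.

```python
S3_PREFIX = "s3://pudl.catalyst.coop/nightly"


def dg_table_for_year(year: int) -> str:
    """Return the table holding sector/technology DG data for a report year."""
    if year >= 2016:
        return "core_eia861__yearly_non_net_metering_customer_fuel_class"
    return "core_eia861__yearly_distributed_generation_tech"


def parquet_url(table: str) -> str:
    # ASSUMPTION: every table follows the same <prefix>/<table>.parquet layout.
    return f"{S3_PREFIX}/{table}.parquet"


print(parquet_url(dg_table_for_year(2015)))
print(parquet_url(dg_table_for_year(2016)))
```

The two tables are similar rather than identical: the 2016+ data are split by sector, include fuel cells, and report capacity in AC units, so values should be compared with care.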

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_customer_fuel_class.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_capacity_mwh

number

The total amount of energy that the system can supply before recharging is necessary, in megawatt-hours.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

tech_class

string

Type of technology specific to EIA 861 distributed generation and net generation tables: [‘backup’, ‘chp_cogen’, ‘combustion_turbine’, ‘fuel_cell’, ‘hydro’, ‘internal_combustion’, ‘other’, ‘pv’, ‘steam’, ‘storage_pv’, ‘storage_nonpv’, ‘all_storage’, ‘total’, ‘virtual_pv’, ‘virtual_pv_under_1mw’, ‘virtual_pv_over_1mw’, ‘wind’].

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_non_net_metering_misc#

Noticed any irregularities with this table? File a report.

Annual time series of the number of non-net metered distributed generation generators, PV current flow type, backup capacity, and utility-owned capacity.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key.

Additional Details#

This table’s data starts in 2016. For 2015 and earlier, data similar to that found in this table can be found in core_eia861__yearly_distributed_generation_misc.

The raw EIA861 distributed generation (DG) table (split into three normalized tables in PUDL) was renamed in 2016 to Non-Net Metering to prevent double counting. The data in the Non-Net Metering table (2016+) are split by sector, contain fuel cell information, and convert capacity reported in DC units to AC units.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_non_net_metering_misc.parquet"
)

Columns#

Field Name

Type

Description

backup_capacity_mw

number

The total nameplate capacity of generators that are used only for emergency backup service.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generators_number

integer

Total number of generators

pv_current_flow_type

string

Current flow type for photovoltaics: AC or DC

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_owned_capacity_mw

number

Total non-net-metered capacity owned by the respondent.

core_eia861__yearly_operational_data_misc#

Noticed any irregularities with this table? File a report.

Annual time series of megawatt-hours (MWh) for the sources and disposition of a utility’s electricity.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report (Schedule 2B)

Primary key:

utility_id_eia, nerc_region, report_date

Usage Warnings#

  • Some values have been redacted.

  • Contains rows where missing values were imputed.

Additional Details#

Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.
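Rows filed under the placeholder ID do not describe a single real utility, so utility-level analyses may want to handle them separately. A minimal sketch, using made-up rows and a hypothetical helper name:

```python
import pandas as pd

WITHHELD_UTILITY_ID = 88888  # placeholder ID described above


def split_withheld(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (identified rows, withheld rows) based on utility_id_eia."""
    withheld = df["utility_id_eia"] == WITHHELD_UTILITY_ID
    return df[~withheld], df[withheld]


# Toy rows standing in for the real table (values are made up).
toy = pd.DataFrame(
    {
        "utility_id_eia": [101, 88888, 202],
        "net_generation_mwh": [10.0, 5.0, 7.5],
    }
)
identified, withheld = split_withheld(toy)
print(len(identified), len(withheld))
```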

Rows where data_observed is False were labeled as imputed in the raw EIA data. The EIA documentation describes the imputation applied to net metering and non-net metering distributed data, but not to operational data. Less than 1% of rows are labeled as imputed, all of which occur in 2004 or 2005. Imputed rows make up no more than 25% of the rows for any utility.
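The share of imputed rows can be checked directly from the data_observed column. A minimal sketch on made-up rows (the function name is hypothetical):

```python
import pandas as pd


def imputed_share_by_utility(df: pd.DataFrame) -> pd.Series:
    """Fraction of rows per utility where data_observed is False."""
    is_imputed = df["data_observed"].eq(False)
    return is_imputed.groupby(df["utility_id_eia"]).mean()


# Toy rows standing in for the real table (values are made up).
toy = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 1, 1, 2, 2],
        "data_observed": [True, True, True, False, True, True],
    }
)
share = imputed_share_by_utility(toy)
print(share)
```

Filtering with df[df["data_observed"]] keeps only the observed rows when imputed values are not wanted.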

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_misc.parquet"
)

Columns#

Field Name

Type

Description

consumed_by_facility_mwh

number

The amount of electricity used by the facility.

consumed_by_respondent_without_charge_mwh

number

The amount of electricity used by the electric utility in its electric and other departments without charge.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

data_observed

boolean

Is the value observed (True) or imputed (False).

entity_type

string

Entity type of principal owner.

exchange_energy_delivered_mwh

number

The amount of exchange energy delivered. Does not include power delivered as part of a tolling arrangement.

exchange_energy_received_mwh

number

The amount of exchange energy received. Does not include power received through tolling arrangements.

furnished_without_charge_mwh

number

The amount of electricity furnished by the electric utility without charge, such as to a municipality under a franchise agreement or for street and highway lighting.

nerc_region

string

NERC region in which the plant is located

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_power_exchanged_mwh

number

The net amount of energy exchanged. Net exchange is the difference between the amount of exchange received and the amount of exchange delivered. This entry should not include wholesale energy purchased from or sold to regulated companies or unregulated companies for other systems.

net_wheeled_power_mwh

number

The difference between the amount of energy entering the respondent’s system (wheeled received) for transmission through the respondent’s system and the amount of energy leaving the respondent’s system (wheeled delivered). Wheeled net represents the energy losses on the respondent’s system associated with the wheeling of energy for other systems.

report_date

date

Date reported.

retail_sales_mwh

number

MWh of sales to end-use customers in areas where the customer has been given the legal right to select a power supplier other than the traditional, vertically integrated electric utility.

sales_for_resale_mwh

number

The amount of electricity sold for resale purposes. This entry should include sales for resale to power marketers (reported separately in previous years), full and partial requirements customers, firm power customers and nonfirm customers.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

summer_peak_demand_mw

number

The maximum hourly summer load (for the months of June through September) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy their service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest.

total_disposition_mwh

number

Sum of all disposition of electricity listed. Includes sales to ultimate customers, sales for resale, energy furnished without charge, energy consumed by respondent without charge and total energy losses.

total_energy_losses_mwh

number

The total amount of electricity lost from transmission, distribution, and/or unaccounted for. Should be expressed as a positive number.

total_sources_mwh

number

Sum of all sources of electricity listed. Includes net generation, purchases from electricity suppliers, net exchanges (received - delivered), net wheeled (received - delivered), transmission by others, and losses.

transmission_by_other_losses_mwh

number

The amount of energy losses associated with the wheeling of electricity provided to the respondent’s system by other utilities. Transmission by others, losses should always be a negative value.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

wheeled_power_delivered_mwh

number

The total amount of energy leaving the respondent’s system that was transmitted through the respondent’s system for delivery to other systems. If wheeling delivered is not precisely known, the value is an estimate based on the respondent’s system’s known percentage of losses for wheeling transactions.

wheeled_power_received_mwh

number

The total amount of energy entering the respondent’s system from other systems for transmission through the respondent’s system (wheeling) for delivery to other systems. Does not include energy purchased or exchanged for consumption within the respondent’s system, which was wheeled to the respondent by others.

wholesale_power_purchases_mwh

number

Purchases from electricity suppliers.

winter_peak_demand_mw

number

The maximum hourly winter load (for the months of January through March) based on net energy for the system during the reporting year. Net energy for the system is the sum of energy an electric utility needs to satisfy their service area and includes full and partial wholesale requirements customers, and the losses experienced in delivery. The maximum hourly load is determined by the interval in which the 60-minute integrated demand is the greatest.

core_eia861__yearly_operational_data_revenue#

Noticed any irregularities with this table? File a report.

Annual time series of utility revenue by type of electric operating revenue.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report (Schedule 2C)

Primary key:

utility_id_eia, nerc_region, report_date, revenue_class

Usage Warnings#

  • Some values have been redacted.

Additional Details#

A utility’s revenue by type of electric operating revenue. Includes electric operating revenue from sales to ultimate customers, revenue from unbundled (delivery) customers, revenue from sales for resale, electric credits/other adjustments, revenue from transmission, other electric operating revenue, and total electric operating revenue.

Respondents are required to report this information to the EIA, but are not required to disclose utility-level data to the public. When a respondent chooses to keep its utility-level data proprietary, it files using EIA utility id 88888. For more details, see EIA-861 Notable Irregularities.
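Because revenue_class is part of the primary key, the table is in long format with one row per revenue type. A minimal sketch of reshaping it to one column per revenue type follows. The revenue_class values in the toy data are hypothetical placeholders, not the actual category names.

```python
import pandas as pd

# Primary key columns other than revenue_class.
ID_COLS = ["utility_id_eia", "nerc_region", "report_date"]


def revenue_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot revenue_class into columns, one row per utility/region/date."""
    # pivot() raises on duplicate keys, which also checks the primary key.
    wide = df.pivot(index=ID_COLS, columns="revenue_class", values="revenue")
    wide.columns.name = None
    return wide.reset_index()


# Toy rows; the class names here are illustrative assumptions.
toy = pd.DataFrame(
    {
        "utility_id_eia": [1, 1],
        "nerc_region": ["WECC", "WECC"],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "revenue_class": ["retail_sales", "transmission"],
        "revenue": [10.0, 5.0],
    }
)
wide = revenue_wide(toy)
print(wide)
```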

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_operational_data_revenue.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

nerc_region

string

NERC region in which the plant is located

report_date

date

Date reported.

revenue

number

Amount of revenue.

revenue_class

string

Source of revenue (e.g., retail sales, transmission).

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_reliability#

Noticed any irregularities with this table? File a report.

Annual time series of electricity system reliability and outage impacts.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report (Schedules 3B and 3C)

Primary key:

utility_id_eia, state, report_date, standard

Note: standard is included because while respondents are asked to only fill out one of parts B or C, sometimes they fill out both.

Additional Details#

Contains information on non-momentary electrical interruptions. Includes the system average interruption duration index (SAIDI), system average interruption frequency index (SAIFI), and customer average interruption duration index (CAIDI, equal to SAIDI/SAIFI), each reported with and without major event days and loss of service. Includes the standard (IEEE/other) and other relevant information.

SAIDI is measured in minutes, and represents the sum over the year of all customers interrupted times the number of minutes they experienced an interruption, divided by total number of customers.

SAIFI is measured in customers, and represents the sum over the year of the total number of customers that experienced an interruption, divided by the total number of customers.
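The two definitions above, together with CAIDI = SAIDI/SAIFI, can be illustrated with a short worked example. This is a sketch on made-up interruption events, not the procedure respondents or EIA use; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interruption:
    customers_interrupted: int
    minutes: float


def reliability_indices(
    events: list[Interruption], total_customers: int
) -> tuple[float, float, float]:
    """Return (SAIDI, SAIFI, CAIDI) for one year of interruption events."""
    customer_minutes = sum(e.customers_interrupted * e.minutes for e in events)
    customers_hit = sum(e.customers_interrupted for e in events)
    saidi = customer_minutes / total_customers
    saifi = customers_hit / total_customers
    # SAIDI / SAIFI simplifies to customer-minutes per customer interrupted.
    caidi = customer_minutes / customers_hit if customers_hit else float("nan")
    return saidi, saifi, caidi


# Two made-up outages on a system serving 1,000 customers.
events = [Interruption(100, 60.0), Interruption(50, 120.0)]
saidi, saifi, caidi = reliability_indices(events, total_customers=1000)
print(saidi, saifi, caidi)
```

Here 12,000 customer-minutes of interruption spread over 1,000 customers gives a SAIDI of 12 minutes, 150 customer interruptions gives a SAIFI of 0.15, and the average interruption lasts 80 minutes.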

The IEEE standards which can be used to calculate SAIDI and SAIFI include IEEE 1366-2003 and IEEE 1366-2012. These standards define momentary interruptions as having a duration of five minutes or less. If one of these IEEE standards is used, respondents fill out part B and are required to specify:

  • SAIDI major event days minus loss of supply

  • SAIFI major event days minus loss of supply

The SAIDI and SAIFI values for major event days minus loss of supply exclude reliability events caused by a loss of supply from the high-voltage system, so they count only events that originated on the distribution system. The voltage that distinguishes the distribution system from the supply system is given in highest_distribution_voltage_kv.

If a method other than these IEEE standards is used for calculating SAIDI and SAIFI indexes, respondents fill out part C and are required to specify:

  • whether inactive accounts are included

  • how they define a momentary interruption (less than 1 minute, less than or equal to 5 minutes, or other)

In this table, column standard is “ieee_standard” for respondents who have filled out part B, and “other_standard” for respondents who have filled out part C.
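Since some respondents fill out both parts, a utility, state, and report date can appear twice, once per standard. A minimal sketch of finding those cases and keeping one row per respondent follows. Preferring the IEEE row is an illustrative choice rather than a PUDL recommendation, and the function names are hypothetical.

```python
import pandas as pd

KEY = ["utility_id_eia", "state", "report_date"]


def filed_both_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Return the key combinations that report under more than one standard."""
    n_standards = df.groupby(KEY)["standard"].nunique()
    return n_standards[n_standards > 1].reset_index()[KEY]


def prefer_ieee(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per key, taking the ieee_standard row when both exist."""
    # "ieee_standard" sorts before "other_standard", so it is kept first.
    ordered = df.sort_values("standard")
    return ordered.drop_duplicates(subset=KEY, keep="first").sort_index()


# Toy rows standing in for the real table (values are made up).
toy = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CA", "CA", "TX"],
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "standard": ["ieee_standard", "other_standard", "other_standard"],
    }
)
both = filed_both_parts(toy)
deduped = prefer_ieee(toy)
print(both)
print(deduped)
```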

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_reliability.parquet"
)

Columns#

Field Name

Type

Description

caidi_w_major_event_days_minus_loss_of_service_minutes

number

Average number of minutes per interruption (SAIDI/SAIFI) including major event days and excluding reliability events caused by a loss of supply.

caidi_w_major_event_days_minutes

number

Average number of minutes per interruption (SAIDI/SAIFI) including major event days.

caidi_wo_major_event_days_minutes

number

Average number of minutes per interruption (SAIDI/SAIFI) excluding major event days.

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

entity_type

string

Entity type of principal owner.

highest_distribution_voltage_kv

number

The highest voltage that’s part of the distribution system.

inactive_accounts_included

boolean

Whether the respondent includes inactive accounts in its definition of customers used to determine SAIDI and SAIFI.

momentary_interruption_definition

string

How the respondent defines momentary service interruptions: less than 1 min, equal to or less than 5 min, or some other way.

outages_recorded_automatically

boolean

Whether the information about customer outages is recorded automatically.

report_date

date

Date reported.

saidi_w_major_event_days_minus_loss_of_service_minutes

number

Cumulative duration (minutes) of interruption for the average customer during the report year including major event days and excluding reliability events caused by a loss of supply.

saidi_w_major_event_days_minutes

number

Cumulative duration (minutes) of interruption for the average customer during the report year including major event days.

saidi_wo_major_event_days_minutes

number

Cumulative duration (minutes) of interruption for the average customer during the report year excluding major event days.

saifi_w_major_event_days_customers

number

Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days.

saifi_w_major_event_days_minus_loss_of_service_customers

number

Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year including major event days and excluding reliability events caused by a loss of supply.

saifi_wo_major_event_days_customers

number

Average number of times a customer experienced a sustained interruption (over 5 minutes) during the report year excluding major event days.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

standard

string

Whether the respondent calculates SAIDI, SAIFI, and major event days according to the IEEE standard or another standard.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_sales#

Noticed any irregularities with this table? File a report.

Annual time series of electricity sales to ultimate customers by utility, balancing authority, state, and customer class.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

utility_id_eia, state, report_date, balancing_authority_code_eia, customer_class, business_model, service_type

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_sales.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

business_model

string

Business model.

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

data_observed

boolean

Is the value observed (True) or imputed (False).

entity_type

string

Entity type of principal owner.

report_date

date

Date reported.

sales_mwh

number

Quantity of electricity sold in MWh.

sales_revenue

number

Revenue from electricity sold.

service_type

string

The type of service the respondent provides to a given customer class. Bundled: both energy and delivery; energy: just the energy consumed; delivery: just the billing and energy delivery services.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.
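
Because `sales_mwh` and `sales_revenue` are reported per customer class and service type, an average revenue per MWh is best computed after summing both columns. A minimal sketch on made-up rows follows; the `revenue_per_mwh` name is hypothetical, and the currency units of `sales_revenue` are not stated above, so the result is in whatever units that column uses.

```python
import pandas as pd

# Synthetic rows for illustration only; these are not real EIA-861 values.
sales = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "customer_class": ["residential", "commercial", "residential"],
        "sales_mwh": [100.0, 300.0, 50.0],
        "sales_revenue": [12000.0, 30000.0, 0.0],
    }
)

# Sum over customer classes first, then divide, so the result is a
# sales-weighted average rather than a mean of per-class ratios.
totals = sales.groupby(["utility_id_eia", "report_date"], as_index=False)[
    ["sales_mwh", "sales_revenue"]
].sum()
totals["revenue_per_mwh"] = totals["sales_revenue"] / totals["sales_mwh"]
```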

core_eia861__yearly_service_territory#

Noticed any irregularities with this table? File a report.

Annual time series of counties in utility service territories.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

report_date, utility_id_eia, county_id_fips

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_service_territory.parquet"
)

Columns#

Field Name

Type

Description

county

string

County name.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

report_date

date

Date reported.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.
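
Since `county_id_fips` is part of this table's primary key and is also the primary key of `out_censusdp1tract__counties`, the two can be joined to attach county attributes to each service territory. The sketch below uses small made-up frames in place of the real tables and is only one way to do the join.

```python
import pandas as pd

# Synthetic rows for illustration only.
territory = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "county_id_fips": ["08013", "08031", "99999"],
    }
)
counties = pd.DataFrame(
    {
        "county_id_fips": ["08013", "08031"],
        "dp0010001": [1000, 2000],  # total population column; values are made up
    }
)

# Keep FIPS codes as strings so leading zeros survive the join.
merged = territory.merge(
    counties,
    on="county_id_fips",
    how="left",
    validate="many_to_one",  # each county should appear once in the lookup
    indicator=True,
)
# Territory counties with no match in the county table.
unmatched = merged.loc[merged["_merge"] == "left_only", "county_id_fips"].tolist()
```

A left join with `indicator=True` keeps every service territory row and makes unmatched FIPS codes visible rather than silently dropping them.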

core_eia861__yearly_short_form#

Noticed any irregularities with this table? File a report.

Annual time series of data from the short form (EIA-861S).

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

utility_id_eia, state, report_date, balancing_authority_code_eia

Usage Warnings#

  • Some years are missing from the data record.

Additional Details#

The data started being reported in 2012. However, the 2019 data is not available. Respondents are expected to submit the completed Form EIA-861S to EIA by April 30th, following the end of the prior calendar year. Utilities report on Form EIA-861S if they:

  • Reported less than 200,000 megawatthours on their most recent Form EIA-861.

  • Provide only bundled service (generation and distribution).

  • Are not needed to ensure acceptable quality of statistical estimates.

  • Are not part of the aggregate TVA or WPPI.

  • Do not report on Form EIA-861M.
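
The criteria above can be read as a single predicate. The sketch below assumes all five conditions must hold at once, which is an interpretation of the list rather than something it states; the `Respondent` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """Hypothetical container; field names are illustrative, not EIA's."""

    prior_sales_mwh: float
    bundled_service_only: bool
    needed_for_estimates: bool
    in_tva_or_wppi_aggregate: bool
    reports_on_861m: bool


def qualifies_for_short_form(r: Respondent) -> bool:
    # Assumption: every listed condition must be satisfied together.
    return (
        r.prior_sales_mwh < 200_000
        and r.bundled_service_only
        and not r.needed_for_estimates
        and not r.in_tva_or_wppi_aggregate
        and not r.reports_on_861m
    )


small = Respondent(50_000, True, False, False, False)
large = Respondent(500_000, True, False, False, False)
```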

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_short_form.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

customers

number

Number of customers.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

entity_type

string

Entity type of principal owner.

has_demand_side_management

boolean

Whether there were strategies or measures used to control electricity demand by customers.

has_green_pricing

boolean

Whether a green pricing program was associated with this utility during the reporting year.

has_net_metering

boolean

Whether the plant has a net metering agreement in effect during the reporting year. (Only displayed for facilities that report the sun or wind as an energy source.) This field was only reported up until 2015.

has_time_responsive_programs

boolean

Whether the respondent operates any time-based rate programs (e.g., real-time pricing, critical peak pricing, variable peak pricing and time-of-use rates administered through a tariff).

report_date

date

Date reported.

sales_mwh

number

Quantity of electricity sold in MWh.

sales_revenue

number

Revenue from electricity sold.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia861__yearly_utility_data_misc#

Noticed any irregularities with this table? File a report.

Annual time series of utility business activities.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, and nerc_region, but there are nulls in the state column across several years of reporting.

Additional Details#

This includes whether they operate alternative fuel vehicles, whether they provide transmission, distribution, or generation services (bundled or unbundled), and whether they engage in wholesale and/or retail markets.
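
Because the would-be primary key for this table contains nulls in `state`, it can help to check how many rows are affected before relying on those columns as a key. A minimal sketch on made-up rows:

```python
import pandas as pd

# Synthetic rows for illustration only.
misc = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2, 2],
        "state": ["CO", "CO", None, None],
        "report_date": pd.to_datetime(["2023-01-01"] * 4),
        "nerc_region": ["WECC", "WECC", "SERC", "SERC"],
    }
)
key = ["utility_id_eia", "state", "report_date", "nerc_region"]

# Rows with a null in any key column cannot be identified by the key.
null_key_rows = misc[misc[key].isna().any(axis=1)]

# Among rows with a complete key, flag every member of a duplicated group.
complete = misc.dropna(subset=key)
duplicates = complete[complete.duplicated(subset=key, keep=False)]
```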

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_misc.parquet"
)

Columns#

Field Name

Type

Description

alternative_fuel_vehicle_2_activity

boolean

Whether the utility plans to operate alternative-fueled vehicles this coming year.

alternative_fuel_vehicle_activity

boolean

Whether the utility operates alternative-fueled vehicles during the year.

bundled_activity

boolean

Whether a utility engaged in combined utility services (electricity plus other services such as gas or water) during the year.

buying_distribution_activity

boolean

Whether a utility bought any distribution on other electrical systems during the year.

buying_transmission_activity

boolean

Whether a utility bought any transmission services on other electrical systems during the year.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

distribution_activity

boolean

Whether a utility engaged in any distribution using owned/leased electrical wires during the year.

entity_type

string

Entity type of principal owner.

generation_activity

boolean

Whether a utility used generation from company-owned plants during the year.

nerc_region

string

NERC region in which the plant is located

operates_generating_plant

boolean

Whether the respondent operated at least one generating plant during the reporting period.

report_date

date

Date reported.

retail_marketing_activity

boolean

Whether a utility engaged in retail power marketing during the year.

short_form

boolean

Whether the reported information comes from the short form. In the case of form EIA 861, a shorter version of the form was created in 2012 to reduce respondent burden on smaller utilities and increase our processing efficiency.

state

string

Two letter US state abbreviation.

transmission_activity

boolean

Whether a utility engaged in any transmission activities during the year.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

wholesale_marketing_activity

boolean

Whether a utility engages in wholesale power marketing during the year.

core_eia861__yearly_utility_data_nerc#

Noticed any irregularities with this table? File a report.

Annual time series of the NERC regions that utilities operate in.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, nerc_region, and nerc_regions_of_operation, but there are nulls in the state column across several years of reporting.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_nerc.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

nerc_region

string

NERC region in which the plant is located

nerc_regions_of_operation

string

All the regional entities within the North American Electric Reliability Corporation (NERC) in which the respondent conducts operations.

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia861__yearly_utility_data_rto#

Noticed any irregularities with this table? File a report.

Annual time series of the RTOs that utilities operate in.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 861 – Annual Electric Power Industry Report

Primary key:

This table has no primary key. The primary key would have been: utility_id_eia, state, report_date, nerc_region, and rtos_of_operation, but there are nulls in the state column across several years of reporting.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia861__yearly_utility_data_rto.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

nerc_region

string

NERC region in which the plant is located

report_date

date

Date reported.

rtos_of_operation

string

The ISOs/RTOs in which the respondent conducts operations.

state

string

Two letter US state abbreviation.

utility_id_eia

integer

The EIA Utility Identification number.

core_eia923__entity_coalmine#

Noticed any irregularities with this table? File a report.

Entity table containing static information about coal mines reporting deliveries in the Fuel Receipts and Costs table.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 2, Part C)

Primary key:

mine_id_pudl

Usage Warnings#

  • Mine IDs are not as well defined and consistent as other EIA IDs.

Additional Details#

This table is produced during the transformation of fuel delivery data, in order to produce a better normalized database. The same coalmines report many individual deliveries, and repeating their attributes many times in the fuel receipts and costs table is duplicative. Unfortunately, the coalmine attributes generally do not use a controlled vocabulary or well-defined IDs, so in practice many distinct records in this table correspond to the same real-world mine.

We have not yet taken the time to rigorously clean this data, but it could be linked with both Mining Safety and Health Administration (MSHA) and USGS data to provide more insight into where coal is coming from, and what the employment and geological context is for those supplies.
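
As a rough illustration of the duplication described above, records can be grouped on a normalized mine name and state to surface candidate duplicates for manual review. This is a sketch with fictional mine names, not the cleaning PUDL performs; the `normalize` helper and its rules are hypothetical.

```python
import re
from collections import defaultdict

# Fictional (mine_id_pudl, mine_name, state) records for illustration only.
records = [
    (1, "Example Ridge", "WV"),
    (2, "EXAMPLE RIDGE MINE", "WV"),
    (3, "Example  Ridge", "WV"),
    (4, "Sample Creek", "WV"),
]


def normalize(name: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace,
    # and drop a trailing "mine".
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return re.sub(r" mine$", "", cleaned)


groups = defaultdict(list)
for mine_id, name, state in records:
    groups[(normalize(name), state)].append(mine_id)

# Groups with more than one ID are candidates for review, not proven matches.
candidates = {k: v for k, v in groups.items() if len(v) > 1}
```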

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__entity_coalmine.parquet"
)

Columns#

Field Name

Type

Description

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

mine_id_msha

integer

MSHA issued mine identifier.

mine_id_pudl

integer

Dynamically assigned PUDL mine identifier.

mine_name

string

Coal mine name.

mine_type_code

string

Type of coal mine.

state

string

Two letter US state abbreviation.

core_eia923__fuel_receipts_costs#

Noticed any irregularities with this table? File a report.

Individual fuel deliveries to power plants, organized by fuel type and supplier.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 2 - Part A)

Primary key:

This table has no primary key. Each record describes an individual fuel delivery. There can be multiple deliveries of the same type of fuel from the same supplier to the same plant in a single month, so the table has no natural primary key.

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Some values have been redacted.

  • Time of fuel deliveries is not necessarily connected with time of fuel consumption.

Additional Details#

There can be a significant delay between the receipt of fuel and its consumption, so using this table to infer monthly attributes associated with power generation may not be entirely accurate. However, this is the most granular data we have describing fuel costs, and we use it in calculating the marginal cost of electricity for individual generation units.

Under some circumstances utilities are allowed to treat the price of fuel as proprietary business data, meaning it is redacted from the publicly available spreadsheets. It’s still reported to EIA and influences the aggregated (state, region, annual, etc.) fuel prices they publish. From 2009-2021 about 1/3 of all prices are redacted. The missing data is not randomly distributed. Deregulated markets dominated by merchant generators (independent power producers) redact much more data, and natural gas is by far the most likely fuel to have its price redacted. This means, for instance, that the entire Northeastern US reports essentially no fine-grained data about its natural gas prices.

Additional data which we haven’t yet integrated is available in a similar format from 2002-2008 via the EIA-423, and going back as far as 1972 from the FERC-423.
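
To make the cost and redaction notes above concrete, the sketch below computes a heat-weighted average fuel price per plant and month, and the share of delivered heat content with no reported price. It runs on made-up rows; the derived column names (`fuel_received_mmbtu`, `total_fuel_cost`, `avg_cost_per_mmbtu`) are hypothetical, and treating a null price as redacted is an assumption.

```python
import pandas as pd

# Synthetic deliveries for illustration only; None stands in for a redacted price.
frc = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "fuel_received_units": [100.0, 300.0, 200.0],
        "fuel_mmbtu_per_unit": [20.0, 20.0, 20.0],
        "fuel_cost_per_mmbtu": [2.0, 4.0, None],
    }
)

# Heat content delivered, and total cost where a price is reported.
frc["fuel_received_mmbtu"] = frc["fuel_received_units"] * frc["fuel_mmbtu_per_unit"]
frc["total_fuel_cost"] = frc["fuel_received_mmbtu"] * frc["fuel_cost_per_mmbtu"]

# Heat-weighted average price over deliveries that have a price.
priced = frc.dropna(subset=["fuel_cost_per_mmbtu"])
monthly = priced.groupby(["plant_id_eia", "report_date"], as_index=False)[
    ["fuel_received_mmbtu", "total_fuel_cost"]
].sum()
monthly["avg_cost_per_mmbtu"] = (
    monthly["total_fuel_cost"] / monthly["fuel_received_mmbtu"]
)

# Share of delivered heat content whose price is missing.
redacted_share = (
    frc.loc[frc["fuel_cost_per_mmbtu"].isna(), "fuel_received_mmbtu"].sum()
    / frc["fuel_received_mmbtu"].sum()
)
```

Because redaction is not randomly distributed, an average taken only over priced deliveries may not represent the redacted ones.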

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__fuel_receipts_costs.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

chlorine_content_ppm

number

For coal only: the chlorine content in parts per million (ppm) to the nearest 0.001 ppm. If lab tests of the coal do not include the chlorine content, this field contains the amount specified in the contract with the supplier.

contract_expiration_date

date

Date contract expires. Format: MMYY.

contract_type_code

string

Purchase type under which receipts occurred in the reporting month. C: Contract, NC: New Contract, S: Spot Purchase, T: Tolling Agreement.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_group_code

string

Fuel groups used in the Electric Power Monthly

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_received_units

number

Quantity of fuel received in tons, barrels, or Mcf.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

mercury_content_ppm

number

Mercury content in parts per million (ppm) to the nearest 0.001 ppm.

mine_id_pudl

integer

Dynamically assigned PUDL mine identifier.

moisture_content_pct

number

For coal only: the moisture content of the fuel in terms of moisture percentage by weight. Reported to the nearest 0.01 percent.

natural_gas_delivery_contract_type_code

string

Contract type for natural gas delivery service.

natural_gas_transport_code

string

Contract type for natural gas transportation service.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

primary_transportation_mode_code

string

Transportation mode for the longest distance transported.

report_date

date

Date reported.

secondary_transportation_mode_code

string

Transportation mode for the second longest distance transported.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

supplier_name

string

Company that sold the fuel to the plant or, in the case of Natural Gas, pipeline owner.

core_eia923__monthly_boiler_fuel#

Noticed any irregularities with this table? File a report.

Monthly time series of boiler fuel consumption and emissions.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, boiler_id, energy_source_code, prime_mover_code, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

Additional Details#

Reports the quantity of each type of fuel consumed by each boiler on a monthly basis, as well as the sulfur and ash content of those fuels. Fuel quantity is reported in standard EIA fuel units (tons, barrels, Mcf). Heat content per unit of fuel is also reported, making this table useful for calculating the thermal efficiency (heat rate) of various generation units.

This table provides better coverage of the entire fleet of generators than the core_eia923__monthly_generation_fuel table, but the fuel consumption reported here is not directly associated with a generator. This complicates the heat rate calculation, since the associations between individual boilers and generators are incomplete and can be complex.
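
The heat input used in a heat rate calculation is the product of `fuel_consumed_units` and `fuel_mmbtu_per_unit`. A minimal sketch on made-up rows follows; the `fuel_consumed_mmbtu` and `report_year` names are hypothetical. It aggregates to annual totals, which is one way to accommodate respondents who report all consumption in December.

```python
import pandas as pd

# Synthetic rows for illustration only.
bf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "boiler_id": ["B1", "B1", "B2"],
        "report_date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-12-01"]),
        "fuel_consumed_units": [10.0, 20.0, 120.0],
        "fuel_mmbtu_per_unit": [22.0, 21.0, 1.0],
    }
)

# Heat input per record = physical units consumed x heat content per unit.
bf["fuel_consumed_mmbtu"] = bf["fuel_consumed_units"] * bf["fuel_mmbtu_per_unit"]

# Annual totals per boiler, so December-only reporters compare like with like.
annual = (
    bf.assign(report_year=bf["report_date"].dt.year)
    .groupby(["plant_id_eia", "boiler_id", "report_year"], as_index=False)[
        "fuel_consumed_mmbtu"
    ]
    .sum()
)
```

Dividing these heat inputs by net generation in MWh would give a heat rate, but that requires mapping boilers to generators, which the paragraph above notes can be incomplete.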

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_boiler_fuel.parquet"
)

Columns#

Field Name

Type

Description

ash_content_pct

number

Ash content percentage by weight to the nearest 0.1 percent.

boiler_id

string

Alphanumeric boiler ID.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_units

number

Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

sulfur_content_pct

number

Sulfur content percentage by weight to the nearest 0.01 percent.

core_eia923__monthly_energy_storage#

Noticed any irregularities with this table? File a report.

Monthly time series of energy consumed and discharged (“generated”) by energy storage units.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

Additional Details#

The total MWh discharged from the energy storage unit during the reporting period is the gross generation and the difference between gross generation and consumption is the net generation.
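
The relationship described above can be checked row by row. The sketch below assumes `fuel_consumed_units` holds the energy drawn by the storage unit in MWh, which should be confirmed against `fuel_units` before use; the rows are made up and the `net_generation_calc_mwh` and `net_mismatch` names are hypothetical.

```python
import pandas as pd

# Synthetic rows for illustration only.
es = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "gross_generation_mwh": [80.0, 50.0],
        "fuel_consumed_units": [100.0, 60.0],  # assumed to be MWh consumed
        "net_generation_mwh": [-20.0, -5.0],
    }
)

# Net generation as described: gross discharge minus energy consumed.
es["net_generation_calc_mwh"] = es["gross_generation_mwh"] - es["fuel_consumed_units"]

# Flag rows where the reported value departs from the calculation.
es["net_mismatch"] = (
    es["net_generation_mwh"] - es["net_generation_calc_mwh"]
).abs() > 1e-6
```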

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_energy_storage.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_units

number

Consumption for electric generation of the fuel type in physical units.

fuel_consumed_units

number

Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_units

string

Reported unit of measure for fuel.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

core_eia923__monthly_generation#

Noticed any irregularities with this table? File a report.

Monthly time series of net electricity generated by power plant generators.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, generator_id, report_date

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • Not all IDs are present.

  • Table has known low coverage - either geographic or temporal or otherwise.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

Additional Details#

This is the most granular information we have about how much electricity individual generators are producing, but only about half of all the generation reported in the core_eia923__monthly_generation_fuel table appears here, because different reporting requirements are imposed on different types and sizes of generators.
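
The coverage gap described above can be measured by comparing plant-month totals from the two tables. The following is a minimal sketch using made-up rows and only the column names documented here; the plant IDs, values, and the helper name plant_month_totals are illustrative assumptions, not part of PUDL.

```python
from collections import defaultdict
from datetime import date

jan = date(2023, 1, 1)

# Made-up rows shaped like core_eia923__monthly_generation.
generation = [
    {"plant_id_eia": 10, "generator_id": "1", "report_date": jan, "net_generation_mwh": 400.0},
    {"plant_id_eia": 10, "generator_id": "GT2", "report_date": jan, "net_generation_mwh": 100.0},
]

# Made-up rows shaped like core_eia923__monthly_generation_fuel.
generation_fuel = [
    {"plant_id_eia": 10, "report_date": jan, "prime_mover_code": "ST",
     "energy_source_code": "BIT", "net_generation_mwh": 400.0},
    {"plant_id_eia": 10, "report_date": jan, "prime_mover_code": "GT",
     "energy_source_code": "NG", "net_generation_mwh": 100.0},
    {"plant_id_eia": 20, "report_date": jan, "prime_mover_code": "PV",
     "energy_source_code": "SUN", "net_generation_mwh": 500.0},
]


def plant_month_totals(rows):
    """Sum net generation by (plant_id_eia, report_date), treating nulls as zero."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["plant_id_eia"], row["report_date"])] += row["net_generation_mwh"] or 0.0
    return dict(totals)


generator_level = plant_month_totals(generation)
plant_level = plant_month_totals(generation_fuel)

# Share of each plant-month's generation that is also reported by generator.
coverage = {
    key: generator_level.get(key, 0.0) / total
    for key, total in plant_level.items()
    if total  # skip plant-months with zero reported generation
}
overall_coverage = sum(generator_level.values()) / sum(plant_level.values())
```

In this toy data, plant 10 is fully covered at the generator level while plant 20 is absent from the generator table, so overall coverage is one half.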

Whenever possible, we use this generator-level net generation to estimate the heat rates of generation units and the marginal cost of electricity on a per-generator basis, but those calculations depend on plant-level fuel costs and sometimes uncertain or incomplete boiler-generator associations.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

core_eia923__monthly_generation_fuel#

Monthly time series of generation and fuel consumption by fuel and prime mover.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, prime_mover_code, energy_source_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.
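
The December-only reporting pattern noted above can be screened for before computing monthly statistics. The sketch below is a heuristic rather than PUDL's own method: the sample rows are made up, and the function name december_only_reporters is hypothetical. It flags plant-years whose only nonzero fuel consumption falls in December.

```python
from collections import defaultdict
from datetime import date

# Made-up rows: (plant_id_eia, report_date, fuel_consumed_mmbtu)
rows = [
    (1, date(2023, 1, 1), 100.0),
    (1, date(2023, 12, 1), 120.0),
    (2, date(2023, 1, 1), 0.0),
    (2, date(2023, 6, 1), None),
    (2, date(2023, 12, 1), 1500.0),
]


def december_only_reporters(rows):
    """Return sorted (plant_id_eia, year) pairs whose only nonzero fuel is in December."""
    months_with_fuel = defaultdict(set)
    for plant_id, report_date, fuel in rows:
        if fuel:  # skips both None and 0.0
            months_with_fuel[(plant_id, report_date.year)].add(report_date.month)
    return sorted(key for key, months in months_with_fuel.items() if months == {12})


flagged = december_only_reporters(rows)
```

A flagged plant-year is only a candidate: a plant that genuinely ran only in December would match the same pattern.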

Additional Details#

This data can’t be easily linked to individual boilers, generators, and generation units, but it provides the most complete coverage of fuel consumption and electricity generation for the entire generation fleet. When fuel consumption and generation aren’t directly reported in the core_eia923__monthly_generation and core_eia923__monthly_boiler_fuel tables, we attribute them using the primary fuels and prime movers reported for each generator, along with their capacities, in order to calculate capacity factors, heat rates, and the marginal cost of electricity.

The table distinguishes between all fuel consumed and fuel consumed for electricity generation because some units are combined heat and power (CHP) units, which also provide high-temperature process heat at the expense of net electricity generation.
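
To make the distinction concrete, the sketch below computes the share of fuel used for electricity and an electricity-only heat rate for a single record. It assumes the column names listed for this table; the record values and the function name chp_metrics are made up for illustration.

```python
def chp_metrics(row):
    """Return (electricity share of fuel, heat rate in MMBtu/MWh) for one record."""
    total = row["fuel_consumed_mmbtu"]
    for_electricity = row["fuel_consumed_for_electricity_mmbtu"]
    net_generation = row["net_generation_mwh"]
    if for_electricity is None:
        return None, None
    # Share is undefined when total fuel is missing or zero.
    share = for_electricity / total if total else None
    # Heat rate is only meaningful for positive net generation.
    heat_rate = (
        for_electricity / net_generation
        if net_generation is not None and net_generation > 0
        else None
    )
    return share, heat_rate


# Made-up CHP record: 20% of the fuel goes to process heat.
chp_record = {
    "fuel_consumed_mmbtu": 1000.0,
    "fuel_consumed_for_electricity_mmbtu": 800.0,
    "net_generation_mwh": 100.0,
}
share, heat_rate = chp_metrics(chp_record)
```

Using total fuel instead of fuel for electricity in the heat rate would overstate how much fuel a CHP unit needs per MWh.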

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in millions of Btus (MMBtu), year to date.

fuel_consumed_for_electricity_units

number

Consumption of the fuel type for electricity generation, in physical units.

fuel_consumed_mmbtu

number

Total consumption of fuel in millions of Btus (MMBtu), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_agg

string

A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

core_eia923__monthly_generation_fuel_nuclear#

Monthly time series of generation and fuel consumption of nuclear generation units by fuel and prime mover.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 3)

Primary key:

plant_id_eia, report_date, nuclear_unit_id, energy_source_code, prime_mover_code

Usage Warnings#

  • Date column arbitrarily uses the first of the month.

  • A small number of respondents only report annual fuel consumption, and all of it is reported in December.

Additional Details#

This data is originally reported alongside similar information for fossil fuel plants, but the nuclear data is reported by (nuclear) generation unit rather than fuel type and prime mover, and so has a different primary key.
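
Because of the extra nuclear_unit_id key column, nuclear records need to be collapsed across units before they can be compared with rows keyed like core_eia923__monthly_generation_fuel. A minimal sketch, assuming made-up values for a hypothetical two-unit plant (the helper name collapse_units is also an assumption):

```python
from collections import defaultdict
from datetime import date

# Primary key of core_eia923__monthly_generation_fuel.
FUEL_KEY = ("plant_id_eia", "report_date", "prime_mover_code", "energy_source_code")

jan = date(2023, 1, 1)

# Made-up rows shaped like core_eia923__monthly_generation_fuel_nuclear.
nuclear_rows = [
    {"plant_id_eia": 99, "report_date": jan, "nuclear_unit_id": "1",
     "energy_source_code": "NUC", "prime_mover_code": "ST",
     "net_generation_mwh": 700_000.0},
    {"plant_id_eia": 99, "report_date": jan, "nuclear_unit_id": "2",
     "energy_source_code": "NUC", "prime_mover_code": "ST",
     "net_generation_mwh": 650_000.0},
]


def collapse_units(rows, value_column="net_generation_mwh"):
    """Sum a value column across nuclear units, keyed by FUEL_KEY."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[column] for column in FUEL_KEY)
        totals[key] += row[value_column] or 0.0  # treat nulls as zero
    return dict(totals)


plant_level = collapse_units(nuclear_rows)
```

The same helper can sum other additive columns, such as fuel_consumed_mmbtu, by passing a different value_column.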

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia923__monthly_generation_fuel_nuclear.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

energy_source_code

string

A 2-3 letter code indicating the energy source (e.g. fuel type) associated with the record.

fuel_consumed_for_electricity_mmbtu

number

Total consumption of fuel to produce electricity, in millions of Btus (MMBtu), year to date.

fuel_consumed_for_electricity_units

number

Consumption of the fuel type for electricity generation, in physical units.

fuel_consumed_mmbtu

number

Total consumption of fuel in millions of Btus (MMBtu), year to date. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_consumed_units

number

Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_agg

string

A partial aggregation of the reported fuel type codes into larger categories used by EIA in, for example, the Annual Energy Review (AER) or Monthly Energy Review (MER). Two or three letter alphanumeric.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

nuclear_unit_id

string

For nuclear plants only, the unit number. One-digit numeric. Nuclear plants are the only type of plant for which data are shown explicitly at the generating unit level.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

prime_mover_code

string

Code for the type of prime mover (e.g. CT, CG)

report_date

date

Date reported.

core_eia930__hourly_interchange#

Hourly time series of balancing authority interchange.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia, balancing_authority_code_adjacent_eia

Additional Details#

Interchange is the net metered tie line flow from one BA to another directly interconnected BA. Total net interchange is the net sum of all interchange occurring between a BA and its directly interconnected neighboring BAs. Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.
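
The sign convention and the definition of total net interchange can be illustrated with a short sketch. The BA codes AAA, BBB, and CCC and all values below are placeholders, and the helper names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

hour = datetime(2024, 1, 1, 1, tzinfo=timezone.utc)

# Made-up rows shaped like core_eia930__hourly_interchange.
interchange = [
    {"datetime_utc": hour, "balancing_authority_code_eia": "AAA",
     "balancing_authority_code_adjacent_eia": "BBB", "interchange_reported_mwh": 150.0},
    {"datetime_utc": hour, "balancing_authority_code_eia": "AAA",
     "balancing_authority_code_adjacent_eia": "CCC", "interchange_reported_mwh": -400.0},
]


def total_net_interchange(rows):
    """Sum pairwise interchange over all neighbors for each (hour, BA)."""
    totals = defaultdict(float)
    for row in rows:
        value = row["interchange_reported_mwh"]
        if value is not None:  # leave missing reports out of the sum
            totals[(row["datetime_utc"], row["balancing_authority_code_eia"])] += value
    return dict(totals)


def direction(mwh):
    """Translate the sign convention: positive is outflow, negative is inflow."""
    if mwh > 0:
        return "net outflow"
    if mwh < 0:
        return "net inflow"
    return "balanced"


totals = total_net_interchange(interchange)
aaa_direction = direction(totals[(hour, "AAA")])
```

Here AAA sends 150 MWh to BBB and receives 400 MWh from CCC, so its total net interchange is -250 MWh, a net inflow.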

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_interchange.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_adjacent_eia

string

EIA short code for the other adjacent balancing authority, with which interchange is occurring. Includes Canadian and Mexican BAs.

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

datetime_utc

datetime

Timestamp at the end of the hour for which the data is reported.

interchange_reported_mwh

number

Original reported energy interchange between adjacent balancing authorities.

core_eia930__hourly_net_generation_by_energy_source#

Hourly time series of balancing authority net generation by energy source.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia, generation_energy_source

Usage Warnings#

  • Contains known calculation discrepancies.

Additional Details#

BAs report generation from dual-fuel (switching from one fuel to another) and multiple-fuel (using multiple fuels simultaneously) generators under the actual energy source used, if known, and under the generator’s primary energy source, if not known.

To maintain generator confidentiality, generation may sometimes be reported in the Other category if too few generators are reported for a particular energy source category.

In theory, the sum of net generation across all energy sources should equal the total net generation reported in the balancing authority operations table. In practice, there are many cases in which these values diverge significantly, and these cases require further investigation.
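
One way to locate the divergent hours is to sum generation by energy source and compare it with the total from core_eia930__hourly_operations. The sketch below uses made-up rows; the BA code AAA, the energy source labels, the 1 MWh tolerance, and the function name generation_discrepancies are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

hour_1 = datetime(2024, 1, 1, 1, tzinfo=timezone.utc)
hour_2 = datetime(2024, 1, 1, 2, tzinfo=timezone.utc)

# Made-up rows shaped like core_eia930__hourly_net_generation_by_energy_source.
# The generation_energy_source labels are placeholders, not actual category values.
by_source = [
    {"datetime_utc": hour_1, "balancing_authority_code_eia": "AAA",
     "generation_energy_source": "source_a", "net_generation_reported_mwh": 600.0},
    {"datetime_utc": hour_1, "balancing_authority_code_eia": "AAA",
     "generation_energy_source": "source_b", "net_generation_reported_mwh": 300.0},
    {"datetime_utc": hour_2, "balancing_authority_code_eia": "AAA",
     "generation_energy_source": "source_a", "net_generation_reported_mwh": 500.0},
    {"datetime_utc": hour_2, "balancing_authority_code_eia": "AAA",
     "generation_energy_source": "source_b", "net_generation_reported_mwh": 300.0},
]

# Made-up rows shaped like core_eia930__hourly_operations.
operations = [
    {"datetime_utc": hour_1, "balancing_authority_code_eia": "AAA",
     "net_generation_reported_mwh": 900.0},
    {"datetime_utc": hour_2, "balancing_authority_code_eia": "AAA",
     "net_generation_reported_mwh": 1000.0},
]


def generation_discrepancies(by_source_rows, operations_rows, tolerance_mwh=1.0):
    """Return {(hour, BA): sum_by_source - reported_total} where the gap exceeds tolerance."""
    summed = defaultdict(float)
    for row in by_source_rows:
        value = row["net_generation_reported_mwh"]
        if value is not None:
            summed[(row["datetime_utc"], row["balancing_authority_code_eia"])] += value
    gaps = {}
    for row in operations_rows:
        key = (row["datetime_utc"], row["balancing_authority_code_eia"])
        total = row["net_generation_reported_mwh"]
        if total is None or key not in summed:
            continue  # nothing to compare against
        gap = summed[key] - total
        if abs(gap) > tolerance_mwh:
            gaps[key] = gap
    return gaps


gaps = generation_discrepancies(by_source, operations)
```

In the toy data the first hour reconciles exactly, while the second hour is 200 MWh short of the reported total and is returned for review.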

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_net_generation_by_energy_source.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

datetime_utc

datetime

Timestamp at the end of the hour for which the data is reported.

generation_energy_source

string

High level energy source used to produce electricity.

net_generation_adjusted_mwh

number

Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

net_generation_imputed_eia_mwh

number

Reported net generation with outlying values removed and missing values imputed by EIA.

net_generation_reported_mwh

number

Unaltered originally reported net generation for the specified period.

core_eia930__hourly_operations#

Hourly time series of balancing authority net generation, interchange, and demand.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia

Additional Details#

Net generation represents the metered output of electric generating units in a BA’s electric system. This generation only includes generating units that are managed by a BA or whose operations are visible to a BA.

Generators on the distribution system—both large-scale resources and small-scale distributed resources, such as rooftop solar photovoltaic (PV) systems—are typically not included.

In some electricity publications, EIA reports generation from all utility-scale generating units in the United States. BAs meter only a subset of those utility-scale generating units. As a result, when hourly generation from the EIA-930 is aggregated to monthly or annual levels, the results will be lower than monthly and annual aggregations in other EIA electricity publications.
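
When aggregating hourly values to months for such a comparison, note that datetime_utc marks the end of each hour. The sketch below shifts each timestamp back one hour before bucketing, so that the hour ending at midnight on the first of a month is counted in the previous month. This is one reasonable convention, not necessarily the one EIA uses, and it buckets by UTC rather than local time; the BA code and values are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Made-up rows shaped like core_eia930__hourly_operations.
hourly = [
    # Hour ending at midnight UTC on Feb 1 covers the last hour of January.
    {"datetime_utc": datetime(2024, 2, 1, 0, tzinfo=timezone.utc),
     "balancing_authority_code_eia": "AAA", "net_generation_reported_mwh": 100.0},
    {"datetime_utc": datetime(2024, 2, 1, 1, tzinfo=timezone.utc),
     "balancing_authority_code_eia": "AAA", "net_generation_reported_mwh": 50.0},
    {"datetime_utc": datetime(2024, 2, 1, 2, tzinfo=timezone.utc),
     "balancing_authority_code_eia": "AAA", "net_generation_reported_mwh": None},
]


def monthly_net_generation(rows):
    """Sum reported net generation by (BA, year, month) of the hour's start, in UTC."""
    totals = defaultdict(float)
    for row in rows:
        value = row["net_generation_reported_mwh"]
        if value is None:
            continue  # missing hours are skipped, not treated as zero output
        hour_start = row["datetime_utc"] - timedelta(hours=1)
        key = (row["balancing_authority_code_eia"], hour_start.year, hour_start.month)
        totals[key] += value
    return dict(totals)


monthly = monthly_net_generation(hourly)
```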

Interchange is the net metered tie line flow from one BA to another directly interconnected BA. This table includes the net sum of all interchange occurring between a BA and all of its directly interconnected neighboring BAs. For a detailed breakdown of interchange between each pair of adjacent BAs see core_eia930__hourly_interchange.

Negative interchange values indicate net inflows, and positive interchange values indicate net outflows.

Demand is a calculated value representing the amount of electricity load within a BA’s electric system. A BA derives its demand value by taking the total metered net electricity generation within its electric system and subtracting the total metered net electricity interchange occurring between the BA and its neighboring BAs.
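
The demand calculation and the interchange sign convention combine as in the following sketch. The function name derived_demand and the numbers are illustrative assumptions; the inputs correspond to the net_generation_reported_mwh and interchange_reported_mwh columns of this table.

```python
def derived_demand(net_generation_mwh, interchange_mwh):
    """Demand = net generation - net interchange; None if either input is missing."""
    if net_generation_mwh is None or interchange_mwh is None:
        return None
    return net_generation_mwh - interchange_mwh


# A BA importing 200 MWh (negative interchange) serves more load than it generates.
importer_demand = derived_demand(1000.0, -200.0)

# A BA exporting 300 MWh (positive interchange) serves less load than it generates.
exporter_demand = derived_demand(1000.0, 300.0)
```

Comparing this derived value with demand_reported_mwh is a quick consistency check on a BA's reported figures.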

Each BA produces a day-ahead electricity demand forecast for every hour of the next day. These forecasts help BAs plan for and coordinate the reliable operation of their electric system.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_operations.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

datetime_utc

datetime

Timestamp at the end of the hour for which the data is reported.

demand_adjusted_mwh

number

Electricity demand adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

demand_forecast_mwh

number

Day ahead demand forecast.

demand_imputed_eia_mwh

number

Electricity demand calculated by subtracting BA interchange from net generation, with outliers and missing values imputed by EIA.

demand_reported_mwh

number

Originally reported electricity demand, calculated by taking the net generation within the BA and subtracting the interchange with adjacent BAs.

interchange_adjusted_mwh

number

Energy interchange between adjacent balancing authorities, adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

interchange_imputed_eia_mwh

number

Energy interchange between adjacent balancing authorities, with outliers and missing values imputed by EIA.

interchange_reported_mwh

number

Original reported energy interchange between adjacent balancing authorities.

net_generation_adjusted_mwh

number

Reported net generation adjusted by EIA to reflect non-physical commercial transfers through pseudo-ties and dynamic scheduling.

net_generation_imputed_eia_mwh

number

Reported net generation with outlying values removed and missing values imputed by EIA.

net_generation_reported_mwh

number

Unaltered originally reported net generation for the specified period.

core_eia930__hourly_subregion_demand#

Hourly time series of balancing authority subregion demand.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Form 930 – Hourly and Daily Balancing Authority Operations Report

Primary key:

datetime_utc, balancing_authority_code_eia, balancing_authority_subregion_code_eia

Additional Details#

For select BAs, balancing authority subregion demand provides a further level of geographic disaggregation (for example, load zones, weather zones, operating company areas, local BAs, etc.) within a BA’s electric system.
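
As a simple use of this disaggregation, the sketch below computes each subregion's share of the summed subregion demand within its BA for a given hour. The BA code AAA, the subregion codes ZN1 and ZN2, the values, and the function name subregion_shares are placeholders chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

hour = datetime(2024, 1, 1, 1, tzinfo=timezone.utc)

# Made-up rows shaped like core_eia930__hourly_subregion_demand.
subregion_demand = [
    {"datetime_utc": hour, "balancing_authority_code_eia": "AAA",
     "balancing_authority_subregion_code_eia": "ZN1", "demand_reported_mwh": 300.0},
    {"datetime_utc": hour, "balancing_authority_code_eia": "AAA",
     "balancing_authority_subregion_code_eia": "ZN2", "demand_reported_mwh": 700.0},
]


def subregion_shares(rows):
    """Return {(hour, BA, subregion): share of that BA-hour's summed subregion demand}."""
    totals = defaultdict(float)
    for row in rows:
        value = row["demand_reported_mwh"]
        if value is not None:
            totals[(row["datetime_utc"], row["balancing_authority_code_eia"])] += value
    shares = {}
    for row in rows:
        ba_key = (row["datetime_utc"], row["balancing_authority_code_eia"])
        value = row["demand_reported_mwh"]
        if value is None or not totals[ba_key]:
            continue  # share is undefined for missing values or a zero total
        shares[ba_key + (row["balancing_authority_subregion_code_eia"],)] = value / totals[ba_key]
    return shares


shares = subregion_shares(subregion_demand)
```

The shares are relative to the sum of reported subregion values, which may not match the BA-level demand in core_eia930__hourly_operations.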

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia930__hourly_subregion_demand.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_subregion_code_eia

string

Code identifying subregions of larger balancing authorities.

datetime_utc

datetime

Timestamp at the end of the hour for which the data is reported.

demand_reported_mwh

number

Originally reported electricity demand for the balancing authority subregion. Note that different BAs have different methods of calculating and allocating subregion demand.

core_eia__codes_averaging_periods#

Code table containing descriptions of categorical codes for the averaging period specified by emissions statutes and regulations for the EIA 860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_averaging_periods.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_balancing_authorities#

Code table containing descriptions of categorical codes for balancing authorities in EIA 860, EIA 923, and EIA 930.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authorities.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_region_code_eia

string

EIA balancing authority region code.

balancing_authority_region_name_eia

string

Human-readable name of the EIA balancing region.

balancing_authority_retirement_date

date

Date on which the balancing authority ceased independent operation.

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

interconnect_code_eia

string

EIA interconnect code.

is_generation_only

boolean

Indicates whether the balancing authority is generation-only, meaning it does not serve retail customers and thus reports only net generation and interchange, but not demand.

label

string

Longer human-readable code using snake_case

report_timezone

string

Timezone used by the reporting entity. For use in localizing UTC times.

core_eia__codes_balancing_authority_subregions#

Code table containing descriptions of categorical codes for balancing authority subregions in EIA 930.

Most-recent data:

2026half1

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

balancing_authority_code_eia, balancing_authority_subregion_code_eia

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_balancing_authority_subregions.parquet"
)

Columns#

Field Name

Type

Description

balancing_authority_code_eia

string

EIA short code identifying a balancing authority. May include Canadian and Mexican BAs.

balancing_authority_subregion_code_eia

string

Code identifying subregions of larger balancing authorities.

balancing_authority_subregion_name_eia

string

Name of the balancing authority subregion.

core_eia__codes_boiler_generator_assn_types#

Code table containing descriptions of categorical codes for boiler-generator associations in the EIA 860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_generator_assn_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_boiler_status#

Code table containing descriptions of categorical codes for boiler status in the EIA 860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_status.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_boiler_types#

Code table containing descriptions of categorical codes for boiler regulatory types in the EIA 860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_boiler_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_coalmine_types#

Code table containing descriptions of categorical codes for coalmines reported as fuel sources in the EIA-923.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_coalmine_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_contract_types#

Code table containing descriptions of categorical codes for fuel supply contracts reported in EIA-923.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_contract_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_cooling_system_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling systems.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_system_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_cooling_tower_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling towers.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_tower_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_cooling_water_sources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling water sources in EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_sources.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_cooling_water_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for cooling water.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_cooling_water_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_emission_control_equipment_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for emissions control equipment installed on a boiler.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_emission_control_equipment_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_energy_sources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy sources reported to EIA.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Additional Details#

Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_energy_sources.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

fuel_derived_from

string

Original fuel from which this refined fuel was derived.

fuel_group_eia

string

High-level fuel group defined in the 2021-2023 EIA Form 860 instructions, Table 28.

fuel_phase

string

Physical phase of matter of the fuel.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL.

fuel_units

string

Reported unit of measure for fuel.

label

string

Longer human-readable code using snake_case.

max_fuel_mmbtu_per_unit

number

Maximum heat content per physical unit of fuel in MMBTU.

min_fuel_mmbtu_per_unit

number

Minimum heat content per physical unit of fuel in MMBTU.
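The `min_fuel_mmbtu_per_unit` and `max_fuel_mmbtu_per_unit` columns lend themselves to a plausibility check on reported heat content. The sketch below flags values outside the stated range. The codes, bounds, and the `deliveries` table with its `energy_source_code` and `fuel_mmbtu_per_unit` columns are invented for illustration and are assumptions, not real PUDL data.

```python
import pandas as pd

# Invented stand-in for core_eia__codes_energy_sources (subset of columns).
energy_sources = pd.DataFrame(
    {
        "code": ["X1", "X2"],
        "min_fuel_mmbtu_per_unit": [10.0, 0.8],
        "max_fuel_mmbtu_per_unit": [30.0, 1.2],
    }
)

# Hypothetical reported values to be checked against the bounds.
deliveries = pd.DataFrame(
    {
        "energy_source_code": ["X1", "X1", "X2"],
        "fuel_mmbtu_per_unit": [22.5, 45.0, 1.0],
    }
)

checked = deliveries.merge(
    energy_sources,
    left_on="energy_source_code",
    right_on="code",
    how="left",
    validate="many_to_one",
)

# True where the reported heat content lies within [min, max], inclusive.
checked["heat_content_in_range"] = checked["fuel_mmbtu_per_unit"].between(
    checked["min_fuel_mmbtu_per_unit"], checked["max_fuel_mmbtu_per_unit"]
)
print(checked[["energy_source_code", "fuel_mmbtu_per_unit", "heat_content_in_range"]])
```

Rows whose code has no match in the code table get null bounds and are therefore reported as out of range, which may or may not be the desired treatment.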

core_eia__codes_environmental_equipment_manufacturers#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for manufacturers of boilers and environmental control equipment in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_environmental_equipment_manufacturers.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_firing_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler firing types in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_firing_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_fuel_transportation_modes#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for fuel transportation modes reported in the EIA-860 and EIA-923.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_transportation_modes.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_fuel_types_agg#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for aggregated fuel types used in the Annual Energy Review or Monthly Energy Review.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Additional Details#

See EIA-923 Fuel Code table for additional information.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_fuel_types_agg.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.
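Unlike most code tables in this section, this one lists only `code` and `description`, with no `label` column, so code written against the three-column layout may need a guard. A minimal sketch of such a check follows, which also verifies that the `code` primary key is non-null and unique. The function `check_code_table` and the sample rows are hypothetical.

```python
import pandas as pd


def check_code_table(df: pd.DataFrame, required=("code", "description")) -> None:
    """Raise ValueError if required columns are missing or code is not a valid key."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if df["code"].isna().any():
        raise ValueError("primary key column 'code' contains nulls")
    if df["code"].duplicated().any():
        dupes = sorted(df.loc[df["code"].duplicated(), "code"].unique())
        raise ValueError(f"duplicate codes: {dupes}")


# Made-up rows standing in for the real table contents.
fuel_types_agg = pd.DataFrame(
    {"code": ["AAA", "BBB"], "description": ["First group.", "Second group."]}
)

check_code_table(fuel_types_agg)  # passes: both columns present, codes unique
```

Passing `required=("code", "description", "label")` applies the same check to the three-column code tables.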

core_eia__codes_mercury_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control mercury in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_mercury_compliance_strategies.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_momentary_interruptions#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for utility definitions of momentary service interruptions.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_momentary_interruptions.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_nox_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control nitrogen oxide in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_compliance_strategies.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_nox_control_status#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the operational status of nitrogen oxide control units associated with boilers in the EIA-860 data.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_control_status.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_nox_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for nitrogen oxide in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_nox_units.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_operational_status#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for operational status reported to EIA.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Additional Details#

Compiled from EIA-860 instructions and EIA-923 file layout spreadsheets.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_operational_status.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

operational_status

string

The operating status of the asset. For generators, this is based on which EIA-860 tab the generator was listed in.
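Because this table carries an `operational_status` column alongside each code, it can collapse many detailed status codes into a few broad categories. The sketch below maps codes to that column and counts assets per category. All codes, status values, and the `generators` table with its `operational_status_code` column are invented for illustration.

```python
import pandas as pd

# Invented stand-in for core_eia__codes_operational_status (subset of columns).
status_codes = pd.DataFrame(
    {
        "code": ["AA", "BB", "CC"],
        "operational_status": ["existing", "existing", "retired"],
    }
)

# Hypothetical generator records holding the detailed status code.
generators = pd.DataFrame(
    {
        "generator_id": ["g1", "g2", "g3", "g4"],
        "operational_status_code": ["AA", "BB", "CC", "AA"],
    }
)

# Build a code -> operational_status lookup and apply it to each record.
code_to_status = status_codes.set_index("code")["operational_status"]
generators["operational_status"] = generators["operational_status_code"].map(
    code_to_status
)

counts = generators["operational_status"].value_counts().to_dict()
print(counts)
```

Codes absent from the lookup map to null, so a null count after the mapping is a quick way to spot unrecognized codes.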

core_eia__codes_particulate_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control particulate matter in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_compliance_strategies.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_particulate_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for particulate matter in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_particulate_units.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_prime_movers#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for prime movers reported in the EIA-860 and EIA-923.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_prime_movers.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_regulations#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for levels of statutes and codes under which boilers operate in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_regulations.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_reporting_frequencies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the reporting frequencies used by plants in the EIA-923.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_reporting_frequencies.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_sector_consolidated#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for EIA consolidated NAICS sectors.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Additional Details#

Codes and descriptions taken from the EIA-923 File Layout spreadsheet.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sector_consolidated.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_so2_compliance_strategies#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for compliance strategies used to control sulfur dioxide in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_compliance_strategies.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_eia__codes_so2_units#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for units of measurement for sulfur dioxide in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_so2_units.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_sorbent_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for flue gas desulfurization sorbent types used in EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_sorbent_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_steam_plant_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for steam plants in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_steam_plant_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_storage_enclosure_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy storage enclosures.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_enclosure_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_storage_technology_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for energy storage technologies.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_storage_technology_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_wet_dry_bottom#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for boiler bottoms in the EIA-860.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wet_dry_bottom.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case

core_eia__codes_wind_quality_class#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for wind quality classes.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__codes_wind_quality_class.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

extreme_fifty_year_gust_ms

number

The extreme 50-year wind gust at this generator in meters per second.

label

string

Longer human-readable code using snake_case

turbulence_intensity_a

number

The upper bounds of the turbulence intensity at the wind site (ratio of standard deviation of fluctuating wind velocity to the mean wind speed).

turbulence_intensity_b

number

The lower bounds of the turbulence intensity at the wind site (ratio of standard deviation of fluctuating wind velocity to the mean wind speed).

wind_speed_avg_ms

number

Average wind speed in meters per second.
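The turbulence intensity columns above are described as the ratio of the standard deviation of fluctuating wind velocity to the mean wind speed. A minimal sketch of that ratio follows. The function name and the sample wind speeds are illustrative assumptions, not PUDL or EIA data, and using the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def turbulence_intensity(wind_speeds_ms):
    """Ratio of the standard deviation of wind speed to its mean."""
    avg = mean(wind_speeds_ms)
    if avg == 0:
        raise ValueError("mean wind speed is zero; the ratio is undefined")
    # Population standard deviation is an assumption made for this sketch.
    return pstdev(wind_speeds_ms) / avg


# Hypothetical wind speed samples in meters per second.
samples_ms = [8.0, 10.0, 12.0, 10.0]
ti = turbulence_intensity(samples_ms)
```

The result is dimensionless, so it can be compared with the turbulence_intensity_a and turbulence_intensity_b bounds regardless of the speed unit used for the samples.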

core_eia__entity_boilers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about boilers compiled from the EIA-860 and EIA-923.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

plant_id_eia, boiler_id

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for boilers are set. It contains values which are expected to remain fixed, while core_eia860__scd_boilers contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.
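The 70% rule described above can be sketched in a few lines. This is not PUDL's implementation: the function name, the handling of nulls (ignored before computing the share), and the manufacturer strings are all assumptions made for illustration.

```python
from collections import Counter


def choose_canonical(values, min_share=0.7):
    """Return the most common non-null value if it makes up at least
    min_share of the non-null reports; otherwise return None."""
    # Assumption: null reports are dropped before computing the share.
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count / len(reported) >= min_share else None


# Hypothetical reports of one attribute for a single boiler.
consistent = choose_canonical(
    ["manufacturer_a", "manufacturer_a", "manufacturer_a", "manufacturer_b"]
)  # most common value covers 3 of 4 reports
inconsistent = choose_canonical(
    ["manufacturer_a", "manufacturer_b", "manufacturer_a", "manufacturer_b", "other"]
)  # most common value covers only 2 of 5 reports
```

In this sketch the first attribute resolves to a canonical value and the second comes back as None, mirroring the null that appears when reports disagree too often.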

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_boilers.parquet"
)

Columns#

Field Name

Type

Description

boiler_id

string

Alphanumeric boiler ID.

boiler_manufacturer

string

Name of boiler manufacturer.

boiler_manufacturer_code

string

EIA short code for boiler manufacturer.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

core_eia__entity_generators#

Noticed any irregularities with this table? File a report.

Entity table containing static information about generators compiled from across the EIA-860 and EIA-923.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

plant_id_eia, generator_id

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for generators are set. It contains values which are expected to remain fixed, while core_eia860__scd_generators contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_generators.parquet"
)

Columns#

Field Name

Type

Description

associated_combined_heat_power

boolean

Indicates whether the generator is associated with a combined heat and power system

bypass_heat_recovery

boolean

Can this generator operate while bypassing the heat recovery steam generator?

can_switch_when_operating

boolean

Indicates whether a fuel switching generator can switch fuels while operating.

duct_burners

boolean

Indicates whether the unit has duct-burners for supplementary firing of the turbine exhaust gas

fluidized_bed_tech

boolean

Indicates whether the generator uses fluidized bed technology

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_operating_date

date

Date the generator began commercial operation. If harvested values are inconsistent, we default to using the most recently reported date.

original_planned_generator_operating_date

date

The date the generator was originally scheduled to be operational

other_combustion_tech

boolean

Indicates whether the generator uses other combustion technologies

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

previously_canceled

boolean

Indicates whether the generator was previously reported as indefinitely postponed or canceled

pulverized_coal_tech

boolean

Indicates whether the generator uses pulverized coal technology

rto_iso_lmp_node_id

string

The designation used to identify the price node in RTO/ISO Locational Marginal Price reports

rto_iso_location_wholesale_reporting_id

string

The designation used to report the specific location of the wholesale sales transactions to FERC for the Electric Quarterly Report

solid_fuel_gasification

boolean

Indicates whether the generator is part of a solid fuel gasification system

stoker_tech

boolean

Indicates whether the generator uses stoker technology

subcritical_tech

boolean

Indicates whether the generator uses subcritical technology

supercritical_tech

boolean

Indicates whether the generator uses supercritical technology

topping_bottoming_code

string

If the generator is associated with a combined heat and power system, indicates whether the generator is part of a topping cycle or a bottoming cycle

ultrasupercritical_tech

boolean

Indicates whether the generator uses ultra-supercritical technology
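The generator_id description above warns that the ID must be treated as a string. A small illustration of why, using made-up IDs (none of these values come from EIA data): casting to an integer fails for IDs that contain letters and would merge IDs that differ only by a leading zero.

```python
# Hypothetical generator IDs; real IDs come from the generator_id column.
generator_ids = ["1", "01", "GT1", "2A"]

# Kept as strings, all four IDs stay distinct.
distinct_as_strings = len(set(generator_ids))


def to_int_or_none(value):
    """Try to cast an ID to an integer; return None when the cast fails."""
    try:
        return int(value)
    except ValueError:
        return None


# Cast to integers, "1" and "01" become the same value and the
# alphanumeric IDs cannot be represented at all.
as_ints = [to_int_or_none(g) for g in generator_ids]
```

Because generator_id is part of the primary key together with plant_id_eia, a lossy cast like this could break joins against other generator tables.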

core_eia__entity_plants#

Noticed any irregularities with this table? File a report.

Entity table containing static information about plants, compiled from across all EIA-860 and EIA-923 data.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

plant_id_eia

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for plants are set. It contains values which are expected to remain fixed, while core_eia860__scd_plants contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_plants.parquet"
)

Columns#

Field Name

Type

Description

city

string

Name of the city.

county

string

County name.

latitude

number

Latitude of the plant’s location, in degrees.

longitude

number

Longitude of the plant’s location, in degrees.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_name_eia

string

Plant name.

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

timezone

string

IANA timezone name

zip_code

string

Five digit US Zip Code.

core_eia__entity_utilities#

Noticed any irregularities with this table? File a report.

Entity table containing static information about utilities, compiled from all EIA data.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

utility_id_eia

Usage Warnings#

  • Data has been drawn from several EIA sources which are not always consistent with each other, and PUDL chooses the most consistent or relevant value to facilitate cross-referencing even if that means some values will differ from the raw sources. See Harvesting for details, and see Entity Harvesting Methodology for a fuller conceptual overview.

  • Contains information from multiple raw inputs.

Additional Details#

This is one of two tables where canonical values for utilities are set. It contains values which are expected to remain fixed, while core_eia860__scd_utilities contains those which may vary from year to year. EIA reports many attributes in many different tables across EIA-860 and EIA-923. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute, as long as that value accounts for at least 70% of the reported instances. If an attribute was reported too inconsistently across the original EIA tables to meet that threshold, it will show up as a null value. See Entity Resolution for a conceptual overview of this process. All tables downstream of this one inherit the canonical values established here.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__entity_utilities.parquet"
)

Columns#

Field Name

Type

Description

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

core_eia__yearly_fuel_receipts_costs_aggs#

Noticed any irregularities with this table? File a report.

Annual time series of aggregated fuel receipts and costs from the EIA API.

Most-recent data:

2026

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA – Mix of multiple EIA Forms

Primary key:

fuel_agg, geo_agg, sector_agg, temporal_agg, report_date

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eia__yearly_fuel_receipts_costs_aggs.parquet"
)

Columns#

Field Name

Type

Description

fuel_agg

string

Category of fuel aggregation in EIA bulk electricity data.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_received_mmbtu

number

Aggregated fuel receipts, in MMBTU, in EIA bulk electricity data.

geo_agg

string

Category of geographic aggregation in EIA bulk electricity data.

report_date

date

Date reported.

sector_agg

string

Category of sectoral aggregation in EIA bulk electricity data.

temporal_agg

string

Category of temporal aggregation in EIA bulk electricity data.

core_eiaaeo__yearly_projected_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of projected electricity sales by region and customer class.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Annual Energy Outlook (AEO)

Primary key:

report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, customer_class

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_electric_sales.parquet"
)

Columns#

Field Name

Type

Description

customer_class

string

High level categorization of customer type (e.g., commercial, residential).

electricity_market_module_region_eiaaeo

string

AEO projection region.

model_case_eiaaeo

string

Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.

projection_year

integer

The year of the projected value.

report_year

integer

Four-digit year in which the data was reported.

sales_mwh

number

Quantity of electricity sold in MWh.
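Because the primary key of this table combines report_year with projection_year, the same projected year can appear once per AEO release, region, model case, and customer class. A minimal sketch of checking rows against that key follows. The helper name and the row values ("region_x", "case_y", the sales figures) are hypothetical; only the column names come from the table definition above.

```python
from collections import Counter

# Primary key columns as listed for this table.
PRIMARY_KEY = (
    "report_year",
    "electricity_market_module_region_eiaaeo",
    "model_case_eiaaeo",
    "projection_year",
    "customer_class",
)


def duplicated_keys(rows, key_columns=PRIMARY_KEY):
    """Return key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows: the same projection published in two report years.
base = {
    "report_year": 2023,
    "electricity_market_module_region_eiaaeo": "region_x",
    "model_case_eiaaeo": "case_y",
    "projection_year": 2030,
    "customer_class": "residential",
    "sales_mwh": 100.0,
}
rows = [base, {**base, "report_year": 2025, "sales_mwh": 110.0}]

full_key_duplicates = duplicated_keys(rows)
partial_key_duplicates = duplicated_keys(
    rows, key_columns=("projection_year", "customer_class")
)
```

The full key is unique in this example, while grouping on projection_year and customer_class alone is not, which is why a report year and model case usually need to be selected before summing sales_mwh.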

core_eiaaeo__yearly_projected_energy_use_by_sector_and_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected energy use for commercial, electric power, industrial, residential, and transportation sectors, drawn from AEO Table 2.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Annual Energy Outlook (AEO)

Primary key:

report_year, region_name_eiaaeo, model_case_eiaaeo, projection_year, energy_use_sector, energy_use_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

The series in Table 2 which track energy use by sector do not always define each type of use the same way across sectors. There is detailed information about what is included or excluded in each use type for each sector in the footnotes of the EIA’s online AEO data browser:

https://www.eia.gov/outlooks/aeo/data/browser/#/?id=2-AEO2023

Use caution when aggregating across use types! Energy Use has a tricky system of subtotals, and summing all types within a sector will result in double-counting. Consult the EIA’s data browser for visibility into which use types are subtotals, and what they contain: subtotal series are displayed indented, and include all lines above them which are one level out, up to the next indented line. Delivered Energy and Total are special cases which include those plus all subtotals above. In this way, “Delivered Energy” includes purchased electricity, renewable energy, and an array of fuels based on sector, and explicitly excludes electricity-related losses.

AEO Energy Use figures are variously referred to as delivered energy, energy consumption, energy use, and energy demand, depending on which use types are being discussed and on which organization and document is describing them. In PUDL we say energy use or energy consumption.
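The double-counting risk described above can be illustrated with made-up numbers. The use type names, the values, and the SUBTOTAL_TYPES set below are hypothetical; which series are actually subtotals has to be looked up in the EIA data browser for each sector.

```python
# Hypothetical rows for one sector, region, case, and projection year.
rows = [
    {"energy_use_type": "fuel_a", "energy_use_mmbtu": 40.0},
    {"energy_use_type": "fuel_b", "energy_use_mmbtu": 60.0},
    {"energy_use_type": "fuels_subtotal", "energy_use_mmbtu": 100.0},
    {"energy_use_type": "purchased_electricity", "energy_use_mmbtu": 50.0},
    {"energy_use_type": "delivered_energy", "energy_use_mmbtu": 150.0},
]

# Assumption: these two series are subtotals of the rows above them.
SUBTOTAL_TYPES = {"fuels_subtotal", "delivered_energy"}

# Summing every row counts the component series more than once.
naive_total = sum(r["energy_use_mmbtu"] for r in rows)

# Summing only the non-subtotal series reproduces the top-level figure.
component_total = sum(
    r["energy_use_mmbtu"]
    for r in rows
    if r["energy_use_type"] not in SUBTOTAL_TYPES
)
```

Here the naive sum is well over twice the component sum, and the component sum matches the hypothetical delivered_energy row. An equally valid approach is to read a single subtotal row directly instead of summing anything.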

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_energy_use_by_sector_and_type.parquet"
)

Columns#

Field Name

Type

Description

energy_use_mmbtu

number

Energy use, in MMBTU; also referred to as energy consumption, energy demand, or delivered energy, depending on type.

energy_use_sector

string

Sector for energy use figures in AEO Table 2. Similar to customer class, but with some missing and some extra values.

energy_use_type

string

Type of energy use, indicating the name of the series from AEO Table 2. Includes fuels, electricity, losses, and various subtotals; consult table documentation for aggregation guidelines.

model_case_eiaaeo

string

Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.

projection_year

integer

The year of the projected value.

region_name_eiaaeo

string

EIA AEO region for energy consumption. Includes US Census Divisions plus United States.

region_type_eiaaeo

string

Region type for EIA AEO energy consumption, indicating whether region_name_eiaaeo is a US Census Division or country (United States)

report_year

integer

Four-digit year in which the data was reported.

core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected fuel prices for the electric power sector, including electricity-only and combined-heat-and-power plants that have a regulatory status.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Annual Energy Outlook (AEO)

Primary key:

report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_fuel_cost_in_electric_sector_by_type.parquet"
)

Columns#

Field Name

Type

Description

electricity_market_module_region_eiaaeo

string

AEO projection region.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_real_per_mmbtu_eiaaeo

number

Average fuel cost per MMBTU of heat content in real USD, standardized to the value of a USD in the year defined by real_cost_basis_year.

fuel_type_eiaaeo

string

Fuel type reported for AEO end-use sector generation data.

model_case_eiaaeo

string

Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.

projection_year

integer

The year of the projected value.

real_cost_basis_year

integer

Four-digit year which is the basis for any ‘real cost’ monetary values (as opposed to nominal values).

report_year

integer

Four-digit year in which the data was reported.

core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology#

Noticed any irregularities with this table? File a report.

Annual time series of projected generation capacity and total generation in the electric sector, broken out by technology.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Annual Energy Outlook (AEO)

Primary key:

report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, technology_description_eiaaeo

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_electric_sector_by_technology.parquet"
)

Columns#

Field Name

Type

Description

electricity_market_module_region_eiaaeo

string

AEO projection region.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

model_case_eiaaeo

string

Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.

projection_year

integer

The year of the projected value.

report_year

integer

Four-digit year in which the data was reported.

summer_capacity_mw

number

The net summer capacity.

summer_capacity_planned_additions_mw

number

The total planned additions to net summer generating capacity.

summer_capacity_retirements_mw

number

The total retirements from net summer generating capacity.

summer_capacity_unplanned_additions_mw

number

The total unplanned additions to net summer generating capacity.

technology_description_eiaaeo

string

Generation technology reported for AEO.

core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type#

Noticed any irregularities with this table? File a report.

Annual time series of projected generation capacity and total generation in the end-use sector, broken out by fuel type.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EIA Annual Energy Outlook (AEO)

Primary key:

report_year, electricity_market_module_region_eiaaeo, model_case_eiaaeo, projection_year, fuel_type_eiaaeo

Usage Warnings#

  • The solar photovoltaic capacity portion of Renewable Sources is reported in megawatts direct current; other technologies are reported in megawatts of alternating current.

Additional Details#

Includes combined-heat-and-power plants and electricity-only plants in the commercial and industrial sectors; and small on-site generating systems in the residential, commercial, and industrial sectors used primarily for own-use generation, but which may also sell some power to the grid.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_eiaaeo__yearly_projected_generation_in_end_use_sectors_by_fuel_type.parquet"
)

Columns#

Field Name

Type

Description

electricity_market_module_region_eiaaeo

string

AEO projection region.

fuel_type_eiaaeo

string

Fuel type reported for AEO end-use sector generation data.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

model_case_eiaaeo

string

Factors such as economic growth, future oil prices, the ultimate size of domestic energy resources, and technological change are often uncertain. To illustrate some of these uncertainties, EIA runs side cases to show how the model responds to changes in key input variables compared with the Reference case. See https://www.eia.gov/outlooks/aeo/assumptions/case_descriptions.php for more details.

projection_year

integer

The year of the projected value.

report_year

integer

Four-digit year in which the data was reported.

summer_capacity_mw

number

The net summer capacity.

core_epa__assn_eia_epacamd#

Noticed any irregularities with this table? File a report.

Association table providing connections between EPA units and EIA plants, boilers, and generators.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EPA – Mix of multiple EPA sources

Primary key:

This table has no primary key.

Additional Details#

This crosswalk table comes from the PUDL fork of the EPA camd-eia-crosswalk GitHub repo: https://github.com/catalyst-cooperative/camd-eia-crosswalk-latest.

The camd-eia-crosswalk README and our Data Source documentation page on EPA Hourly Continuous Emission Monitoring System (CEMS) describe the complicated relationship between EIA and EPA data, specifically the nature of EPA vs. EIA “units” and the level of granularity at which the two sources can be connected.

The original EPA crosswalk runs on 2018 EIA data. We adapted the crosswalk code to run on each new year of EIA data, capturing changes in plant information over time.

Our version of the crosswalk clarifies some of the column names and removes unmatched rows. The docstring of the pudl.etl.glue_assets.core_epa__assn_eia_epacamd() function explains what changes are made from the EPA’s version.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd.parquet"
)

Columns#

Field Name

Type

Description

boiler_id

string

Alphanumeric boiler ID.

emissions_unit_id_epa

string

Emissions (smokestack) unit monitored by EPA CEMS.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

generator_id_epa

string

Generator ID used by the EPA.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_epa

integer

The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.

report_year

integer

Four-digit year in which the data was reported.

core_epa__assn_eia_epacamd_subplant_ids#

Noticed any irregularities with this table? File a report.

Association table providing connections between EPA units and EIA units/generators, at the subplant level.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EPA – Mix of multiple EPA sources

Primary key:

This table has no primary key. The primary key would have been plant_id_eia, generator_id, subplant_id, and emissions_unit_id_epa, but there are some null records in the generator_id column. ~2 percent of all EPA CAMD records are not successfully mapped to EIA generators.

Additional Details#

This table is an augmented version of the core_epa__assn_eia_epacamd crosswalk table, which originally comes from the EPA’s GitHub repo camd-eia-crosswalk: https://github.com/USEPA/camd-eia-crosswalk.

This table identifies subplants within plant_ids, which are the smallest coherent units for aggregation. A plant_id refers to a legal entity that often contains multiple distinct power plants, even of different technology or fuel types.

EPA CEMS data combines information from several parts of a power plant:

  • emissions from smokestacks

  • fuel use from combustors

  • electricity production from generators

But smokestacks, combustors, and generators can be connected in complex, many-to-many relationships. This complexity makes attribution difficult, for example when allocating pollution to energy producers. Furthermore, heterogeneity within plant_ids makes aggregation to the parent entity difficult or inappropriate.

This table draws on the EPA’s crosswalk, the IDs in the EPA CAMD core_epacems__hourly_emissions table itself, the core_eia860__assn_boiler_generator table, and the core_eia860__scd_generators table. The core_epa__assn_eia_epacamd table is the backbone of this table, while the EPA CAMD IDs ensure complete coverage of EPA CAMD reporting units. The added EIA 860 tables ensure complete coverage of those units as well.

For more information about how this subplant_id is made, see the documentation for pudl.etl.glue_assets.make_subplant_ids() and pudl.etl.glue_assets.update_subplant_ids().

By analyzing the relationships between combustors and generators, as provided in the core_epa__assn_eia_epacamd crosswalk, we can identify distinct power plants. These are the smallest coherent units of aggregation.
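The idea of grouping connected smokestacks and generators can be illustrated with a small sketch. This is not the PUDL implementation (see pudl.etl.glue_assets.make_subplant_ids() for that). It is a generic connected-components pass over made-up unit-to-generator links for a single plant, and the function name assign_component_ids and the sample IDs are hypothetical.

```python
def assign_component_ids(links):
    """Group (emissions unit, generator) links into connected components.

    `links` is an iterable of (emissions_unit_id, generator_id) pairs for
    one plant. Returns a dict mapping each node to a component number.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for unit, generator in links:
        # Prefix the labels so unit "1" and generator "1" stay distinct nodes.
        union(("unit", unit), ("gen", generator))

    # Number the components in order of first appearance.
    component_of_root = {}
    result = {}
    for node in parent:
        root = find(node)
        component_of_root.setdefault(root, len(component_of_root))
        result[node] = component_of_root[root]
    return result


# Hypothetical links: CS1 and CS2 share generator G2, while CT1 stands alone.
links = [("CS1", "G1"), ("CS1", "G2"), ("CS2", "G2"), ("CT1", "G3")]
components = assign_component_ids(links)
```

In this toy input, CS1, CS2, G1, and G2 end up in one group and CT1 and G3 in another, which mirrors how a single plant_id can contain more than one distinct power plant.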

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_epa__assn_eia_epacamd_subplant_ids.parquet"
)

Columns#

Field Name

Type

Description

emissions_unit_id_epa

string

Emissions (smokestack) unit monitored by EPA CEMS.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_epa

integer

The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.

subplant_id

integer

Sub-plant ID links EPA CEMS emissions units to EIA units.

unit_id_pudl

integer

Dynamically assigned PUDL unit id. WARNING: This ID is not guaranteed to be static long term as the input data and algorithm may evolve over time.

core_epacems__hourly_emissions#

Noticed any irregularities with this table? File a report.

Hourly time series of emissions and plant operational data as required by 40 CFR Part 75.

Most-recent data:

2025

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

EPA Hourly Continuous Emission Monitoring System (CEMS)

Primary key:

plant_id_epa, emissions_unit_id_epa, operating_datetime_utc

Usage Warnings#

  • Large table; do not attempt to open with Excel.

  • Not all IDs are present.

  • Contains information from multiple raw inputs.

Additional Details#

Continuous Emissions Monitoring Systems (CEMS) are used to determine the rate of gas or particulate matter exiting a point source of emissions. The EPA Clean Air Markets Division (CAMD) has collected data on power plant emissions from CEMS units stretching back to 1995. The CEMS dataset includes hourly gross load, SO2, CO2, and NOx emissions associated with a given emissions-unit during startup, shutdown, and instances of malfunction. An EPA CEMS emissions-unit or smokestack unit is not the same as an EIA unit. See core_epa__assn_eia_epacamd for details about how to connect CEMS data to corresponding EIA units.

Only fossil-combustion units over 25 MW are required to install and use CEMS, so some units that report in EIA-860 or EIA-923 do not have any CEMS data.
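Because one emissions unit can be linked to more than one generator, joining CEMS records to the crosswalk can repeat the same emissions value on several rows. The plain-Python sketch below uses made-up rows (the IDs and tonnages are invented for illustration) to show that fan-out when joining on plant_id_epa and emissions_unit_id_epa.

```python
from collections import defaultdict

# Made-up CEMS rows: (plant_id_epa, emissions_unit_id_epa, co2_mass_tons)
cems_rows = [
    (3, "1", 10.0),
    (3, "2", 5.0),
]

# Made-up crosswalk rows:
# (plant_id_epa, emissions_unit_id_epa, plant_id_eia, generator_id)
crosswalk_rows = [
    (3, "1", 3, "1"),
    (3, "1", 3, "1A"),  # one emissions unit linked to two generators
    (3, "2", 3, "2"),
]

# Index the crosswalk by the EPA-side join key.
generators_by_unit = defaultdict(list)
for plant_id_epa, unit_id, plant_id_eia, generator_id in crosswalk_rows:
    generators_by_unit[(plant_id_epa, unit_id)].append((plant_id_eia, generator_id))

# Inner join: every CEMS row is repeated once per linked generator.
joined = [
    (plant_id_epa, unit_id, plant_id_eia, generator_id, co2)
    for plant_id_epa, unit_id, co2 in cems_rows
    for plant_id_eia, generator_id in generators_by_unit.get((plant_id_epa, unit_id), [])
]

naive_total = sum(row[-1] for row in joined)      # counts unit "1" twice
true_total = sum(co2 for _, _, co2 in cems_rows)  # each unit counted once
```

Summing emissions after such a join overstates the total, so aggregate at the emissions-unit level first, or allocate values across the linked generators explicitly. Note that the IDs are kept as strings throughout, since generator IDs such as "1A" are not numeric.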

Access methods#

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.

import polars as pl
df = (
    # Use scan_parquet (lazy evaluation) to plan the query before execution
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    # Select only a subset of columns to reduce network and memory usage
    .select(
        [
            "state",
            "plant_id_eia",
            "plant_id_epa",
            "operating_datetime_utc",
            "operating_time_hours",
            "emissions_unit_id_epa",
            "gross_load_mw",
            "heat_content_mmbtu",
            "co2_mass_tons",
        ]
    )
    .filter(
        # Select data for only a single plant to reduce network and memory usage
        (pl.col("plant_id_eia") == 470)
        # The row groups in this Parquet file are partitioned by state and year,
        # so indicating the state allows the query to skip irrelevant groups.
        & (pl.col("state") == "CO")
    )
    # Now execute the planned query
    .collect()
)
SELECT
    -- Select only a subset of columns to reduce network and memory usage
    state,
    plant_id_eia,
    plant_id_epa,
    operating_datetime_utc,
    operating_time_hours,
    emissions_unit_id_epa,
    gross_load_mw,
    heat_content_mmbtu,
    co2_mass_tons
FROM 's3://pudl.catalyst.coop/nightly/core_epacems__hourly_emissions.parquet'
WHERE
    -- Select data for only a single plant to reduce network and memory usage
    plant_id_eia = 470
    -- The row groups in this Parquet file are partitioned by state and year,
    -- so indicating the state allows the query to skip irrelevant groups.
    AND state = 'CO';

Columns#

Field Name

Type

Description

co2_mass_measurement_code

string

Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.

co2_mass_tons

number

Carbon dioxide emissions in short tons.

emissions_unit_id_epa

string

Emissions (smokestack) unit monitored by EPA CEMS.

gross_load_mw

number

Average power in megawatts delivered during time interval measured.

heat_content_mmbtu

number

The energy contained in fuel burned, measured in million BTU.

nox_mass_lbs

number

NOx emissions in pounds.

nox_mass_measurement_code

string

Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.

operating_datetime_utc

datetime

Date and time measurement began (UTC).

operating_time_hours

number

Length of time interval measured.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_epa

integer

The ORISPL ID used by EPA to refer to the plant. Usually but not always the same as plant_id_eia.

so2_mass_lbs

number

Sulfur dioxide emissions in pounds.

so2_mass_measurement_code

string

Identifies whether the reported value of emissions was measured, calculated, or measured and substitute.

state

string

Two letter US state abbreviation.

steam_load_lbs

number

Total steam pressure produced by a unit during the reported hour.

year

integer

Year the data was reported in, used for partitioning EPA CEMS.

core_ferc1__codes_power_purchase_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for electric power purchase types.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities

Primary key:

code

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__codes_power_purchase_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

label

string

Longer human-readable code using snake_case.

core_ferc1__yearly_balance_sheet_assets_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility assets and other debits.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)

Primary key:

utility_id_ferc1, report_year, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.
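One way to avoid double counting subtotals is to filter on row_type_xbrl before summing. The sketch below uses made-up rows. The asset_type labels and the row_type_xbrl values "reported_value" and "calculated_value" are assumptions for illustration, so check the values actually present in the table before relying on them.

```python
# Made-up rows for one utility-year; labels and values are hypothetical.
rows = [
    {"asset_type": "cash", "row_type_xbrl": "reported_value", "ending_balance": 40.0},
    {"asset_type": "notes_receivable", "row_type_xbrl": "reported_value", "ending_balance": 60.0},
    # A subtotal row that already includes the two rows above.
    {"asset_type": "total_current_assets", "row_type_xbrl": "calculated_value", "ending_balance": 100.0},
]

# Summing every row counts the detail rows and their subtotal together.
naive_total = sum(row["ending_balance"] for row in rows)

# Keeping only directly reported rows avoids the double count.
reported_total = sum(
    row["ending_balance"]
    for row in rows
    if row["row_type_xbrl"] == "reported_value"
)
```

Here the naive sum is twice the reported total because the subtotal row repeats the detail rows.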

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_assets_sched110.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_ferc1__yearly_balance_sheet_assets_sched110 table.

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_balance_sheet_liabilities_sched110#

Noticed any irregularities with this table? File a report.

Annual time series of utility liabilities and other credits.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 110)

Primary key:

utility_id_ferc1, report_year, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_balance_sheet_liabilities_sched110.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

liability_type

string

Type of liability being reported to the core_ferc1__yearly_balance_sheet_liabilities_sched110 table.

record_id

string

Identifier indicating original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_cash_flows_sched120#

Noticed any irregularities with this table? File a report.

Annual time series of utility cash flow.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 120)

Primary key:

utility_id_ferc1, report_year, amount_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_cash_flows_sched120.parquet"
)

Columns#

Field Name

Type

Description

amount

number

Reported amount of dollars. This could be a balance or a change in value.

amount_type

string

Label describing the type of amount being reported. This could be a balance or a change in value.

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

record_id

string

Identifier indicating original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_depreciation_by_function_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of ending balances in accumulated provision for depreciation of electric utility plant.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section B)

Primary key:

utility_id_ferc1, report_year, depreciation_type, plant_function, plant_status, utility_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108. Section B includes the Balances at End of Year According to Functional Classification (plant_function).

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_by_function_sched219.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

depreciation_type

string

Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.

ending_balance

number

Account balance at end of year.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_depreciation_changes_sched219#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated provision for depreciation of electric utility plant.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 219 - Section A)

Primary key:

utility_id_ferc1, report_year, depreciation_type, plant_status, utility_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric utility plant refers to FERC Account 108.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_changes_sched219.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

depreciation_type

string

Type of depreciation provision within FERC Account 108, including cost of removal, depreciation expenses, salvage, cost of retired plant, etc.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. Format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_depreciation_factors_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of factors used in estimating depreciation charges.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section C)

Primary key:

This table has no primary key. This table is too unstructured to have a primary key.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Contains columns which may appear categorical, but are actually free text.

  • The rate and percentage (pct) columns are reported on either a 0-1 or a 0-100 scale. The scale is often consistent within a given utility-year, but use caution when comparing values across utilities, or across years for the same utility.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains details at a variety of levels of granularity. There are many free-form text fields in this table, so respondents fill it out very differently from each other and from year to year. We recommend using this table carefully, one utility-year at a time, and not attempting analysis across long time series or across utilities without considerable caution and cleaning. This table only contains information from Section C: Factors Used in Estimating Depreciation Charges.
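The mixed 0-1 and 0-100 scales described in the usage warnings can be partly reconciled with a per-utility-year heuristic. The sketch below is one possible approach, not part of PUDL: the function name `normalize_pct_scale` is hypothetical, and the rule (treat a whole utility-year as 0-100 if any absolute value in it exceeds 1) is an assumption that will misclassify groups whose true percentages are all below 1%.

```python
import pandas as pd


def normalize_pct_scale(df: pd.DataFrame, col: str) -> pd.Series:
    """Rescale a rate/pct column to a 0-1 scale within each utility-year.

    Assumption (heuristic, not a PUDL rule): if any absolute value in a
    (utility_id_ferc1, report_year) group is greater than 1, the whole
    group was reported on a 0-100 scale and is divided by 100.
    """
    # depreciation_rate is typed as a string, so coerce to numeric first;
    # unparseable values become NaN.
    values = pd.to_numeric(df[col], errors="coerce")
    # Largest absolute value in each utility-year, broadcast back to rows.
    group_max = values.abs().groupby(
        [df["utility_id_ferc1"], df["report_year"]]
    ).transform("max")
    # Keep values as-is unless the group looks like a 0-100 scale.
    return values.where(~(group_max > 1), values / 100)
```

Absolute values are used because net_salvage_pct can be negative. Results are worth spot-checking against the original filings before any cross-utility comparison.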

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_factors_sched336.parquet"
)

Columns#

Field Name

Type

Description

account_num

string

Account number(s) in connection with factors used in estimating depreciation charges.

depreciable_plant_base

number

Depreciable plant balance (depreciable base) to which rates are applied.

depreciation_factors

string

Label of the depreciation factor. This field contains unstructured, free-form strings. It often includes FERC account IDs, sometimes includes plant names, and sometimes includes headers indicating the beginning of a new section, such as a plant or asset type header followed by sub-components like FERC account IDs or plant names, depending on the section.

depreciation_rate

string

Depreciation rate applied to utility plant balance. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.

mortality_curve_type

string

Description of the type of mortality curve selected in plant mortality studies prepared to assist in estimating average service lives.

net_salvage_pct

number

Percentage representing the estimated value of utility plant at the end of its service life. Be aware that the formatting of this column is not expected to be standard - expect some values between 0-1 and some between 0-100.

order_num

number

This field is defined in FERC-XBRL documentation as a field that is used to sequence a table. FERC-XBRL documentation notes: ‘This field is added to a table to control ordering of the items on the table.’ FERC’s documentation also notes that this field should always be an integer - although there are many instances of floating point values which seem to increment by decimal points. Nonetheless, this field can be used to help understand the original order of the table. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

remaining_life_avg

number

Estimated weighted average of remaining life of utility plant assets. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.

report_year

integer

Four-digit year in which the data was reported.

service_life_avg

number

Estimated average service life of utility plant. The original DBF data was reported as years while the XBRL data was reported as strings in this time duration format: PnYnMnDTnHnMnS. PUDL converts these string time durations into years.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.
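The remaining_life_avg and service_life_avg descriptions above mention converting PnYnMnDTnHnMnS duration strings into years. The following is an illustrative sketch of such a conversion, not PUDL's actual implementation: the function name `duration_to_years` is hypothetical, and the conversion factors (12 months and 365.25 days per year) are assumptions chosen for the example.

```python
import re

# One optional numeric component per designator, in ISO 8601 order.
_NUM = r"\d+(?:\.\d+)?"
_DURATION = re.compile(
    rf"^P(?:(?P<years>{_NUM})Y)?"
    rf"(?:(?P<months>{_NUM})M)?"
    rf"(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?:(?P<hours>{_NUM})H)?"
    rf"(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?)?$"
)


def duration_to_years(text):
    """Convert a PnYnMnDTnHnMnS string to fractional years, or None.

    Assumes 12 months and 365.25 days per year (illustrative choices).
    """
    match = _DURATION.match(text.strip())
    if match is None:
        return None
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    days = (
        parts["days"]
        + parts["hours"] / 24
        + parts["minutes"] / 1440
        + parts["seconds"] / 86400
    )
    return parts["years"] + parts["months"] / 12 + days / 365.25
```

Under these assumptions, `duration_to_years("P2Y6M")` evaluates to 2.5, and a string that does not match the pattern returns `None` rather than raising.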

core_ferc1__yearly_depreciation_summary_sched336#

Noticed any irregularities with this table? File a report.

Annual time series of depreciation and amortization of electric plant.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 336 - Section A)

Primary key:

utility_id_ferc1, report_year, plant_function, ferc_account_label

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Plant refers to FERC Accounts 403, 404, and 405. This table only contains information from Section A: Summary of Depreciation and Amortization Charges.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_depreciation_summary_sched336.parquet"
)

Columns#

Field Name

Type

Description

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

ferc_account_label

string

Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.

plant_function

string

Functional role played by utility plant (steam production, nuclear production, distribution, transmission, etc.).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_energy_dispositions_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of dispositions of electric energy sold, exchanged, or stored.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)

Primary key:

utility_id_ferc1, report_year, energy_disposition_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, dispositions only. Schedule 401a. Electricity utilities delivered to end users, internal losses, etc.
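Because this table mixes reported rows with subtotals, summing energy_mwh directly can double count. The sketch below shows one way to drop calculated rows before aggregating. The function name `sum_reported_energy` is hypothetical, and the label `"calculated_value"` is an assumption about how row_type_xbrl marks calculated rows; inspect `df["row_type_xbrl"].unique()` to confirm the actual labels before relying on it.

```python
import pandas as pd


def sum_reported_energy(
    df: pd.DataFrame, calculated_label: str = "calculated_value"
) -> pd.Series:
    """Sum energy_mwh per utility-year, skipping calculated (subtotal) rows.

    calculated_label is an assumed value of row_type_xbrl; verify it
    against the data before use.
    """
    # Rows with a missing row_type_xbrl are kept rather than dropped.
    keep = df["row_type_xbrl"].ne(calculated_label).fillna(True).astype(bool)
    return (
        df[keep]
        .groupby(["utility_id_ferc1", "report_year"])["energy_mwh"]
        .sum()
    )
```

Comparing this sum against the reported total row for the same utility-year is a quick consistency check on a filing.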

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_dispositions_sched401.parquet"
)

Columns#

Field Name

Type

Description

energy_disposition_type

string

Type of energy disposition reported in the core_ferc1__yearly_energy_dispositions_sched401. Dispositions include sales to customers, re-sales of energy, energy used internally, losses, etc.

energy_mwh

number

Sources and uses of energy in MWh.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_energy_sources_sched401#

Noticed any irregularities with this table? File a report.

Annual time series of sources of electric energy generated or purchased, exchanged and wheeled.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 401a)

Primary key:

utility_id_ferc1, report_year, energy_source_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Electric Energy Account, sources only. Schedule 401a. Amount of electricity the utility obtained from each of several sources.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_energy_sources_sched401.parquet"
)

Columns#

Field Name

Type

Description

energy_mwh

number

Sources and uses of energy in MWh.

energy_source_type

string

Type of energy source reported in the core_ferc1__yearly_energy_sources_sched401 table. There are three groups of energy sources: generation, power exchanges and transmission.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_hydroelectric_plants_sched406#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for large hydroelectric generating plants.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 406)

Primary key:

This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • The plant_type and construction_type columns are standardized into categorical values from free-form strings.

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (e.g. unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Large plants have an installed nameplate capacity of more than 10 MW.
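Since the approximate key (report_year, utility_id_ferc1, plant_name_ferc1) is not guaranteed to be unique, it can help to list colliding records before aggregating. A minimal sketch with pandas follows; the function name `find_key_collisions` is hypothetical.

```python
import pandas as pd

# Approximate key suggested in the table description; not a true primary key.
APPROX_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def find_key_collisions(df: pd.DataFrame, key=APPROX_KEY) -> pd.DataFrame:
    """Return every row whose approximate key appears more than once."""
    # keep=False flags all members of a duplicated group, not just repeats.
    mask = df.duplicated(subset=key, keep=False)
    return df[mask].sort_values(key)
```

Rows returned here may be true duplicates or a plant reported both in parts and as a total, so they need manual review rather than automatic removal.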

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_hydroelectric_plants_sched406.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_equipment

number

Cost of plant: equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_capacity_adverse_conditions_mw

number

Net plant capability under the least favorable operating conditions, in megawatts.

net_capacity_favorable_conditions_mw

number

Net plant capability under the most favorable operating conditions, in megawatts.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_hydraulic

number

Production expenses: hydraulic expenses (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_rents

number

Production expenses: rents (USD).

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_water_for_power

number

Production expenses: water for power (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

project_num

integer

FERC Licensed Project Number.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_identification_certification#

Noticed any irregularities with this table? File a report.

Annual time series of respondent identification and corporate officer certification information.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Part I and II)

Primary key:

utility_id_ferc1, report_year, report_filing_type

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains each utility’s contact information, addresses, and the individuals responsible for accounting and certification. Note that the entity_id_gleif field does not conform to the expected format and largely includes the names of respondents.
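To separate entity_id_gleif values that at least have the shape of an LEI from those that hold respondent names, a simple format check can be applied. The sketch below tests only for 20 upper-case alphanumeric characters, as described for the LEI; it does not verify check digits or look anything up in a GLEIF registry, and the function name `looks_like_lei` is hypothetical.

```python
import re

# Shape check only: 20 upper-case letters or digits.
LEI_PATTERN = re.compile(r"[A-Z0-9]{20}")


def looks_like_lei(value) -> bool:
    """Return True if value has the 20-character alphanumeric LEI shape."""
    if not isinstance(value, str):
        return False  # nulls and non-strings never match
    return LEI_PATTERN.fullmatch(value.strip()) is not None
```

With pandas this could be applied as `df["entity_id_gleif"].map(looks_like_lei)` to build a boolean mask of plausible identifiers.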

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_identification_certification.parquet"
)

Columns#

Field Name

Type

Description

attestation_date

date

Date that corporate officer’s certification was signed.

attestation_name

string

Name of person signing the corporate officer’s certification.

attestation_signature

string

Signature of corporate officer to certify the report.

attestation_title

string

Title of person signing the corporate officer’s certification.

company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system.

contact_address

string

Street address for contact identified in contact_name.

contact_city

string

City for contact identified in contact_name.

contact_email

string

Email for contact identified in contact_name.

contact_name

string

Name of contact person for the respondent.

contact_phone

string

Phone number for contact identified in contact_name.

contact_state

string

State for contact identified in contact_name.

contact_title

string

Title of contact person for the respondent.

contact_zip

string

Zip code for contact identified in contact_name.

entity_id_gleif

string

Legal Entity Identification Number issued by Global Legal Entity Identifier Foundation (GLEIF). The Legal Entity Identifier (LEI) is a 20-character, alpha-numeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO).

filing_date

date

Date on which the filing was submitted.

filing_software_vendor_name

string

Name of the software company used to create the submitted instance file. Null if the filer created the submission.

name_change_date

date

Date of last name change of the company.

office_city

string

City of principal office at end of period.

office_state

string

State of principal office at end of period.

office_street_address

string

Street address of principal office at end of period.

office_zip

string

Zip code of principal office at end of period.

prior_utility_name_ferc1

string

Former name of the responding utility, when it changed within the last year.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_filing_type

string

Type of report submitted: O (original) or R (revision).

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

core_ferc1__yearly_income_statements_sched114#

Noticed any irregularities with this table? File a report.

Annual time series of utility income statements.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 114)

Primary key:

utility_id_ferc1, report_year, utility_type, income_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_income_statements_sched114.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

income_type

string

Type of income reported in core_ferc1__yearly_income_statements_sched114 table.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_operating_expenses_sched320#

Noticed any irregularities with this table? File a report.

Annual time series of operating and maintenance costs associated with producing electricity.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 320)

Primary key:

utility_id_ferc1, report_year, expense_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_expenses_sched320.parquet"
)

Columns#

Field Name

Type

Description

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

expense_type

string

The type of expense.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_operating_revenues_sched300#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric operating revenues.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 300)

Primary key:

utility_id_ferc1, report_year, revenue_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes only the structured part of Schedule 300. A number of revenue_type values do not have sales_mwh or avg_customers_per_month provided, in which case these columns will be NULL.
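Because sales_mwh is NULL for some revenue types, any per-MWh calculation needs to tolerate missing and zero denominators. The sketch below computes revenue per MWh and returns a missing value wherever sales_mwh is null or not positive. The function name `revenue_per_mwh` is hypothetical, and treating zero sales as missing is a choice made for this example.

```python
import pandas as pd


def revenue_per_mwh(df: pd.DataFrame) -> pd.Series:
    """Return dollar_value / sales_mwh, missing where sales_mwh is unusable."""
    # Null or non-positive sales become missing, avoiding division by zero.
    sales = df["sales_mwh"].where(df["sales_mwh"] > 0)
    return df["dollar_value"] / sales
```

Rows that come back missing are the revenue types without reported sales, which can then be excluded or examined separately.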

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_operating_revenues_sched300.parquet"
)

Columns#

Field Name

Type

Description

avg_customers_per_month

number

Average number of customers per month.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

revenue_type

string

Label describing types of revenues.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

sales_mwh

number

Quantity of electricity sold in MWh.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_other_regulatory_assets_sched232#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory assets, including rate order docket number.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 232)

Primary key:

This table has no primary key. Respondents are able to enter any number of assets across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_assets_sched232.parquet"
)

Columns#

Field Name

Type

Description

account_detail

string

Description of the account number credited from making debit adjustment to other regulatory liabilities.

additional_description

string

Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.

credits_written_off_recovered

number

Recovered amount of divestiture of other assets lacking physical substance resulting from the ratemaking actions of regulatory agencies.

debits

number

The increase (decrease) during the period in the value of other assets resulting from the ratemaking actions of regulatory agencies.

description

string

Free-form description and purpose of other regulatory assets.

ending_balance

number

Account balance at end of year.

report_year

integer

Four-digit year in which the data was reported.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_other_regulatory_liabilities_sched278#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ other regulatory liabilities, including rate order docket number.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 278)

Primary key:

This table has no primary key. Respondents are able to enter any number of liabilities across many rows. There are no IDs or set fields enforced in the original table.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_other_regulatory_liabilities_sched278.parquet"
)

Columns#

Field Name

Type

Description

account_detail

string

Description of the account number credited from making debit adjustment to other regulatory liabilities.

additional_description

string

Additional free-form description column. In most instances, the values in this column are exactly the same as the description column, but it differs from the standard description enough that we kept both columns. This field did not exist prior to FERC publishing Form 1 as XBRL and thus is always null prior to 2021.

decrease_in_other_regulatory_liabilities

number

The decrease during the reporting period of other regulatory liabilities.

description

string

Long human-readable description of the meaning of a code/label.

ending_balance

number

Account balance at end of year.

increase_in_other_regulatory_liabilities

number

The increase during the reporting period of other regulatory liabilities.

report_year

integer

Four-digit year in which the data was reported.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_plant_in_service_sched204#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ balances and changes to FERC Electric Plant in Service accounts.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 204)

Primary key:

utility_id_ferc1, report_year, ferc_account_label

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Account numbers correspond to the FERC Uniform System of Accounts for Electric Plant, which is defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101. (See e.g. https://www.law.cornell.edu/cfr/text/18/part-101). Each FERC respondent reports starting and ending balances for each account annually. Balances are organization-wide, and are not broken down on a per-plant basis. The end-of-year balance should equal the beginning-of-year balance plus the sum of additions, retirements, adjustments, and transfers.
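As a rough illustration of the balance identity described above, the sketch below checks it on a few made-up rows that use this table's column names. It assumes each change column already carries the sign needed for a plain sum (for example, retirements stored as negative numbers). The 0.5 tolerance is an arbitrary choice. Neither assumption is confirmed here.

```python
import pandas as pd

# Toy rows standing in for core_ferc1__yearly_plant_in_service_sched204.
# All values are invented for illustration.
df = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 2],
        "report_year": [2022, 2023, 2023],
        "ferc_account_label": ["land", "land", "land"],
        "starting_balance": [100.0, 130.0, 50.0],
        "additions": [40.0, 10.0, 5.0],
        "retirements": [-10.0, 0.0, 0.0],  # assumed to be stored as negatives
        "adjustments": [0.0, 0.0, 0.0],
        "transfers": [0.0, 5.0, 0.0],
        "ending_balance": [130.0, 145.0, 60.0],
    }
)

change_cols = ["additions", "retirements", "adjustments", "transfers"]

# Treat a missing change as zero so a single null does not blank out the row.
expected = df["starting_balance"] + df[change_cols].fillna(0).sum(axis=1)
df["balance_gap"] = df["ending_balance"] - expected

# Rows where the reported ending balance departs from the identity.
mismatched = df[df["balance_gap"].abs() > 0.5]
print(mismatched[["utility_id_ferc1", "report_year", "balance_gap"]])
```

Rows that fail the check are not necessarily errors. They may reflect reporting quirks, so treat the result as a list of records to inspect rather than to drop.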

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_plant_in_service_sched204.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments

number

Cost of adjustments to the account.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

ferc_account_label

string

Long FERC account identifier derived from values reported in the XBRL taxonomies. May also refer to aggregations of individual FERC accounts.

plant_status

string

Utility plant financial status (in service, future, leased, total).

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

retirements

number

Cost of disposal of items classified within the account.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

transfers

number

Cost of transfers into (out of) the account.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_pumped_storage_plants_sched408#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for hydroelectric pumped storage plants with an installed nameplate capacity of 10+ MW.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 408)

Primary key:

This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (ex: unit 1, unit 2 and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported in Schedule 408 of FERC Form 1.
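Because the approximate key (report_year, utility_id_ferc1, plant_name_ferc1) is not unique, it can help to look at the colliding rows before aggregating. The following is a minimal sketch on invented data. The plant names are hypothetical, and the keyword search for "total" is only a heuristic assumed here for spotting plant-level totals reported alongside units.

```python
import pandas as pd

approx_key = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]

# Toy rows standing in for the pumped storage table; values are made up.
df = pd.DataFrame(
    {
        "report_year": [2023, 2023, 2023, 2023],
        "utility_id_ferc1": [7, 7, 7, 8],
        "plant_name_ferc1": ["bear creek", "bear creek", "bear creek total", "lake a"],
        "capacity_mw": [100.0, 100.0, 200.0, 50.0],
    }
)

# keep=False flags every member of a duplicated group, not just the repeats.
dupes = df[df.duplicated(subset=approx_key, keep=False)]

# Heuristic: names mentioning "total" may double count the unit-level rows.
maybe_totals = df[df["plant_name_ferc1"].str.contains("total", case=False, na=False)]

print(dupes)
print(maybe_totals)
```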

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_pumped_storage_plants_sched408.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_equipment_electric

number

Cost of plant: accessory electric equipment (USD).

capex_equipment_misc

number

Cost of plant: miscellaneous power plant equipment (USD).

capex_facilities

number

Cost of plant: reservoirs, dams, and waterways (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_roads

number

Cost of plant: roads, railroads, and bridges (USD).

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

capex_wheels_turbines_generators

number

Cost of plant: water wheels, turbines, and generators (USD).

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

energy_used_for_pumping_mwh

number

Energy used for pumping, in megawatt-hours.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

net_load_mwh

number

Net output for load (net generation - energy used for pumping) in megawatt-hours.

opex_dams

number

Production expenses: maintenance of reservoirs, dams, and waterways (USD).

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_generation_misc

number

Production expenses: miscellaneous power generation expenses (USD).

opex_misc_plant

number

Production expenses: maintenance of miscellaneous hydraulic plant (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plant

number

Production expenses: maintenance of electric plant (USD).

opex_production_before_pumping

number

Total production expenses before pumping (USD).

opex_pumped_storage

number

Production expenses: pumped storage (USD).

opex_pumping

number

Production expenses: pumping (USD).

opex_rents

number

Production expenses: rents (USD).

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_total

number

Total production expenses, excluding fuel (USD).

opex_water_for_power

number

Production expenses: water for power (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

project_num

integer

FERC Licensed Project Number.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_purchased_power_and_exchanges_sched326#

Noticed any irregularities with this table? File a report.

Annual time series of purchased power (Account 555) including power exchanges (transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 326)

Primary key:

This table has no primary key. The primary key for this table would be record_id; however, a small number of these values are duplicated in 2021, 2022, and 2024.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table has data about inter-utility power purchases. This includes how much electricity was purchased, how much it cost, and who it was purchased from. Unfortunately the field describing which other utility the power was being bought from (seller_name) is poorly standardized, making it difficult to correlate with other data.

Purchased Power is considered FERC Account 555 according to FERC’s Uniform System of Accounts. Reported on pages 326-327 of FERC Form 1.
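Since record_id would serve as the primary key if not for a few repeated values, one way to see how much the duplication matters is to count the repeated IDs per report year. This is a minimal sketch on invented rows. The record_id strings are placeholders and do not follow the real format.

```python
import pandas as pd

# Toy rows standing in for the purchased power table; values are made up.
df = pd.DataFrame(
    {
        "record_id": ["a_2020", "b_2021", "b_2021", "c_2022", "d_2024", "d_2024"],
        "report_year": [2020, 2021, 2021, 2022, 2024, 2024],
        "purchased_mwh": [10.0, 20.0, 20.0, 5.0, 7.0, 8.0],
    }
)

# Flag every row whose record_id appears more than once.
dupe_mask = df["record_id"].duplicated(keep=False)

# Number of distinct repeated record_id values in each report year.
dupes_per_year = df.loc[dupe_mask].groupby("report_year")["record_id"].nunique()
print(dupes_per_year)
```

If the repeated rows turn out to be exact copies, `df.drop_duplicates()` may be enough. If they differ in other columns, they need a closer look before choosing which to keep.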

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_purchased_power_and_exchanges_sched326.parquet"
)

Columns#

Field Name

Type

Description

billing_demand_mw

number

Monthly average billing demand (for requirements purchases, and any transactions involving demand charges). In megawatts.

coincident_peak_demand_mw

number

Average monthly coincident peak (CP) demand (for requirements purchases, and any transactions involving demand charges). Monthly CP demand is the metered demand during the hour (60-minute integration) in which the supplier’s system reaches its monthly peak. In megawatts.

delivered_mwh

number

Gross megawatt-hours delivered in power exchanges and used as the basis for settlement.

demand_charges

number

Demand charges (USD).

energy_charges

number

Energy charges (USD).

non_coincident_peak_demand_mw

number

Average monthly non-coincident peak (NCP) demand (for requirements purchases, and any transactions involving demand charges). Monthly NCP demand is the maximum metered hourly (60-minute integration) demand in a month. In megawatts.

other_charges

number

Other charges, including out-of-period adjustments (USD).

purchase_type_code

string

Categorization based on the original contractual terms and conditions of the service. Must be one of ‘requirements’, ‘long_firm’, ‘intermediate_firm’, ‘short_firm’, ‘long_unit’, ‘intermediate_unit’, ‘electricity_exchange’, ‘other_service’, or ‘adjustment’. Requirements service is ongoing high reliability service, with load integrated into system resource planning. ‘Long term’ means 5+ years. ‘Intermediate term’ is 1-5 years. ‘Short term’ is less than 1 year. ‘Firm’ means not interruptible for economic reasons. ‘unit’ indicates service from a particular designated generating unit. ‘exchange’ is an in-kind transaction.

purchased_mwh

number

Megawatt-hours shown on bills rendered to the respondent. Includes both electricity purchased for storage and non-storage purposes, which were lumped together prior to 2021.

purchased_other_than_storage_mwh

number

Number of megawatt hours purchased during the period for other than energy storage.

purchased_storage_mwh

number

Number of megawatt hours purchased during the period for energy storage.

received_mwh

number

Gross megawatt-hours received in power exchanges and used as the basis for settlement.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

seller_name

string

Name of the seller, or the other party in an exchange transaction.

tariff

string

FERC Rate Schedule Number or Tariff. (Note: may be incomplete if originally reported on multiple lines.)

total_settlement

number

Sum of demand, energy, and other charges (USD). For power exchanges, the settlement amount for the net receipt of energy. If more energy was delivered than received, this amount is negative.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_retained_earnings_sched118#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ statements of retained earnings.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 118)

Primary key:

utility_id_ferc1, report_year, earnings_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_retained_earnings_sched118.parquet"
)

Columns#

Field Name

Type

Description

balance

string

Indication of whether a column is a credit or debit, as reported in the XBRL taxonomy.

earnings_type

string

Label describing types of earnings.

ending_balance

number

Account balance at end of year.

ferc_account

string

Actual FERC Account number (e.g. ‘359.1’) if available, or a PUDL assigned ID when FERC accounts have been split or combined in reporting.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

starting_balance

number

Account balance at beginning of year.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

core_ferc1__yearly_sales_by_rate_schedules_sched304#

Noticed any irregularities with this table? File a report.

Annual time series of utilities’ electric sales from all rate schedules in effect throughout the year.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 304)

Primary key:

This table has no primary key.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Values in rate_schedule_description are free-form strings.

  • Data prior to 2021 does not include information in columns: rate_schedule_type and billing_status.

  • Units of revenue_per_kwh are suspected to include a mix of dollars and possibly cents.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

The pre-2021 data in this table (extracted from the FoxPro database rather than XBRL) is extremely unstructured. The post-2020 data (from XBRL) sorts records into rate schedule types (residential, industrial, commercial, public_lighting, public_authorities, railroads, interdepartmental, provision_for_rate_refund, commercial_and_industrial, total) and billing statuses (billed, unbilled, total), whereas the pre-2021 data stuffs all of that information (if you’re lucky) into the rate_schedule_description column. There’s no point trying to parse through the pre-2021 rate_schedule_description column en masse because it’s just too messy. The contents of rate_schedule_description often contain numbers and acronyms that have little to no meaning out of context. The table is structured somewhat like the FERC1 small generators table, with headings about rate structure type also embedded into the rate_schedule_description column. To all who dare, beware.

This table is a combination of one pre-2021 (DBF) table and nine post-2020 (XBRL) tables (one for each rate schedule type, plus totals), hence the increase in data clarity post-2020. The rate_schedule_type and billing_status columns are only relevant for post-2020 data as they can be reliably parsed from each of the tables and incorporated into columns. The rate_schedule_description is supposed to contain sub-rate_schedule_type names for charges (Ex: Residential 1, Residential 2, etc.). However, the pre-2021 data contains a little bit of everything (or nothing) and the post-2020 data has some totals or wonky data thrown in. That’s to say, even when working with post-2020 data, be wary of aggregating the data. That’s what the “total” rows are for.

The values that come from the totals table are marked with the string “total” in the rate_schedule_description column. The totals table is a product of the transition to XBRL, so these distinguishable totals are only available for data post-2020 (otherwise you could try keyword searching for “total” in rate_schedule_description). The total table contains two types of totals, the utility totals across all rate schedules in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “total”) and each of the utility’s individual rate schedule totals in a given year (marked with rate_schedule_description = “total” and rate_schedule_type = “residential” or any other rate schedule type).
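The two kinds of totals described above can be separated with a pair of boolean filters. The sketch below uses invented rows with this table's column names. It is one way to apply the markers, assuming the strings match exactly as written.

```python
import pandas as pd

# Toy post-2020 rows for one utility and year; all values are made up.
df = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1, 1, 1],
        "report_year": [2022, 2022, 2022, 2022, 2022],
        "rate_schedule_type": ["residential", "residential", "residential", "commercial", "total"],
        "rate_schedule_description": ["residential 1", "residential 2", "total", "total", "total"],
        "sales_mwh": [10.0, 15.0, 25.0, 40.0, 65.0],
    }
)

# The "total" marker is only reliable for XBRL-era data.
post_2020 = df["report_year"] >= 2021
is_total_row = df["rate_schedule_description"] == "total"

# Utility-wide totals across all rate schedules.
utility_totals = df[post_2020 & is_total_row & (df["rate_schedule_type"] == "total")]

# Totals for each individual rate schedule type.
type_totals = df[post_2020 & is_total_row & (df["rate_schedule_type"] != "total")]

# Everything else: the individual rate schedules.
detail_rows = df[post_2020 & ~is_total_row]
```

In the real table the total rows are also split by billing_status (billed, unbilled, total), so a filter on that column is likely needed before comparing totals with the detail rows.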

The rate schedule based XBRL tables only report billed values whereas the total tables report billed, unbilled, and total values. (See the column description for more info on the difference between billed and unbilled). This is important to consider if you’re endeavoring to compare the subtotal values with the total values. We have not attempted to fix or verify any subtotals or totals that don’t add up.

Another important note is the possibility of unit discrepancies in certain columns. The revenue_per_kwh column does not specify reporting units, and closer inspection of the data reveals two clear peaks approximately two orders of magnitude apart. This indicates that values may be reported in both dollars and cents. However, because the price of energy per kWh varies so much regionally, we cannot guarantee which is which and have not put any cleaning mechanisms in place to account for this.
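To see the two peaks for yourself, one option is to bin the positive revenue_per_kwh values by order of magnitude. This is a minimal sketch on invented numbers. It only inspects the distribution and makes no attempt to decide which values are dollars and which are cents.

```python
import numpy as np
import pandas as pd

# Made-up values mimicking a mix of dollar-scale and cent-scale reporting.
revenue_per_kwh = pd.Series([0.11, 0.12, 0.15, 0.13, 11.0, 12.5, 14.0, 0.0, None])

# log10 is undefined for zero, negative, and missing values, so drop them.
positive = revenue_per_kwh.dropna()
positive = positive[positive > 0]

# floor(log10(x)) is the order of magnitude: -1 for 0.1 to 1, 1 for 10 to 100.
magnitude = np.floor(np.log10(positive)).astype(int)
counts = magnitude.value_counts().sort_index()
print(counts)
```

On real data, a histogram of `np.log10(positive)` with finer bins would show the shape of each peak more clearly than these whole-decade counts.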

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_sales_by_rate_schedules_sched304.parquet"
)

Columns#

Field Name

Type

Description

avg_customers_per_month

number

Average number of customers per month.

billing_status

string

Whether an amount is billed, unbilled, or both. Billed amounts pertain to the exchange of energy and unbilled amounts pertain to other sources of revenue such as contracts with peaker plants to keep them on standby or charging rent to host cell antennas on transmission towers.

dollar_value

number

Dollar value of reported income, expense, asset, or liability.

kwh_per_customer

number

kWh per customer.

rate_schedule_description

string

Free-form description of what the rate schedule name is. Not standardized. Often a sub-category of rate_schedule_type.

rate_schedule_type

string

Categorization of rate schedule type.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

revenue_per_kwh

number

The amount of revenue per kWh by rate schedule acquired in the given report year.

sales_mwh

number

Quantity of electricity sold in MWh.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_small_plants_sched410#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for internal combustion plants, gas-turbine plants, conventional hydro plants, and pumped storage plants with less than 10 MW installed nameplate capacity and steam plants with less than 25 MW installed nameplate capacity.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 410)

Primary key:

This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.

Usage Warnings#

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of plants and then the total plant (ex: unit 1, unit 2 and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

As reported on FERC Form 1 Schedule 410 (pages 410-411) and extracted from the FERC Visual FoxPro and XBRL data. See our pudl.extract.ferc1.TABLE_NAME_MAP_FERC1 for links to the raw tables.

The raw version of this table is more like a digitized PDF than an actual data table. The rows contain lots of information in addition to what the columns might suggest. For instance, a single column may contain header rows, note rows, and total rows. This extraneous information is useful, but it prevents proper analysis when mixed in with the rest of the values data in the column. We employ a couple of data transformations to extract these rows from the data and preserve some of the information they contain (fuel type, plant type, FERC license, or general notes about the plant) in separate columns.
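To make the idea of header, note, and total rows concrete, here is a toy sketch of how such rows might be labeled and their information moved into a separate column. It is not PUDL's actual transformation. The plant names, the "all values missing means header" rule, and the "total" keyword rule are all assumptions made for illustration.

```python
import pandas as pd

# Invented rows imitating a raw table where headers and totals are mixed in.
raw = pd.DataFrame(
    {
        "plant_name_ferc1": [
            "hydro", "mill pond", "stone dam", "total hydro",
            "internal combustion", "north diesel",
        ],
        "capacity_mw": [None, 2.0, 3.5, 5.5, None, 1.2],
        "net_generation_mwh": [None, 4000.0, 9000.0, 13000.0, None, 300.0],
    }
)

value_cols = ["capacity_mw", "net_generation_mwh"]

# Assumed rules: no reported values means a header row; "total" in the name
# means a total row; everything else is treated as a plant record.
raw["row_type"] = "plant"
raw.loc[raw[value_cols].isna().all(axis=1), "row_type"] = "header"
raw.loc[raw["plant_name_ferc1"].str.contains("total", case=False), "row_type"] = "total"

# Carry each header down to the rows beneath it as a plant_type column.
raw["plant_type"] = raw["plant_name_ferc1"].where(raw["row_type"] == "header").ffill()

# Keep only the plant records, now annotated with the header they sat under.
plants = raw[raw["row_type"] == "plant"].drop(columns="row_type")
print(plants)
```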

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_small_plants_sched410.parquet"
)

Columns#

Field Name

Type

Description

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_total

number

Total cost of plant (USD).

construction_year

integer

Year the plant’s oldest still operational unit was built.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_type

string

Type of fuel.

license_id_ferc1

integer

FERC issued operating license ID for the facility, if available. This value is extracted from the original plant name where possible.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_fuel

number

Production expenses: fuel (USD).

opex_maintenance

number

Production expenses: Maintenance (USD).

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_steam_plants_fuel_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of fuel cost and quantity for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)

Primary key:

This table has no primary key.

Usage Warnings#

  • The fuel_type_code_pudl is inferred from a free-form string field.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table is a subset of the steam plant table reported on page 402 of FERC Form 1.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_fuel_sched402.parquet"
)

Columns#

Field Name

Type

Description

fuel_consumed_units

number

Consumption of the fuel type in physical units. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

fuel_cost_per_mmbtu

number

Average fuel cost per MMBTU of heat content in nominal USD.

fuel_cost_per_unit_burned

number

Average cost of fuel consumed in the report year per reported fuel unit (USD).

fuel_cost_per_unit_delivered

number

Average cost of fuel delivered in the report year per reported fuel unit (USD).

fuel_mmbtu_per_unit

number

Heat content of the fuel in millions of Btus per physical unit.

fuel_type_code_pudl

string

Simplified fuel type code used in PUDL

fuel_units

string

Reported unit of measure for fuel.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_steam_plants_sched402#

Noticed any irregularities with this table? File a report.

Annual time series of plant statistics for steam plants with a capacity of 25+ MW, internal combustion and gas-turbine plants of 10+ MW, and all nuclear plants.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 402)

Primary key:

This table has no primary key. The best approximation for primary keys for this table would be: report_year, utility_id_ferc1, plant_name_ferc1. FERC does not publish plant IDs. The main identifying column is plant_name_ferc1 but that is a free-form string field and there are duplicate records.
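Because the approximate key is not guaranteed to be unique, it can help to check for collisions before joining or aggregating. The following is a minimal sketch using pandas on made-up sample rows; the sample values and the helper name `find_key_collisions` are assumptions for illustration and are not part of PUDL.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real table.
plants = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020, 2021],
        "utility_id_ferc1": [1, 1, 2, 1],
        "plant_name_ferc1": ["big creek", "big creek", "big creek", "big creek"],
        "capacity_mw": [100.0, 100.0, 50.0, 100.0],
    }
)

# The approximate key suggested in the table description.
APPROX_KEY = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]


def find_key_collisions(df: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose approximate key appears more than once."""
    return df[df.duplicated(subset=APPROX_KEY, keep=False)]


collisions = find_key_collisions(plants)
print(collisions)
```

Rows returned by this check are candidates for manual review, since they may be true duplicates or separately reported portions of the same plant.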

Usage Warnings#

  • FERC does not require respondents to report unique, non-duplicative plant records. There are sporadic instances of respondents reporting portions of a plant and then the plant total (e.g., unit 1, unit 2, and total). Use caution when aggregating.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_steam_plants_sched402.parquet"
)

Columns#

Field Name

Type

Description

asset_retirement_cost

number

Asset retirement cost (USD).

avg_num_employees

number

The average number of employees assigned to each plant.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

capex_equipment

number

Cost of plant: equipment (USD).

capex_land

number

Cost of plant: land and land rights (USD).

capex_per_mw

number

Cost of plant per megawatt of installed (nameplate) capacity. Nominal USD.

capex_structures

number

Cost of plant: structures and improvements (USD).

capex_total

number

Total cost of plant (USD).

construction_type

string

Type of plant construction (‘outdoor’, ‘semioutdoor’, or ‘conventional’). Categorized by PUDL based on our best guess of intended value in FERC1 freeform strings.

construction_year

integer

Year the plant’s oldest still operational unit was built.

installation_year

integer

Year the plant’s most recently built unit was installed.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

not_water_limited_capacity_mw

number

Plant capacity in MW when not limited by condenser water.

opex_allowances

number

Allowances.

opex_boiler

number

Maintenance of boiler (or reactor) plant.

opex_coolants

number

Cost of coolants and water (nuclear plants only)

opex_electric

number

Production expenses: electric expenses (USD).

opex_engineering

number

Production expenses: maintenance, supervision, and engineering (USD).

opex_fuel

number

Production expenses: fuel (USD).

opex_misc_power

number

Miscellaneous steam (or nuclear) expenses.

opex_misc_steam

number

Maintenance of miscellaneous steam (or nuclear) plant.

opex_operations

number

Production expenses: operations, supervision, and engineering (USD).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

opex_plants

number

Maintenance of electrical plant.

opex_production_total

number

Total operating expenses.

opex_rents

number

Production expenses: rents (USD).

opex_steam

number

Steam expenses.

opex_steam_other

number

Steam from other sources.

opex_structures

number

Production expenses: maintenance of structures (USD).

opex_transfer

number

Steam transferred (Credit).

peak_demand_mw

number

Net peak demand for 60 minutes. Note: in some cases peak demand for other time periods may have been reported instead, if hourly peak demand was unavailable.

plant_capability_mw

number

Net plant capability in megawatts.

plant_hours_connected_while_generating

number

Hours the plant was connected to load while generating in the report year.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

plant_type

string

Type of plant.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

water_limited_capacity_mw

number

Plant capacity in MW when limited by condenser water.

core_ferc1__yearly_transmission_lines_sched422#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about transmission lines.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 422)

Primary key:

This table has no primary key. Each record of this table is supposed to represent one stretch of a transmission line, but there are no IDs and many nulls in the fields which would nominally distinguish unique transmission lines.

Usage Warnings#

  • Contains columns which may appear categorical, but are actually free text.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

Information describing transmission lines, the cost of lines, annual operating and capital expenses, etc. This table includes transmission lines with a nominal voltage of 132 kilovolts or greater. Transmission lines below this voltage are required to be reported in group totals only for each voltage.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_transmission_lines_sched422.parquet"
)

Columns#

Field Name

Type

Description

capex_land

number

Cost of land and land rights for the transmission line.

capex_other

number

Construction and other costs for the transmission line.

capex_total

number

Total costs for the transmission line.

conductor_size_and_material

string

Size of transmission conductor and material of the transmission line.

designed_voltage_kv

number

Manufactured (designed) voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.

end_point

string

The end point of a transmission line.

num_transmission_circuits

integer

Number of circuits in a transmission line.

operating_voltage_kv

number

The operating voltage, expressed in kilovolts, for three-phase 60-cycle alternating current transmission lines.

opex_maintenance

number

Maintenance expenses for the transmission line.

opex_operations

number

Operating expenses for the transmission line.

opex_rents

number

Rent expenses for the transmission line.

opex_total

number

Overall expenses for the transmission line.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

start_point

string

The starting point of a transmission line.

supporting_structure_type

string

Supporting structure of the transmission line.

transmission_line_and_structures_length_miles

number

Length (in pole miles, or circuit miles if transmission lines are underground) for lines that are aggregated with other lines or structures (whose costs are aggregated and combined with other structures).

transmission_line_length_miles

number

Length (in pole miles, or circuit miles if transmission lines are underground) for lines that are stand-alone structures (whose costs are reported on a stand-alone basis).

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_ferc1__yearly_utility_plant_summary_sched200#

Noticed any irregularities with this table? File a report.

Annual time series of utility plant and accumulated provisions for depreciation, amortization and depletion of utility plant assets.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 1 – Annual Report of Major Electric Utilities (Schedule 200)

Primary key:

utility_id_ferc1, report_year, utility_type, utility_plant_asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc1__yearly_utility_plant_summary_sched200.parquet"
)

Columns#

Field Name

Type

Description

ending_balance

number

Account balance at end of year.

record_id

string

Identifier indicating original FERC Form 1 source record. format: {table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}. Unique within FERC Form 1 DB tables which are not row-mapped.

report_year

integer

Four-digit year in which the data was reported.

row_type_xbrl

string

Indicates whether the value reported in the row is calculated, or uniquely reported within the table.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_plant_asset_type

string

Type of utility plant asset reported in the core_ferc1__yearly_utility_plant_summary_sched200 table. Assets include those leased to others, held for future use, construction work-in-progress and details of accumulated depreciation.

utility_type

string

Listing of utility plant types. Examples include Electric Utility, Gas Utility, and Other Utility.

utility_type_other

string

Freeform description of the type of utility reported in one of the three other utility_type sections in the core_ferc1__yearly_utility_plant_summary_sched200 table. This field is reported only in the DBF reporting years (1994-2020).

core_ferc714__hourly_planning_area_demand#

Noticed any irregularities with this table? File a report.

Hourly time series of electricity demand by planning area.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2a)

Primary key:

respondent_id_ferc714, datetime_utc

Usage Warnings#

  • The datetime_utc timestamps have been cleaned due to inconsistent datetime reporting. See below for additional details.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source.

This table includes three respondent ID columns: one from the CSV raw source, one from the XBRL raw source, and a PUDL-derived ID that links those two source IDs together. Source IDs have been filled in for all records, so you can select the full time series for a given respondent using any of these three IDs.
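As a rough illustration of selecting one respondent's full time series through any of the three ID columns, here is a small pandas sketch. The sample rows, the ID values (including the XBRL-style string "C000001"), and the helper `select_respondent` are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the ID values are made up for illustration.
demand = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 1, 2],
        "respondent_id_ferc714_csv": [101, 101, 202],
        "respondent_id_ferc714_xbrl": ["C000001", "C000001", "C000002"],
        "datetime_utc": pd.to_datetime(
            ["2020-01-01 06:00", "2021-01-01 06:00", "2021-01-01 06:00"]
        ),
        "demand_mwh": [950.0, 1010.0, 430.0],
    }
)


def select_respondent(df: pd.DataFrame, id_column: str, id_value) -> pd.DataFrame:
    """Return all rows for one respondent, sorted by time, using any ID column."""
    return df[df[id_column] == id_value].sort_values("datetime_utc")


# Because source IDs are filled in for every record, each lookup
# returns the same rows, spanning both the CSV and XBRL years.
by_pudl = select_respondent(demand, "respondent_id_ferc714", 1)
by_csv = select_respondent(demand, "respondent_id_ferc714_csv", 101)
by_xbrl = select_respondent(demand, "respondent_id_ferc714_xbrl", "C000001")
```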

An important caveat is that the datetime_utc timestamps have been cleaned. Form 714 gives respondents little guidance on how to interpret timestamps: the form asks respondents to provide 24 hourly demand values for each day, labeled hour 1 through 24, with no indication of whether hour 1 begins at midnight.

The XBRL data contained several formats of timestamps. Most records corresponding to hour 1 of the Form have a timestamp with hour 1 as T1. About two thirds of the records in the hour 24 location of the form have a timestamp with an hour reported as T24 while the remaining third report this as T00 of the next day. T24 is not a valid format for the hour of a datetime, so we convert these T24 hours into T00 of the next day. A smaller subset of the respondents reports the 24th hour as the last second of the day - we also convert these records to the T00 of the next day.
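The hour-24 handling described above can be sketched in a few lines of Python. This is not the PUDL implementation: the function name `normalize_hour_24` and the assumed input format (`YYYY-MM-DDTHH:MM:SS`) are illustrative assumptions, and the sketch does not cover time zone handling or conversion to UTC.

```python
from datetime import datetime, time, timedelta


def normalize_hour_24(raw: str) -> datetime:
    """Parse a timestamp, rolling hour 24 and 23:59:59 into 00:00 of the next day.

    Assumes the input looks like 'YYYY-MM-DDTHH:MM:SS' (an assumption made
    for this sketch; real reported formats vary).
    """
    date_part, time_part = raw.split("T")
    day = datetime.strptime(date_part, "%Y-%m-%d")
    # T24 is not a valid hour, so treat it as midnight of the following day.
    if time_part.startswith("24:"):
        return day + timedelta(days=1)
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
    # Some respondents report the 24th hour as the last second of the day.
    if parsed.time() == time(23, 59, 59):
        return day + timedelta(days=1)
    return parsed


rolled = normalize_hour_24("2021-12-31T24:00:00")
last_second = normalize_hour_24("2021-12-31T23:59:59")
unchanged = normalize_hour_24("2021-01-01T01:00:00")
```

Both hour-24 variants land on the same instant as a record reported as T00 of the next day, which is what lets all three reporting styles share one timestamp.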

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__hourly_planning_area_demand.parquet"
)

Columns#

Field Name

Type

Description

datetime_utc

datetime

Date and time converted to Coordinated Universal Time (UTC).

demand_mwh

number

Electricity demand (energy) within a given timeframe.

report_date

date

Date reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_id_ferc714_csv

integer

FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

respondent_id_ferc714_xbrl

string

FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

timezone

string

IANA timezone name

core_ferc714__respondent_id#

Noticed any irregularities with this table? File a report.

Respondent identification.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part I, Schedule 1)

Primary key:

respondent_id_ferc714

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__respondent_id.parquet"
)

Columns#

Field Name

Type

Description

eia_code

integer

EIA utility or balancing area authority ID associated with this FERC Form 714 respondent. Note that many utilities are also balancing authorities and in many cases EIA uses the same integer ID to identify a utility in its role as a balancing authority AND as a utility, but there is no requirement that these IDs be the same, and in a number of cases they are different.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_id_ferc714_csv

integer

FERC Form 714 respondent ID from CSV reported data - published from years: 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

respondent_id_ferc714_xbrl

string

FERC Form 714 respondent ID from XBRL reported data - published from years: 2021-present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

respondent_name_ferc714

string

Name of the utility, balancing area authority, or planning authority responding to FERC Form 714.

core_ferc714__yearly_planning_area_demand_forecast#

Noticed any irregularities with this table? File a report.

Annual time series of 10-year forecasted summer and winter peak demand and annual net energy per planning area.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 714 – Annual Electric Balancing Authority Area and Planning Area Report (Part III, Schedule 2b)

Primary key:

respondent_id_ferc714, report_year, forecast_year

Usage Warnings#

  • This table includes data from the pre-2021 CSV raw source as well as the newer 2021 through present XBRL raw source. We created the respondent_id_ferc714 field to blend disparate IDs from the CSV and XBRL data over time. See the core_ferc714__respondent_id table for links to the original source IDs.

  • A small handful of respondents (~11) report more than 10 years of forecasts, and an even smaller handful (~9) report fewer than 10.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table contains forecasted net demand (MWh) as well as summer and winter peak demand (MW) for the ten years following the report_year.
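Since not every respondent reports exactly ten forecast years, it may be useful to count the forecast years per respondent and report year before assuming a fixed horizon. The following pandas sketch uses made-up sample rows; the derived column name `n_forecast_years` is an assumption introduced here.

```python
import pandas as pd

# Hypothetical sample: respondent 1 reports ten forecast years,
# respondent 2 reports only five.
rows = [
    {"respondent_id_ferc714": 1, "report_year": 2020, "forecast_year": year}
    for year in range(2021, 2031)
] + [
    {"respondent_id_ferc714": 2, "report_year": 2020, "forecast_year": year}
    for year in range(2021, 2026)
]
forecast = pd.DataFrame(rows)

# Count distinct forecast years for each respondent and report year.
horizon = (
    forecast.groupby(["respondent_id_ferc714", "report_year"])["forecast_year"]
    .nunique()
    .rename("n_forecast_years")
    .reset_index()
)

# Keep only the filings whose horizon is not the expected ten years.
unusual = horizon[horizon["n_forecast_years"] != 10]
print(unusual)
```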

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc714__yearly_planning_area_demand_forecast.parquet"
)

Columns#

Field Name

Type

Description

forecast_year

integer

Four-digit year that applies to a particular forecasted value.

net_demand_forecast_mwh

number

Net forecasted electricity demand for the specific period in megawatt-hours (MWh).

report_year

integer

Four-digit year in which the data was reported.

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

summer_peak_demand_forecast_mw

number

The maximum forecasted hourly summer load (for the months of June through September).

winter_peak_demand_forecast_mw

number

The maximum forecasted hourly winter load (for the months of January through March).

core_ferc__codes_accounts#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for account numbers from the FERC Uniform System of Accounts for Electric Plant.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC – Mix of multiple FERC Forms

Primary key:

ferc_account_id

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

These codes are defined in Code of Federal Regulations (CFR) Title 18, Chapter I, Subchapter C, Part 101. (See e.g. https://www.law.cornell.edu/cfr/text/18/part-101).

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__codes_accounts.parquet"
)

Columns#

Field Name

Type

Description

ferc_account_description

string

Description of the FERC account.

ferc_account_id

string

Account identifier from FERC’s Uniform System of Accounts for Electric Plant. Includes higher level labeled categories.

core_ferc__entity_companies#

Noticed any irregularities with this table? File a report.

Entity table containing static information about the companies that submit required filings to FERC.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC – Mix of multiple FERC Forms

Primary key:

company_id_ferc

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_ferc__entity_companies.parquet"
)

Columns#

Field Name

Type

Description

address_2

string

Second line of the address.

city

string

Name of the city.

company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system.

company_name

string

The name of the FERC-reporting organization or company.

company_website

string

The website URL of the company, which can provide additional information about the organization.

program

string

The specific program or initiative associated with the FERC organization, which can provide context for the company’s activities.

state

string

Two letter US state abbreviation.

street_address

string

Physical street address.

zip_code

string

Five digit US Zip Code.

zip_code_4

string

Four digit US Zip Code suffix.

core_ferceqr__contracts#

Noticed any irregularities with this table? File a report.

Contains information about contracts between companies selling and buying electricity market products.

Most-recent data:

2026q1

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 920 – Electric Quarterly Report (EQR)

Primary key:

This table has no primary key. The FERC EQR contracts table has no natural primary key.

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
    dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/*.parquet';
library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__contracts")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name

Type

Description

actual_termination_date

date

The date the contract actually terminates.

begin_date

datetime

First date and time for the sale of the product at the rate specified.

class_name

string

F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply.

commencement_date_of_contract_term

date

The date the terms of the contract reported in fields 18, 23 and 25 through 44 (as defined in the data dictionary) became effective. If those terms became effective on multiple dates (i.e., due to one or more amendments), the date to be reported in this field is the date the most recent amendment became effective. If the contract or the most recent reported amendment does not have an effective date, the date when service began pursuant to the contract or most recent reported amendment may be used. If the terms reported in fields 18, 23 and 25 through 44 have not been amended since January 1, 2009, the initial date the contract became effective (or absent an effective date the initial date when service began) may be used.

contract_affiliate

boolean

If True, this field indicates the customer is an affiliate of the seller. The customer is an affiliate if it controls, is controlled by, or is under common control with the seller. This includes a division that operates as a functional unit. A customer of a seller who is an Exempt Wholesale Generator may be defined as an affiliate under the Public Utility Holding Company Act and the FPA.

contract_execution_date

date

The date the contract was signed. If the parties signed on different dates, use the most recent date signed.

contract_service_agreement_id

string

Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system.

contract_termination_date

date

The date that the contract expires.

contract_unique_id

string

An identifier beginning with the letter “C” and followed by a number (e.g., ‘C1’, ‘C2’) used to designate a record containing contract information. Note that these contract IDs may only be unique within the context of a particular seller, seller-buyer pair, or timeframe. FERC documentation of the field is limited.

customer_company_name

string

The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear with different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller.

end_date

datetime

Last date and time for the sale of the product at the rate specified.

extension_provision_description

string

Description of terms that provide for the continuation of the contract.

ferc_tariff_reference

string

The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference.

increment_name

string

5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment.

increment_peaking_name

string

FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract.

point_of_delivery_balancing_authority

string

The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used.

point_of_delivery_specific_location

string

The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used.

point_of_receipt_balancing_authority

string

The registered Balancing Authority (formerly called NERC Control Area) where service begins for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If receipt occurs at a trading hub, the term ‘Hub’ should be used.

point_of_receipt_specific_location

string

The specific location at which the product is received if designated in the contract. If receipt occurs at a trading hub, a standardized hub name must be used. If more points of receipt are listed in the contract than can fit into the 50-character space, a description of the collection of points may be used. ‘Various’ alone is unacceptable unless the contract itself uses that terminology.

product_name

string

Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__transactions. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. DIRECT ASSIGNMENT FACILITIES CHARGE: Charges for facilities or portions of facilities that are constructed or used for the sole use/benefit of a particular customer. EMERGENCY ENERGY: Contractual provisions to supply energy or capacity to another entity during critical situations. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. INTERCONNECTION AGREEMENT: Contract that provides the terms and conditions for a generator, distribution system owner, transmission owner, transmission provider, or transmission system to physically connect to a transmission system or distribution system. MEMBERSHIP AGREEMENT: Agreement to participate and be subject to rules of a system operator. MUST RUN AGREEMENT: An agreement that requires a unit to run. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). NETWORK: Transmission service under contract providing network service. NETWORK OPERATING AGREEMENT: An executed agreement that contains the terms and conditions under which a network customer operates its facilities and the technical and operational matters associated with the implementation of network integration transmission service. OTHER: Product name not otherwise included. POINT-TO-POINT AGREEMENT: Transmission service under contract between specified Points of Receipt and Delivery. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REASSIGNMENT AGREEMENT: Transmission capacity reassignment agreement. 
REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. 
For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SYSTEM OPERATING AGREEMENTS: An executed agreement that contains the terms and conditions under which a system or network customer shall operate its facilities and the technical and operational matters associated with the implementation of network. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. TRANSMISSION OWNERS AGREEMENT: The agreement that establishes the terms and conditions under which a transmission owner transfers operational control over designated transmission facilities. UPLIFT: A make-whole payment by an RTO/ISO to a utility.

product_type_name

string

CB: Energy, capacity or ancillary services sold under a FERC-approved cost-based rate tariff. CR: An agreement under which a transmission provider sells, assigns or transfers all or portion of its rights to an eligible customer. CR-AD: Transmission capacity reassignments reported in Atlantic Daylight time. CR-AP: Transmission capacity reassignments reported in Atlantic Prevailing time. CR-AS: Transmission capacity reassignments reported in Atlantic Standard time. CR-CD: Transmission capacity reassignments reported in Central Daylight time. CR-CP: Transmission capacity reassignments reported in Central Prevailing time. CR-CS: Transmission capacity reassignments reported in Central Standard time. CR-ED: Transmission capacity reassignments reported in Eastern Daylight time. CR-EP: Transmission capacity reassignments reported in Eastern Prevailing time. CR-ES: Transmission capacity reassignments reported in Eastern Standard time. CR-MD: Transmission capacity reassignments reported in Mountain Daylight time. CR-MP: Transmission capacity reassignments reported in Mountain Prevailing time. CR-MS: Transmission capacity reassignments reported in Mountain Standard time. CR-PD: Transmission capacity reassignments reported in Pacific Daylight time. CR-PP: Transmission capacity reassignments reported in Pacific Prevailing time. CR-PS: Transmission capacity reassignments reported in Pacific Standard time. MB: Energy, capacity or ancillary services sold under the seller’s FERC-approved market-based rate tariff. T: The product is sold under a FERC-approved transmission tariff. NPU: The product is sold by a non-public utility that is required to file the EQR under section 220 of the Federal Power Act. OTHER: The product cannot be characterized by the other product type names.

quantity

number

Quantity for the contract product identified.

rate

number

The charge for the product per unit as stated in the contract.

rate_description

string

Text description of rate. If the rate is currently available on the FERC website, a citation of the FERC Accession Number and the relevant FERC tariff including page number or section may be included instead of providing the entire rate algorithm. If the rate is not available on the FERC website, include the rate algorithm, if rate is calculated. If the algorithm would exceed the 300 character field limit, it may be provided in a descriptive summary (including bases and methods of calculations) with a detailed citation of the relevant FERC tariff including page number and section.

rate_maximum

number

Maximum rate to be charged per the contract, if a range is specified.

rate_minimum

number

Minimum rate to be charged per the contract, if a range is specified.

rate_units

string

Measure stated in the contract for the product sold.

seller_company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company.

seller_company_name

string

The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the company_name reported in the core_ferceqr__quarterly_identity table. A handful of records violate this requirement, so joins between tables should rely on the FERC company ID (seller_company_id_ferc in this table, company_id_ferc in the identity table), not on company names.

term_name

string

Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST).

units

string

Measure stated in the contract for the product sold.

year_quarter

string

Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
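
The duration bands listed under increment_name above can be read as an ordered lookup. The following is an illustrative sketch only, not part of PUDL or any FERC tooling: the helper name increment_code is hypothetical, "one year" is assumed to mean 365 days (the field description does not define it), and returning N/A for missing or non-positive durations is likewise an assumption.

```python
from datetime import timedelta
from typing import Optional

# Assumption: the field description does not define "one year"; 365 days is used here.
ONE_YEAR = timedelta(days=365)

# Inclusive upper bound of each band, in ascending order, with its code.
_UPPER_BOUNDS = [
    (timedelta(minutes=5), "5"),
    (timedelta(minutes=15), "15"),
    (timedelta(hours=6), "H"),
    (timedelta(hours=60), "D"),
    (timedelta(hours=168), "W"),
]


def increment_code(duration: Optional[timedelta]) -> str:
    """Map a contract term duration to an increment_name code (hypothetical helper)."""
    # Assumption: missing or non-positive durations are treated as unspecified.
    if duration is None or duration <= timedelta(0):
        return "N/A"
    for upper, code in _UPPER_BOUNDS:
        if duration <= upper:
            return code
    # Longer than 168 hours: monthly unless the term reaches one year.
    return "M" if duration < ONE_YEAR else "Y"
```

For example, increment_code(timedelta(hours=24)) falls in the band of more than 6 and up to 60 hours and so yields "D".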

core_ferceqr__quarterly_identity#

Noticed any irregularities with this table? File a report.

Quarterly time series of individuals who filed FERC EQR for a company in a given quarter.

Most-recent data:

2026q1

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 920 – Electric Quarterly Report (EQR)

Primary key:

This table has no primary key. The primary key ought to be [‘year_quarter’, ‘company_id_ferc’, ‘filer_unique_id’], where filer_unique_id is an employee-level ID. However, a handful of companies have erroneously reported the same filer_unique_id for multiple employees, resulting in duplicate records. In other cases, there appear to be multiple filings in a given quarter for the same company and filer, resulting in additional duplicates. Thus, there is no reliable natural primary key for the identity table.
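
One way to see how often the candidate key described above is violated is to count repeated key tuples. This is a minimal sketch over plain dictionaries with made-up records (the company IDs and contact names are not real EQR data), and find_duplicate_keys is a hypothetical helper name.

```python
from collections import Counter

# The key that "ought to be" the primary key, per the description above.
CANDIDATE_KEY = ("year_quarter", "company_id_ferc", "filer_unique_id")


def find_duplicate_keys(rows):
    """Return each candidate-key tuple that occurs more than once, with its count."""
    counts = Counter(tuple(row[col] for col in CANDIDATE_KEY) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up records for illustration only.
rows = [
    {"year_quarter": "2020q1", "company_id_ferc": "C000001",
     "filer_unique_id": "FS1", "contact_name": "Contact A"},
    {"year_quarter": "2020q1", "company_id_ferc": "C000001",
     "filer_unique_id": "FS1", "contact_name": "Contact B"},
    {"year_quarter": "2020q1", "company_id_ferc": "C000002",
     "filer_unique_id": "FS1", "contact_name": "Contact C"},
]

duplicates = find_duplicate_keys(rows)
print(duplicates)
```

On a pandas DataFrame the same check can be written with DataFrame.duplicated(subset=list(CANDIDATE_KEY), keep=False).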

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

  • FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
    dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/*.parquet';
library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name

Type

Description

company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system.

company_name

string

Name of the reporting company.

contact_address

string

Street address for contact identified in contact_name.

contact_city

string

City for contact identified in contact_name.

contact_country_name

string

Country (USA, Canada, Mexico, or United Kingdom) for contact identified in contact_name.

contact_email

string

Email for contact identified in contact_name.

contact_name

string

(Seller) – The name of the contact for the company authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. (Agent) – Name of the person who prepared the filing.

contact_phone

string

Phone number for contact identified in contact_name.

contact_state

string

State for contact identified in contact_name.

contact_title

string

Title of the contact identified in contact_name.

contact_zip

string

Zip code for contact identified in contact_name.

filer_unique_id

string

(Seller) – An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter.

transactions_reported_to_index_price_publishers

boolean

Sellers should indicate whether they have reported their sales transactions to index price publisher(s). If they have, Sellers should indicate specifically which index publisher(s) in Field Number 73.

year_quarter

string

Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
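
Because year_quarter is stored as a string in the form YYYYq{1-4}, it usually has to be parsed before it can be sorted or compared as a date. The sketch below uses only the standard library. The function names are hypothetical, and it assumes a lowercase "q" as shown in the format description.

```python
import re
from datetime import date
from typing import Tuple

# Four-digit year, a lowercase "q", then a quarter number from 1 to 4.
_YEAR_QUARTER = re.compile(r"(\d{4})q([1-4])")


def parse_year_quarter(value: str) -> Tuple[int, int]:
    """Split a year_quarter string such as '2013q3' into (year, quarter)."""
    match = _YEAR_QUARTER.fullmatch(value)
    if match is None:
        raise ValueError(f"not a YYYYq{{1-4}} string: {value!r}")
    return int(match.group(1)), int(match.group(2))


def quarter_start(value: str) -> date:
    """Return the first calendar day of the quarter named by a year_quarter string."""
    year, quarter = parse_year_quarter(value)
    return date(year, 3 * (quarter - 1) + 1, 1)
```

For example, parse_year_quarter("2013q3") gives (2013, 3), and quarter_start("2013q3") gives the date 2013-07-01.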

core_ferceqr__quarterly_index_pub#

Noticed any irregularities with this table? File a report.

Quarterly time series of electricity market price indices that individual EQR filers reported transactions to.

Most-recent data:

2026q1

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 920 – Electric Quarterly Report (EQR)

Primary key:

year_quarter, company_id_ferc, filer_unique_id

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

  • FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

import pandas as pd
# Select all Parquet files under this bucket prefix.
df = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
    dtype_backend="pyarrow",
)
import polars as pl
# Select all Parquet files under this bucket prefix.
df = pl.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/",
    storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
-- Use a wildcard to select all parquet files with this bucket prefix.
SELECT * FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/*.parquet';
library(arrow)
library(dplyr)

# Establish S3 connection
bucket <- s3_bucket("pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub")

# Read all parquet files in that prefix
df <- open_dataset(bucket) |> collect()

Columns#

Field Name

Type

Description

company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system.

filer_unique_id

string

(Seller) – An identifier (e.g., “FS1”, “FS2”) used to designate a record containing Seller identification information. One record for each seller company must be included in an EQR for a given quarter. (Agent) – An identifier (i.e., “FA1”) used to designate a record containing Agent identification information. One record with the FA1 identifier must be included in an EQR for a given quarter.

index_price_publisher_name

string

Name of index price publisher, which can be one of the following: AM - Argus Media, EIG - Energy Intelligence Group, Inc., IP - Intelligence Press, P - Platts, B - Bloomberg, PDX - Powerdex, SNL - SNL Energy.

seller_company_name

string

The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the company_name reported in the core_ferceqr__quarterly_identity table. A handful of records violate this requirement, so joins between tables should rely on company_id_ferc, not on company names.

transactions_reported

string

Description of the types of transactions reported to the index publisher identified in this record.

year_quarter

string

Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
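
The advice above to join on company_id_ferc rather than on company names can be illustrated with a small lookup. This is a sketch on made-up records, not real EQR data: the ID values, company names, and the identity_company_name output field are all hypothetical.

```python
# Made-up rows shaped like core_ferceqr__quarterly_identity.
identity = [
    {"year_quarter": "2020q1", "company_id_ferc": "C000001",
     "company_name": "Example Power LLC"},
]

# Made-up rows shaped like core_ferceqr__quarterly_index_pub. Note that the
# name is spelled differently from the identity record.
index_pub = [
    {"year_quarter": "2020q1", "company_id_ferc": "C000001",
     "seller_company_name": "Example Power, L.L.C.",
     "index_price_publisher_name": "P"},
]

# Build a lookup keyed on quarter and company ID, never on the name string.
# The identity table can hold several rows per key; this keeps the last one.
names_by_id = {
    (row["year_quarter"], row["company_id_ferc"]): row["company_name"]
    for row in identity
}

# Attach the identity name to each index publisher record (None if no match).
joined = [
    {**row, "identity_company_name": names_by_id.get(
        (row["year_quarter"], row["company_id_ferc"]))}
    for row in index_pub
]
print(joined[0]["identity_company_name"])
```

A join on the name strings would have missed this record, since the two spellings differ.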

core_ferceqr__transactions#

Noticed any irregularities with this table? File a report.

Contains information about individual electricity market transactions that took place during a given reporting quarter. Reported by the seller.

Most-recent data:

2026q1

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

FERC Form 920 – Electric Quarterly Report (EQR)

Primary key:

year_quarter, seller_company_id_ferc, transaction_unique_id

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

  • FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and unlabeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

Note

This table is partitioned into multiple Parquet files which can be treated as a single Parquet dataset.

Warning

This table will likely not fit in your computer’s memory. Make sure to filter or aggregate to reduce the amount of data read in. This is easier in polars and duckdb than in pandas, so we only provide examples for these.

import polars as pl
# Select some data relevant to tracking solar PPA prices
df = (
    # Limit query to data from a single quarterly Parquet file
    pl.scan_parquet(
        "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet",
        storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
    )
    .select(
        [
            "seller_company_name",
            "product_name",
            "price",
            "standardized_price",
            "transaction_quantity",
            "standardized_quantity",
            "total_transmission_charge",
            "total_transaction_charge",
            "rate_units",
            "trade_date"
        ]
    )
    .filter(
        pl.col("seller_company_name").str.contains("(?i)solar")
        & pl.col("standardized_price").is_not_null()
        & (pl.col("product_name") == "ENERGY")
        & (pl.col("rate_units") == "$/MWH")
        & (pl.col("trade_date") >= pl.date(1970, 1, 1))
    )
    .collect()
)
-- Select some data relevant to tracking solar PPA prices
SELECT
    -- Select a subset of columns to reduce network and memory usage
    seller_company_name,
    product_name,
    price,
    standardized_price,
    transaction_quantity,
    standardized_quantity,
    total_transmission_charge,
    total_transaction_charge,
    rate_units,
    trade_date
-- Limit to 2013Q3 data by querying only a single Parquet file
FROM 's3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/2013q3.parquet'
WHERE seller_company_name ILIKE '%solar%'
    AND standardized_price IS NOT NULL
    AND product_name = 'ENERGY'
    AND rate_units = '$/MWH'
    AND trade_date >= '1970-01-01'::DATE;
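
Both queries above limit the read to one quarterly file. To cover a few quarters without scanning the whole table, the per-quarter paths can be generated and passed to a reader that accepts a list of files. The sketch below assumes every quarter follows the <year_quarter>.parquet naming seen in the 2013q3 example and that a file exists for each quarter in the range. The helper names are hypothetical.

```python
from typing import List, Tuple

# Bucket prefix taken from the example paths above.
BASE_URL = "s3://pudl.catalyst.coop/ferceqr"


def quarter_path(table: str, year: int, quarter: int) -> str:
    """Build the path of one quarterly Parquet file (naming pattern assumed)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    return f"{BASE_URL}/{table}/{year}q{quarter}.parquet"


def quarter_paths(
    table: str, start: Tuple[int, int], end: Tuple[int, int]
) -> List[str]:
    """List paths for every quarter from start to end, inclusive."""
    # Number the quarters consecutively so the range can cross year boundaries.
    first = start[0] * 4 + (start[1] - 1)
    last = end[0] * 4 + (end[1] - 1)
    return [quarter_path(table, n // 4, n % 4 + 1) for n in range(first, last + 1)]


paths = quarter_paths("core_ferceqr__transactions", (2013, 3), (2014, 1))
print(paths)
```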

Columns#

Field Name

Type

Description

class_name

string

F - Firm: For transmission sales, a service or product that always has priority over non-firm service. For power sales, a service or product that is not interruptible for economic reasons. NF - Non-firm: For transmission sales, a service that is reserved and/or scheduled on an as-available basis and is subject to curtailment or interruption at a lesser priority compared to Firm service. For an energy sale, a service or product for which delivery or receipt of the energy may be interrupted for any reason or no reason, without liability on the part of either the buyer or seller. UP - Unit Power Sale: Designates a dedicated sale of energy and capacity from one or more than one specified generation unit(s). N/A: To be used only when the other available Class Names do not apply.

contract_service_agreement_id

string

Unique identifier given to each service agreement that can be used by the Seller to produce the agreement, if requested. The identifier may be the number assigned by FERC for those service agreements that have been filed with and accepted by the Commission, or it may be generated as part of an internal identification system.

customer_company_name

string

The name of the purchaser of contract products and services. Unlike seller_company_name, this name is not guaranteed to match a name in the core_ferceqr__quarterly_identity table. In addition, the same customer company may appear with different names in different contracts and transactions, since this field is an unconstrained string chosen by the seller.

exchange_brokerage_service

string

If a broker service is used to consummate or effectuate a transaction, the term “Broker” shall be provided. If an exchange is used, the specific exchange that is used shall be selected from the Commission-provided list. Allowed values include BROKER, ICE, NODAL, and NYMEX.

ferc_tariff_reference

string

The FERC tariff reference cites the document that specifies the terms and conditions under which a Seller is authorized to make transmission sales, power sales or sales of related jurisdictional services at cost-based rates or at market-based rates. If the sales are market-based, the tariff that is specified in the FERC order granting the Seller Market Based Rate Authority must be listed. If a non-public utility does not have a FERC Tariff Reference, it should enter “NPU” for the FERC Tariff Reference.

increment_name

string

5: Terms of the contract (if specifically noted in the contract) set for more than 0 minutes and less than or equal to 5 minutes (> 0 and ≤ 5 minutes). 15: Terms of the contract (if specifically noted in the contract) set for more than 5 minutes and less than or equal to 15 minutes (> 5 and ≤ 15 minutes). H: Terms of the contract (if specifically noted in the contract) set for more than 15 minutes and less than or equal to 6 hours (> 15 minutes and ≤ 6 hours). D: Terms of the contract (if specifically noted in the contract) set for more than 6 and up to 60 hours (> 6 and ≤ 60 hours). W: Terms of the contract (if specifically noted in the contract) set for over 60 hours and up to 168 hours (> 60 and ≤ 168 hours). M: Terms of the contract (if specifically noted in the contract) set for more than 168 hours up to, but not including, one year (> 168 hours and < 1 year). Y: Terms of the contract (if specifically noted in the contract) set for one year or more (≥ 1 year). N/A: Terms of the contract do not specify an increment.

increment_peaking_name

string

FP: The product described may be sold during those hours designated as on-peak and off-peak at the point of delivery. OP: The product described may be sold only during those hours designated as off-peak at the point of delivery. P: The product described may be sold only during those hours designated as on-peak at the point of delivery. N/A: To be used only when the increment peaking name is not specified in the contract.

point_of_delivery_balancing_authority

string

The registered Balancing Authority (formerly called NERC Control Area) where a jurisdictional product is delivered and/or service ends for a transmission or transmission-related jurisdictional sale. The Balancing Authority will be identified with the abbreviation used in OASIS applications. If delivery occurs at the interconnection of two control areas, the control area that the product is entering should be used. If delivery occurs at a trading hub, the term ‘Hub’ should be used.

point_of_delivery_specific_location

string

The specific location at which the product is delivered if designated in the contract. If delivery occurs at a trading hub, a standardized hub name must be used.

price

number

Actual price charged for the product per unit. The price reported cannot be averaged or otherwise aggregated.

product_name

string

Description of product being offered. Note that allowed values differ slightly from those in core_ferceqr__contracts. BLACK START SERVICE: Service available after a system-wide blackout where a generator participates in system restoration activities without the availability of an outside electric supply (Ancillary Service). BOOKED OUT POWER: Energy or capacity contractually committed bilaterally for delivery but not actually delivered due to some offsetting or countervailing trade (Transaction only). CAPACITY: A quantity of demand that is charged on a $/KW or $/MW basis. CUSTOMER CHARGE: Fixed contractual charges assessed on a per customer basis that could include billing service. ENERGY: A quantity of electricity that is sold or transmitted over a period of time. ENERGY IMBALANCE: Service provided when a difference occurs between the scheduled and the actual delivery of energy to a load obligation (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. EXCHANGE: Transaction whereby the receiver accepts delivery of energy for a supplier’s account and returns energy at times, rates, and in amounts as mutually agreed if the receiver is not an RTO/ISO. FUEL CHARGE: Charge based on the cost or amount of fuel used for generation. GENERATOR IMBALANCE: Service provided when a difference occurs between the output of a generator located in the Transmission Provider’s Control Area and a delivery schedule from that generator to (1) another Control Area or (2) a load within the Transmission Provider’s Control Area over a single hour (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. 
GRANDFATHERED BUNDLED: Services provided for bundled transmission, ancillary services and energy under contracts effective prior to Order No. 888’s OATTs. NEGOTIATED-RATE TRANSMISSION: Transmission performed under a negotiated rate contract (applies only to merchant transmission companies). OTHER: Product name not otherwise included. PRIMARY FREQUENCY RESPONSE: Service provided as a stand-by resource to support autonomous, pre-programmed changes in output to rapidly arrest large changes in frequency until dispatched resources can take over. REACTIVE SUPPLY & VOLTAGE CONTROL: Production or absorption of reactive power to maintain voltage levels on transmission systems (Ancillary Service). REAL POWER TRANSMISSION LOSS: The loss of energy, resulting from transporting power over a transmission system. REGULATION & FREQUENCY RESPONSE: Service providing for continuous balancing of resources (generation and interchange) with load, and for maintaining scheduled interconnection frequency by committing on-line generation where output is raised or lowered and by other non-generation resources capable of providing this service as necessary to follow the moment-by-moment changes in load (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. REQUIREMENTS SERVICE: Firm, load-following power supply necessary to serve a specified share of customer’s aggregate load during the term of the agreement. Requirements service may include some or all of the energy, capacity and ancillary service products. SCHEDULE SYSTEM CONTROL & DISPATCH: Scheduling, confirming and implementing an interchange schedule with other Balancing Authorities, including intermediary Balancing Authorities providing transmission service, and ensuring operational security during the interchange transaction (Ancillary Service). 
SPINNING RESERVE: Unloaded synchronized generating capacity that is immediately responsive to system frequency and that is capable of being loaded in a short time period or non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. SUPPLEMENTAL RESERVE: Service needed to serve load in the event of a system contingency, available with greater delay than SPINNING RESERVE. This service may be provided by generating units that are on-line but unloaded, by quick-start generation, or by interruptible load or other non-generation resources capable of providing this service (Ancillary Service). For Contracts, reported if the contract provides for sale of the product. For Transactions, sales by third-party providers (i.e., non-transmission function) are reported. TOLLING ENERGY: Energy sold from a plant whereby the buyer provides fuel to a generator (seller) and receives power in return for pre-established fees. UPLIFT: A make-whole payment by an RTO/ISO to a utility.

rate_units

string

Measure stated in the contract for the product sold. FERC EQR.

seller_company_id_ferc

string

The Company Identifier (CID) obtained through FERC’s Company Registration system corresponding to the selling company.

seller_company_name

string

The name of the company that is authorized to make sales as indicated in the company’s FERC tariff(s) or that is required to file the EQR under section 220 of the Federal Power Act. This name must match the name provided as seller_company_name in the core_ferceqr__quarterly_identity table. There are a handful of cases in which this requirement is violated, so any joins between tables should rely on company_id_ferc, not the company names.

seller_transaction_id

string

Unique reference number assigned by the Seller for each transaction. May only be unique in the context of the seller’s internal record keeping. This is an unrestricted text field.

standardized_price

number

For product names energy, capacity, and booked out power only. Specify the price in $/MWh if the product is energy or booked out power and specify the price in $/MW-month if the product is capacity or booked out power.

standardized_quantity

number

For product names energy, capacity, and booked out power only. Specify the quantity in MWh if the product is energy or booked out power and specify the quantity in MW-month if the product is capacity or booked out power.

term_name

string

Contracts with durations of one year or greater are long-term (LT). Contracts with shorter durations are short-term (ST).

timezone

string

A code representing the so-called ‘time zone’ in which the sale was made. These codes do not actually correspond to time zones. Rather, they indicate an offset from UTC, which changes according to daylight saving vs. standard time. The codes are as follows: AD: Atlantic Daylight Time (UTC-3), AP: Atlantic Prevailing Time, AS: Atlantic Standard Time (UTC-4), CD: Central Daylight Time (UTC-5), CP: Central Prevailing Time, CS: Central Standard Time (UTC-6), ED: Eastern Daylight Time (UTC-4), EP: Eastern Prevailing Time, ES: Eastern Standard Time (UTC-5), MD: Mountain Daylight Time (UTC-6), MP: Mountain Prevailing Time, MS: Mountain Standard Time (UTC-7), PD: Pacific Daylight Time (UTC-7), PP: Pacific Prevailing Time, PS: Pacific Standard Time (UTC-8), UT: Coordinated Universal Time (UTC+0).

total_transaction_charge

number

transaction_quantity * price + total_transmission_charge.

total_transmission_charge

number

Payments received for transmission services when explicitly identified.

trade_date

date

The date upon which the parties made the legally binding agreement on the price of a transaction.

transaction_begin_date

datetime

First date and time the product is sold during the quarter.

transaction_end_date

datetime

Last date and time the product is sold during the quarter.

transaction_quantity

number

The quantity of the product in this transaction record.

transaction_unique_id

string

An identifier beginning with the letter “T” and followed by a number (e.g., “T1”, “T2”) used to designate a record containing transaction information. One record for each transaction record must be included in an EQR for a given quarter. A new transaction record must be used every time a price changes in a sale. Note that these IDs are only unique within a single company_identifier and year_quarter.

type_of_rate

string

FIXED: A fixed charge per unit of consumption. No variables are used to determine this rate. FORMULA: A calculation of a rate based upon a formula that does not contain an electric index component. ELECTRIC INDEX: A calculation of a rate based upon an index or a formula that contains an electric index component. An electric index includes an index published by an index publisher such as those required to be listed in Field Number 73 or a price published by an RTO/ISO (e.g., PJM West or Illinois Hub). RTO/ISO: If the price is the result of an RTO/ISO market or the sale is made to the RTO/ISO.

year_quarter

string

Year-quarter corresponding to record. Formatted like YYYYq{1-4}.
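
The following sketch illustrates three of the field conventions described in the table above, using only the Python standard library: splitting a year_quarter string, recomputing total_transaction_charge from its documented formula, and mapping the fixed-offset timezone codes to UTC offsets. The function and constant names are hypothetical and are not part of PUDL. The prevailing-time codes (AP, CP, EP, MP, PP) are left out because the code list gives no single offset for them.

```python
from datetime import timedelta, timezone

# Fixed UTC offsets in hours, copied from the timezone code list.
# Prevailing-time codes are intentionally absent (no single offset).
FIXED_OFFSET_HOURS = {
    "AD": -3, "AS": -4,
    "CD": -5, "CS": -6,
    "ED": -4, "ES": -5,
    "MD": -6, "MS": -7,
    "PD": -7, "PS": -8,
    "UT": 0,
}


def parse_year_quarter(year_quarter):
    """Split a YYYYq{1-4} string such as '2023q4' into (year, quarter)."""
    year_text, quarter_text = year_quarter.lower().split("q")
    year, quarter = int(year_text), int(quarter_text)
    if len(year_text) != 4 or quarter not in (1, 2, 3, 4):
        raise ValueError(f"unexpected year_quarter: {year_quarter!r}")
    return year, quarter


def expected_total_charge(transaction_quantity, price, total_transmission_charge):
    """Apply the documented formula for total_transaction_charge."""
    return transaction_quantity * price + total_transmission_charge


def fixed_utc_offset(code):
    """Return a datetime.timezone for a fixed-offset code, else None."""
    hours = FIXED_OFFSET_HOURS.get(code)
    if hours is None:
        return None
    return timezone(timedelta(hours=hours))
```

Returning None for the prevailing-time codes forces the caller to decide how to resolve them, rather than silently assuming standard or daylight time.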

core_gridpathratoolkit__assn_generator_aggregation_group#

Noticed any irregularities with this table? File a report.

Association table providing connections between individual generator IDs and their corresponding aggregation group when creating aggregated profiles.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

GridPath Resource Adequacy Toolkit Data

Primary key:

plant_id_eia, generator_id

Usage Warnings#

  • The original data is no longer being collected or reported in this way.

Additional Details#

This table defines which individual generator profiles are combined when creating aggregated capacity factors / generation profiles. Generator capacity is used to weight the contribution of each generator in the resulting aggregated profiles, and is made available in this table for convenience and legibility. The resulting aggregated profiles are stored in out_gridpathratoolkit__hourly_available_capacity_factor.
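
The capacity-weighted aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the code PUDL or GridPath uses: the function name and the shape of the profiles argument (a dict keyed by plant_id_eia and generator_id holding a list of hourly capacity factors) are hypothetical. Generators whose include_generator flag is false are skipped.

```python
from collections import defaultdict


def aggregate_profiles(generators, profiles):
    """Combine per-generator profiles into one profile per aggregation_group.

    generators: list of dicts with the columns of this association table.
    profiles: hypothetical dict mapping (plant_id_eia, generator_id) to a
        list of hourly capacity factors, all of equal length within a group.
    """
    weighted_sums = {}
    total_capacity = defaultdict(float)
    for gen in generators:
        if not gen["include_generator"]:
            continue  # excluded generators do not contribute
        profile = profiles[(gen["plant_id_eia"], gen["generator_id"])]
        group = gen["aggregation_group"]
        sums = weighted_sums.setdefault(group, [0.0] * len(profile))
        for hour, capacity_factor in enumerate(profile):
            sums[hour] += capacity_factor * gen["capacity_mw"]
        total_capacity[group] += gen["capacity_mw"]
    # Divide each hourly weighted sum by the group's total included capacity.
    return {
        group: [value / total_capacity[group] for value in sums]
        for group, sums in weighted_sums.items()
    }


# Made-up rows; generator_id is kept as a string, as the column requires.
example_generators = [
    {"aggregation_group": "wind_a", "plant_id_eia": 1, "generator_id": "1",
     "capacity_mw": 100.0, "include_generator": True},
    {"aggregation_group": "wind_a", "plant_id_eia": 1, "generator_id": "GT2",
     "capacity_mw": 300.0, "include_generator": True},
]
example_profiles = {(1, "1"): [0.2, 0.4], (1, "GT2"): [0.6, 0.8]}
example_result = aggregate_profiles(example_generators, example_profiles)
```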

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_gridpathratoolkit__assn_generator_aggregation_group.parquet"
)

Columns#

Field Name

Type

Description

aggregation_group

string

A label identifying a group of aggregated generator capacity factors.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

generator_id

string

Generator ID is usually numeric, but sometimes includes letters. Make sure you treat it as a string!

include_generator

boolean

Every row in the aggregation table describes a single generator. Groups of rows with the same aggregation_group are combined using a capacity-weighted average to produce an aggregate generation profile. A few generators are not included in that aggregation process. This column indicates whether a generator is included.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

core_nrelatb__yearly_projected_cost_performance#

Noticed any irregularities with this table? File a report.

Annual time series of projections of costs and performance for various technologies.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

NREL Annual Technology Baseline (ATB) for Electricity

Primary key:

This table has no primary key. The fields technology_description_detail_1 and technology_description_detail_2 occasionally contain nulls, preventing this table from having a primary key.

Additional Details#

This table includes NREL’s projections for capital and operational costs as well as operational attributes for a variety of technologies for each of the NREL ATB scenarios. Each new year of ATB data includes projections for the next 25 years.

All of the NREL tables are related. If you want to understand the financial assumptions of a given scenario in this table, see core_nrelatb__yearly_projected_financial_cases. The other two core_nrelatb tables include attributes about technologies in these projections.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_cost_performance.parquet"
)

Columns#

Field Name

Type

Description

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * duration of that time period).

capex_construction_finance_factor

number

Portion of all-in capital cost associated with construction period financing. This factor is applied to an overnight capital cost to represent the financing costs incurred during the construction period.

capex_grid_connection_per_kw

number

Overnight capital cost includes a nominal-distance spur line (<1 mi) for all technologies, and for offshore wind, it includes export cable and construction period transit costs for a 30-km distance from shore. Project-specific line costs that are based on distance to existing transmission are not included. This only applies to offshore wind.

capex_overnight_additional_per_kw

number

Capex for retrofits if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation.

capex_overnight_per_kw

number

Capex if the plant could be constructed overnight (i.e., excludes construction period financing); includes on-site electrical equipment (e.g., switchyard), a nominal-distance spur line (<1 mi), and necessary upgrades at a transmission substation.

capex_per_kw

number

Capital cost (USD). Expenditures required to achieve commercial operation of the generation plant.

cost_recovery_period_years

integer

The period over which the initial capital investment to build a plant is recovered.

fuel_cost_per_mwh

number

Fuel costs in USD/MWh. NREL-derived values using heat rates.

heat_rate_mmbtu_per_mwh

number

Fuel content per unit of electricity generated.

heat_rate_penalty

number

Heat rate penalty for retrofitting. This column is only populated for retrofit technologies. Values appear to be a rate between 0.09 and 0.35.

levelized_cost_of_energy_per_mwh

number

Levelized cost of energy (LCOE) is a summary metric that combines the primary technology cost and performance parameters: capital expenditures, operations expenditures, and capacity factor.

model_case_nrelatb

string

NREL’s financial assumption cases. There are two cases that affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods

model_tax_credit_case_nrelatb

string

NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods

net_output_penalty

number

Net output penalty for retrofitting. This column is only populated for retrofit technologies. Values appear to be a rate between -0.25 and -0.08.

opex_fixed_per_kw

number

Fixed operation and maintenance expenses. Annual expenditures to operate and maintain equipment that are not incurred on a per-unit-energy basis.

opex_variable_per_mwh

number

Operation and maintenance costs incurred on a per-unit-energy basis.

projection_year

integer

The year of the projected value.

report_year

integer

Four-digit year in which the data was reported.

scenario_atb

string

Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios

technology_description

string

High level description of the technology used by the generator to produce electricity.

technology_description_detail_1

string

Technology details indicate resource levels and specific technology subcategories.

technology_description_detail_2

string

Technology details indicate resource levels and specific technology subcategories.
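
As a rough illustration of how several columns in core_nrelatb__yearly_projected_cost_performance relate to one another, the sketch below computes a capacity factor and a simplified levelized cost of energy. The LCOE expression is a common simplified form and is an assumption here. It is not guaranteed to reproduce the values in levelized_cost_of_energy_per_mwh. The fixed charge rate is taken as an input, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def compute_capacity_factor(energy_mwh, capacity_mw, hours):
    """Energy generated divided by the maximum possible energy in the period."""
    return energy_mwh / (capacity_mw * hours)


def simple_lcoe_per_mwh(
    fixed_charge_rate,
    capex_per_kw,
    opex_fixed_per_kw,
    capacity_factor,
    opex_variable_per_mwh=0.0,
    fuel_cost_per_mwh=0.0,
):
    """Simplified LCOE in USD/MWh (assumed form, see the text above).

    capex_per_kw is in USD/kW and opex_fixed_per_kw is in USD/kW per year.
    """
    # Annualized capital cost plus fixed O&M, per kW of capacity.
    annual_fixed_cost_per_kw = fixed_charge_rate * capex_per_kw + opex_fixed_per_kw
    # Energy produced per kW of capacity in one year, in MWh.
    annual_mwh_per_kw = capacity_factor * HOURS_PER_YEAR / 1000
    return (
        annual_fixed_cost_per_kw / annual_mwh_per_kw
        + opex_variable_per_mwh
        + fuel_cost_per_mwh
    )
```

Because capacity factor sits in the denominator, halving it doubles the fixed-cost portion of the result while leaving the variable and fuel terms unchanged.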

core_nrelatb__yearly_projected_financial_cases#

Noticed any irregularities with this table? File a report.

Annual time series of financial assumptions for each model case (model_case_nrelatb), and sub-type of technology (technology_description).

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

NREL Annual Technology Baseline (ATB) for Electricity

Primary key:

report_year, model_case_nrelatb, projection_year, technology_description

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases.parquet"
)

Columns#

Field Name

Type

Description

inflation_rate

number

Rate of inflation. All dollar values are given in 2021 USD, using the Consumer Price Index for All Urban Consumers for dollar year conversions where the source year dollars do not match 2021.

interest_rate_calculated_real

number

Calculated real interest rate.

interest_rate_during_construction_nominal

number

Also referred to as construction finance cost. Portion of all-in capital cost associated with construction period financing. It is a function of construction duration, capital fraction during construction, and interest during construction.

interest_rate_nominal

number

Nominal interest rate.

model_case_nrelatb

string

NREL’s financial assumption cases. There are two cases that affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods

projection_year

integer

The year of the projected value.

rate_of_return_on_equity_calculated_real

number

Calculated real rate of return on equity.

rate_of_return_on_equity_nominal

number

Nominal rate of return on equity.

report_year

integer

Four-digit year in which the data was reported.

tax_rate_federal_state

number

Combined federal and state tax rate. The R&D model_case_nrelatb holds tax and inflation rates constant at assumed long-term values: 21 percent federal tax rate, 6 percent state tax rate (though actual state tax rates vary), and 2.5 percent inflation rate. It excludes the effects of tax credits. The Market + Policy model_case_nrelatb applies federal tax credits and expires them as consistent with existing law and guidelines.

technology_description

string

High level description of the technology used by the generator to produce electricity.
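
The "calculated real" columns in core_nrelatb__yearly_projected_financial_cases sit alongside nominal counterparts and an inflation_rate column. A common way to convert a nominal rate to a real one is the exact Fisher relation, sketched below. Whether NREL uses exactly this relation is an assumption, and the function name is hypothetical.

```python
def real_rate(nominal_rate, inflation_rate):
    """Convert a nominal rate to a real rate with the exact Fisher relation.

    Both arguments are fractions, e.g. 0.05 for 5 percent.
    """
    return (1 + nominal_rate) / (1 + inflation_rate) - 1


# A 5 percent nominal rate with the 2.5 percent inflation rate that the
# tax_rate_federal_state description quotes as the long-term assumption.
example_real_rate = real_rate(0.05, 0.025)
```

The exact relation gives a slightly smaller value than simply subtracting inflation from the nominal rate, and the gap grows as the rates increase.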

core_nrelatb__yearly_projected_financial_cases_by_scenario#

Noticed any irregularities with this table? File a report.

Annual time series of additional financial assumptions for NREL ATB projections that also vary by technology innovation scenario (scenario_atb), tax credit case (model_tax_credit_case_nrelatb), and cost recovery period (cost_recovery_period_years).

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

NREL Annual Technology Baseline (ATB) for Electricity

Primary key:

This table has no primary key. There are a small number of records which have nulls in the cost_recovery_period_years column. Based on NREL’s documentation, this seems to indicate that those records apply to any relevant cost_recovery_period_years. If those records were non-null, the primary keys of this table would be: [‘report_year’, ‘model_case_nrelatb’, ‘model_tax_credit_case_nrelatb’, ‘projection_year’, ‘technology_description’, ‘scenario_atb’, ‘cost_recovery_period_years’]
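
One way to act on that interpretation is to treat a null cost_recovery_period_years as a fallback that applies when no record with the requested period exists. The sketch below does this over plain dictionaries. The function name is hypothetical, and the fallback rule is only as reliable as the reading of NREL's documentation described above.

```python
# The would-be primary key columns, minus cost_recovery_period_years.
KEY_FIELDS = (
    "report_year",
    "model_case_nrelatb",
    "model_tax_credit_case_nrelatb",
    "projection_year",
    "technology_description",
    "scenario_atb",
)


def find_financial_assumptions(records, wanted, cost_recovery_period_years):
    """Return matching records, preferring an exact cost recovery period.

    records: list of dicts, one per row of this table.
    wanted: dict giving a value for every field in KEY_FIELDS.
    Rows whose cost_recovery_period_years is None are used only when no
    row matches the requested period exactly.
    """
    candidates = [
        row for row in records
        if all(row[field] == wanted[field] for field in KEY_FIELDS)
    ]
    exact = [
        row for row in candidates
        if row["cost_recovery_period_years"] == cost_recovery_period_years
    ]
    if exact:
        return exact
    return [
        row for row in candidates
        if row["cost_recovery_period_years"] is None
    ]
```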

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_projected_financial_cases_by_scenario.parquet"
)

Columns#

Field Name

Type

Description

capital_recovery_factor

number

Ratio of a constant annuity to the present value of receiving that annuity for a given length of time.

cost_recovery_period_years

integer

The period over which the initial capital investment to build a plant is recovered.

debt_fraction

number

Fraction of capital financed with debt; the remainder (1 - debt fraction) is assumed to be financed with equity. Also referred to as the leverage ratio.

fixed_charge_rate

number

Amount of revenue per dollar of investment that must be collected annually from customers to pay the carrying charges on that investment.

model_case_nrelatb

string

NREL’s financial assumption cases. There are two cases that affect project financial assumptions: the R&D Only Case and the Market + Policies Case. The R&D Only Case includes only projected R&D improvements, while the Market + Policies Case includes policy and tax incentives. https://atb.nrel.gov/electricity/2024/financial_cases_&_methods

model_tax_credit_case_nrelatb

string

NREL’s tax credit assumption cases. There are two types of tax credits: production tax credit (PTC) and investment tax credit (ITC). For more detail, see: https://atb.nrel.gov/electricity/2024/financial_cases_&_methods

projection_year

integer

The year of the projected value.

report_year

integer

Four-digit year in which the data was reported.

scenario_atb

string

Technology innovation scenarios. https://atb.nrel.gov/electricity/2023/definitions#scenarios

technology_description

string

High level description of the technology used by the generator to produce electricity.

wacc_nominal

number

Nominal weighted average cost of capital - average expected rate that is paid to finance assets.

wacc_real

number

Real weighted average cost of capital - average expected rate that is paid to finance assets.
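
The capital_recovery_factor description in the table above corresponds to the standard annuity formula, sketched below. Which discount rate and period NREL pairs with it is an assumption: the example uses a generic rate and number of years, and the function names are hypothetical.

```python
def capital_recovery_factor(rate, years):
    """Constant annual payment per unit of present value over `years`.

    rate is a fraction per year, e.g. 0.05 for 5 percent.
    """
    if rate == 0:
        # With no discounting, the payment is spread evenly over the period.
        return 1 / years
    return rate / (1 - (1 + rate) ** -years)


def annuity_present_value(rate, years):
    """Present value of receiving 1 unit at the end of each year."""
    return sum(1 / (1 + rate) ** year for year in range(1, years + 1))
```

Multiplying the two functions for the same rate and period returns approximately 1, which is a quick check that the factor is the ratio of the annuity to its present value.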

core_nrelatb__yearly_technology_status#

Noticed any irregularities with this table? File a report.

Annual time series of technology statuses, indicating whether technologies in the ATB scenarios are mature, and whether they are the default technologies.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

NREL Annual Technology Baseline (ATB) for Electricity

Primary key:

This table has no primary key.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_nrelatb__yearly_technology_status.parquet"
)

Columns#

Field Name

Type

Description

is_default

boolean

Indicator of whether the technology is default.

is_technology_mature

boolean

Indicator of whether the technology is mature. Technologies are defined as mature if a representative plant is operating or under construction in the United States in the Base Year.

report_year

integer

Four-digit year in which the data was reported.

technology_description

string

High level description of the technology used by the generator to produce electricity.

technology_description_detail_1

string

Technology details indicate resource levels and specific technology subcategories.

technology_description_detail_2

string

Technology details indicate resource levels and specific technology subcategories.

core_phmsagas__yearly_distribution_operators#

Noticed any irregularities with this table? File a report.

Annual time series of distribution operator information.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part A)

Primary key:

report_id, report_date, operator_id_phmsa

Additional Details#

This table contains operator-level information, including office and headquarters locations.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_phmsagas__yearly_distribution_operators.parquet"
)

Columns#

Field Name

Type

Description

additional_information

string

Any additional information which will assist in clarifying or classifying the reported data.

headquarters_city

string

City where an operator’s headquarters are located.

headquarters_county

string

County where an operator’s headquarters are located.

headquarters_state

string

State where an operator’s headquarters are located.

headquarters_street_address

string

Street address for an operator’s headquarters.

headquarters_zip

string

ZIP code where an operator’s headquarters are located.

office_city

string

City where an operator’s office is located.

office_county

string

County where an operator’s office is located.

office_state

string

State where an operator’s office is located.

office_street_address

string

Street address of an operator’s office.

office_zip

string

ZIP code where an operator’s office is located.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

operator_name_phmsa

string

PHMSA operator name.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

core_pudl__assn_eia_pudl_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA plant IDs and manually assigned PUDL plant IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

plant_id_eia

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_plants.parquet"
)

Columns#

Field Name

Type

Description

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_eia

string

Plant name.

core_pudl__assn_eia_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between EIA utility IDs and manually assigned PUDL utility IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_eia

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_eia_pudl_utilities.parquet"
)

Columns#

Field Name

Type

Description

utility_id_eia

integer

The EIA Utility Identification number.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_eia

string

The name of the utility.

core_pudl__assn_ferc1_dbf_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 DBF utility IDs originally reported as respondent_id.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_ferc1_dbf

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and non-labeled total rows, and often lacks IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_dbf_pudl_utilities.parquet"
)

Columns#

Field Name

Type

Description

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_dbf

integer

FERC-assigned respondent_id from DBF reporting years, identifying the reporting entity. Stable from year to year.

core_pudl__assn_ferc1_pudl_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between FERC 1 plants and their manually assigned PUDL plant IDs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_ferc1, plant_name_ferc1

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly. It often contains free-form strings and non-labeled total rows, and often lacks IDs. See Notable Irregularities for details.

Additional Details#

FERC does not assign IDs to plants, so each FERC 1 plant is identified by a utility_id_ferc1 and a plant_name_ferc1.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_plants.parquet"
)

Columns#

Field Name

Type

Description

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_ferc1

string

Name of the plant, as reported to FERC. This is a freeform string, not guaranteed to be consistent across references to the same plant.

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

core_pudl__assn_ferc1_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL utility IDs and PUDL-assigned FERC1 utility IDs.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_ferc1

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

This table maps two manually assigned utility IDs: a PUDL ID and a FERC1 ID. The PUDL IDs link EIA and FERC1 utilities. The PUDL FERC1 IDs link records from older DBF respondent IDs and new XBRL entity IDs via core_pudl__assn_ferc1_dbf_pudl_utilities and core_pudl__assn_ferc1_xbrl_pudl_utilities respectively. This table is generated from a table stored in the PUDL repository: src/package_data/glue/utility_id_pudl.csv
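
The two-hop lookup described above can be sketched as follows. This is an illustration under stated assumptions: every ID value is invented, plain dictionaries stand in for the three association tables, and to_utility_id_pudl is a hypothetical helper rather than a PUDL function.

```python
# Invented sample data; each dictionary stands in for one association table.
# core_pudl__assn_ferc1_dbf_pudl_utilities: utility_id_ferc1_dbf -> utility_id_ferc1
dbf_to_ferc1 = {17: 1, 42: 2}
# core_pudl__assn_ferc1_xbrl_pudl_utilities: utility_id_ferc1_xbrl -> utility_id_ferc1
xbrl_to_ferc1 = {"C000017": 1, "C000042": 2}
# core_pudl__assn_ferc1_pudl_utilities: utility_id_ferc1 -> utility_id_pudl
ferc1_to_pudl = {1: 500, 2: 600}


def to_utility_id_pudl(native_id, native_to_ferc1):
    """Resolve a native FERC ID to utility_id_pudl; None if either hop is missing."""
    utility_id_ferc1 = native_to_ferc1.get(native_id)
    return ferc1_to_pudl.get(utility_id_ferc1)
```

In this sample, a DBF respondent ID and an XBRL entity ID for the same utility resolve to the same utility_id_pudl, which is what lets records from both reporting eras be combined.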

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_pudl_utilities.parquet"
)

Columns#

Field Name

Type

Description

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_ferc1

string

Name of the responding utility, as it is reported in FERC Form 1. For human readability only.

core_pudl__assn_ferc1_xbrl_pudl_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC1 utility IDs and the native FERC1 XBRL utility IDs originally reported as entity_id.

Most-recent data:

2024

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_ferc1_xbrl

Usage Warnings#

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc1_xbrl_pudl_utilities.parquet"
)

Columns#

Field Name

Type

Description

utility_id_ferc1

integer

PUDL-assigned utility ID, identifying a FERC1 utility. This is an auto-incremented ID and is not expected to be stable from year to year.

utility_id_ferc1_xbrl

string

FERC-assigned entity_id from XBRL reporting years, identifying the reporting entity. Stable from year to year.

core_pudl__assn_ferc714_csv_pudl_respondents#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 CSV respondent IDs originally reported as respondent_id.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

respondent_id_ferc714, respondent_id_ferc714_csv

Usage Warnings#

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_csv_pudl_respondents.parquet"
)

Columns#

Field Name

Type

Description

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_id_ferc714_csv

integer

FERC Form 714 respondent ID from CSV-reported data, published for the years 2006-2020. This ID is linked to the newer years of reported XBRL data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as respondent_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

core_pudl__assn_ferc714_pudl_respondents#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-derived FERC 714 respondent IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

respondent_id_ferc714

Usage Warnings#

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Additional Details#

These IDs connect the older CSV data, which uses respondent IDs, with the newer XBRL data, which uses different entity IDs. These IDs are manually assigned when new FERC 714 data is integrated, and any newly found utilities are added to the list with a new ID. This table is read in from a CSV stored in the PUDL repository: src/pudl/package_data/glue/respondent_id_ferc714.xlsx

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_pudl_respondents.parquet"
)

Columns#

Field Name

Type

Description

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

core_pudl__assn_ferc714_xbrl_pudl_respondents#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL-assigned FERC714 respondent IDs and native FERC714 XBRL respondent IDs originally reported as entity_id.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

respondent_id_ferc714, respondent_id_ferc714_xbrl

Usage Warnings#

  • Contains information from multiple raw inputs.

  • FERC data is notoriously difficult to extract cleanly, and often contains free-form strings, non-labeled total rows and lack of IDs. See Notable Irregularities for details.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_ferc714_xbrl_pudl_respondents.parquet"
)

Columns#

Field Name

Type

Description

respondent_id_ferc714

integer

PUDL-assigned ID identifying a respondent to FERC Form 714. This ID associates natively reported respondent IDs from the original CSV and XBRL data sources.

respondent_id_ferc714_xbrl

string

FERC Form 714 respondent ID from XBRL-reported data, published for the years 2021 to present. This ID is linked to the older years of reported CSV data through the PUDL-assigned respondent_id_ferc714 ID. This ID was originally reported as entity_id. Note that this ID does not correspond to FERC respondent IDs from other forms.

core_pudl__assn_utilities_plants#

Noticed any irregularities with this table? File a report.

Association table providing connections between PUDL utility IDs and PUDL plant IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_pudl, plant_id_pudl

Additional Details#

This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__assn_utilities_plants.parquet"
)

Columns#

Field Name

Type

Description

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

core_pudl__codes_data_maturities#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for maturity levels of data records. Some data sources report less-than-final data. PUDL sometimes includes this data, but use at your own risk.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

code

Usage Warnings#

  • Contains information from multiple raw inputs.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_data_maturities.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

core_pudl__codes_datasources#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the data sources that PUDL integrates.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

datasource

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_datasources.parquet"
)

Columns#

Field Name

Type

Description

datasource

string

Code identifying a dataset available within PUDL.

doi

string

Unique digital object identifier of Zenodo archive.

partitions

string

The data partitions used to generate this instance of the database.

pudl_version

string

The version of PUDL used to generate this database.

core_pudl__codes_imputation_reasons#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for the reasons a value might be flagged for imputation.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

code

Additional Details#

Any time a column is imputed, there should be a corresponding column which contains codes for every value in the column that was imputed.
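
A minimal sketch of how a companion code column could be resolved against this code table. Everything named here is an assumption made for illustration: the codes, their descriptions, the demand_mwh and demand_mwh_imputed_code column names, and the convention that a missing code (None) means the value was reported rather than imputed.

```python
# Invented rows shaped like core_pudl__codes_imputation_reasons (code -> description).
reason_by_code = {
    "missing_value": "Value was not reported.",
    "anomalous_region": "Value fell in a run of anomalous data.",
}

# Hypothetical data rows: a value column plus its companion code column.
records = [
    {"demand_mwh": 10.0, "demand_mwh_imputed_code": None},
    {"demand_mwh": 12.5, "demand_mwh_imputed_code": "missing_value"},
]


def describe_imputation(record, code_column):
    """Return the description for a record's imputation code, or None if not imputed."""
    code = record[code_column]
    if code is None:
        return None
    # Surface unexpected codes instead of silently dropping them.
    return reason_by_code.get(code, f"unknown code: {code}")
```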

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_imputation_reasons.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

core_pudl__codes_subdivisions#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for states, provinces, and other sub-national political subdivisions.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

country_code, subdivision_code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__codes_subdivisions.parquet"
)

Columns#

Field Name

Type

Description

country_code

string

Three-letter ISO-3166 country code (e.g. USA or CAN).

country_name

string

Full country name (e.g. United States of America).

division_code_us_census

string

Three-letter US Census division code as it appears in the bulk electricity data published by the EIA. Note that EIA splits the Pacific division into distinct contiguous (CA, OR, WA) and non-contiguous (AK, HI) states. For reference see this US Census region and division map: https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf

division_name_us_census

string

Longer human readable name describing the US Census division.

is_epacems_state

boolean

Indicates whether the associated state reports data within the EPA’s Continuous Emissions Monitoring System.

region_name_us_census

string

Human-readable name of a US Census region.

state_id_fips

string

Two-digit state FIPS code.

subdivision_code

string

Two-letter ISO-3166 political subdivision code (e.g. US state or Canadian province abbreviations like CA or AB).

subdivision_name

string

Full name of political subdivision (e.g. US state or Canadian province names like California or Alberta).

subdivision_type

string

ISO-3166 political subdivision type. E.g. state, province, outlying_area.

timezone_approx

string

IANA timezone name of the timezone which encompasses the largest portion of the population in the associated geographic area.

core_pudl__entity_plants_pudl#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-assigned plant IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

plant_id_pudl

Additional Details#

These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly identified plants are added to the list with a new ID. Each ID maps to a power plant which is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_plants_pudl.parquet"
)

Columns#

Field Name

Type

Description

plant_id_pudl

integer

A manually assigned PUDL plant ID. May not be constant over time.

plant_name_pudl

string

Plant name, chosen arbitrarily from the several possible plant names available in the plant matching process. Included for human readability only.

core_pudl__entity_utilities_pudl#

Noticed any irregularities with this table? File a report.

Entity table containing static information about PUDL-assigned utility IDs.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

The Public Utility Data Liberation (PUDL) Project

Primary key:

utility_id_pudl

Additional Details#

These IDs are manually generated each year when new FERC and EIA reporting is integrated, and any newly found utilities are added to the list with a new ID. Each ID maps to a power plant owning or operating entity which is reported in at least one FERC or EIA data set. This table is read in from a spreadsheet stored in the PUDL repository: src/pudl/package_data/glue/pudl_id_mapping.xlsx

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_pudl__entity_utilities_pudl.parquet"
)

Columns#

Field Name

Type

Description

utility_id_pudl

integer

A manually assigned PUDL utility ID. May not be stable over time.

utility_name_pudl

string

Utility name, chosen arbitrarily from the several possible utility names available in the utility matching process. Included for human readability only.

core_rus12__entity_borrowers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about active RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply

Primary key:

borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table is where canonical values for borrowers are set. It contains values which are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value regardless of whether it meets this 70% threshold, so that all borrowers will have a name. We chose this because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.

This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report to RUS Form 12. If these Borrowers have reported to RUS, they will have records in the enclosed data tables. However, a small number of these Borrowers did not report for various reasons, and they will not be represented in any of the other tables.
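
The canonical-value rule described above can be sketched roughly as follows. This is not PUDL's implementation: the function name canonical_value and its parameters are hypothetical, and the handling of missing values (ignored) and of ties (first value encountered wins) are assumptions the description does not spell out.

```python
from collections import Counter


def canonical_value(values, threshold=0.7, require_threshold=True):
    """Pick the most consistently reported non-null value.

    Returns None when require_threshold is True and the most common value
    makes up less than `threshold` of the non-null instances.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    if require_threshold and count / len(reported) < threshold:
        return None
    return value


# Invented example: a name reported two ways across years and tables.
names = ["Acme Coop", "Acme Coop", "Acme Coop", "Acme Cooperative", "Acme Cooperative"]
```

With the sample names, the most common spelling covers 3 of 5 instances (60%), so the default rule yields no canonical value, while passing require_threshold=False (the behavior described for borrower_name_rus) returns the most common spelling.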

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__entity_borrowers.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

state

string

Two-letter US state or territory abbreviation, or ISO 3166-1 alpha-2 code for Micronesia and the Marshall Islands.

core_rus12__monthly_demand_and_energy_at_delivery_points#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy delivered to distribution utilities.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section E)

Primary key:

report_date, borrower_id_rus, delivery_recipient

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_delivery_points.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

delivered_demand_mw

number

The amount of demand delivered in MW.

delivered_energy_mwh

number

The amount of energy delivered in MWh.

delivery_recipient

string

The recipient of the delivered energy or demand.

report_date

date

Date reported.

core_rus12__monthly_demand_and_energy_at_power_sources#

Noticed any irregularities with this table? File a report.

Monthly time series of demand and energy at point of generation.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section D)

Primary key:

report_date, borrower_id_rus

Additional Details#

There are some peak demand dates that have an hour component and others that don’t. This is because there is inconsistent reporting of peak_demand_hour in the raw data. If there is no hour reported, we record the hour as 0 and only the date will show in the data.
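
A minimal sketch of the behavior described above, assuming the date and the hour arrive as separate raw fields. Both helper names are hypothetical and are not taken from PUDL.

```python
from datetime import date, datetime, time


def build_peak_demand_datetime(peak_date, peak_hour=None):
    """Combine a date with an optional hour; a missing hour becomes 0 (midnight)."""
    hour = 0 if peak_hour is None else peak_hour
    return datetime.combine(peak_date, time(hour=hour))


def hour_may_be_unreported(peak_demand_date):
    """True when the time component is exactly midnight.

    A midnight timestamp is ambiguous: it can mean the hour was not reported,
    or that the peak really was reported at hour 0.
    """
    return peak_demand_date.time() == time(0, 0)
```

Because of that ambiguity, it may be safer to treat midnight values of peak_demand_date as date-only when analyzing the hour of peak demand.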

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__monthly_demand_and_energy_at_power_sources.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

energy_output_mwh

number

The amount of energy output in MWh.

is_peak_coincident

boolean

Whether the reported peak demand is a coincident peak (as opposed to a non-coincident peak).

peak_demand_date

datetime

The date of the peak demand.

peak_demand_mw

number

Peak demand in a given timeframe, in MW.

report_date

date

Date reported.

core_rus12__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debits from the balance sheet.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)

Primary key:

report_date, borrower_id_rus, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.
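
A minimal sketch of one way to avoid double counting, assuming the is_total column flags every total and subtotal row. The asset_type values and balances below are invented, and sum_detail_balances is a hypothetical helper.

```python
# Invented rows shaped like core_rus12__yearly_balance_sheet_assets
# for a single borrower and report year.
rows = [
    {"asset_type": "cash", "ending_balance": 100.0, "is_total": False},
    {"asset_type": "investments", "ending_balance": 250.0, "is_total": False},
    {"asset_type": "total_assets", "ending_balance": 350.0, "is_total": True},
]


def sum_detail_balances(rows):
    """Sum ending_balance over detail rows only, skipping flagged totals."""
    return sum(r["ending_balance"] for r in rows if not r["is_total"])


# Summing every row counts the detail rows and their total together.
naive_sum = sum(r["ending_balance"] for r in rows)
```

In this sample, the naive sum is double the filtered sum because the total row repeats the detail rows.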

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_rus12__yearly_balance_sheet_assets table.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

core_rus12__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section B)

Primary key:

report_date, borrower_id_rus, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

liability_type

string

Type of liability being reported to the core_rus12__yearly_balance_sheet_liabilities table.

report_date

date

Date reported.

core_rus12__yearly_depreciation_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in accumulated depreciation.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section B)

Primary key:

report_date, borrower_id_rus, depreciation_and_amortization_group, depreciation_and_amortization_item

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_changes.parquet"
)

Columns#

Field Name

Type

Description

accruals

number

Value of additions to an asset class, a.k.a. accruals.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

composite_depreciation_rate

number

The composite depreciation rate within a given category. This is typically expressed as a number between 0 and 100.

depreciation_and_amortization_group

string

High-level category of depreciation and amortization items.

depreciation_and_amortization_item

string

Category of depreciation and amortization items.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements_less_net_salvage

number

Cost of retirements minus any net salvage value.

core_rus12__yearly_depreciation_misc#

Noticed any irregularities with this table? File a report.

Annual time series of accumulated depreciation.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - 2nd part of Section B)

Primary key:

report_date, borrower_id_rus, depreciation_and_amortization_item

Additional Details#

This is the second half of the depreciation table. It includes ending balances of six depreciation and amortization values.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_depreciation_misc.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

depreciation_and_amortization_item

string

Category of depreciation and amortization items.

ending_balance

number

Account balance at end of year.

report_date

date

Date reported.

core_rus12__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of the ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsection III)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

external_financial_risk_ratio

number

(Total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.

report_date

date

Date reported.
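As a worked example of the external_financial_risk_ratio arithmetic described in the table above, the sketch below applies the formula to made-up dollar amounts. The function and argument names are hypothetical, and it only illustrates the calculation; it makes no claim about how the reported values are produced.

```python
def external_financial_risk_ratio(total_investments, loan_guarantee_balances,
                                  total_utility_plant):
    """Return (investments + loan guarantees) / total utility plant.

    Returns None when the denominator is missing or zero.
    """
    if not total_utility_plant:
        return None
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Made-up amounts: 150 in investments, 50 in guarantees, 1000 in plant assets.
print(external_financial_risk_ratio(150.0, 50.0, 1000.0))  # 0.2
```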

core_rus12__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F, Sub-section I)

Primary key:

This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part A Section B in table core_rus12__yearly_balance_sheet_assets.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_investments.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

excluded_investments

number

Excluded investment.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

included_investments

number

Included investment.

income_or_loss

number

Income or loss from investment.

investment_description

string

Description of investment. This is a free-form text field and thus contains a wide variety of values.

investment_type_code

integer

Investment type code.

report_date

date

Date reported.

core_rus12__yearly_lines_stations_labor_materials_cost#

Noticed any irregularities with this table? File a report.

Annual time series of labor and material cost for lines and stations operated by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part I - Section C)

Primary key:

report_date, borrower_id_rus, labor_or_material, operation_or_maintenance, lines_or_stations

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_lines_stations_labor_materials_cost.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

cost

number

Cost value.

labor_or_material

string

Indicates whether the cost reported is for labor or material.

lines_or_stations

string

Indicates whether the cost reported pertains to lines or stations.

operation_or_maintenance

string

Indicates whether the expenditure is for operation or maintenance.

report_date

date

Date reported.

core_rus12__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section F - Subsections II & IV)

Primary key:

This table has no primary key because some borrowers report multiple loan values from the same entity in a given year.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.
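Because loans and loan guarantees share this table, it may help to total them separately. The sketch below does so on made-up rows; the function name and the sample recipients are hypothetical, and it assumes is_loan_guarantee is populated on every row.

```python
from collections import defaultdict


def balances_by_kind(records):
    """Total loan_balance per (borrower_id_rus, 'loan' or 'guarantee')."""
    totals = defaultdict(float)
    for r in records:
        if r["loan_balance"] is None:
            continue  # skip rows with no reported balance
        kind = "guarantee" if r["is_loan_guarantee"] else "loan"
        totals[(r["borrower_id_rus"], kind)] += r["loan_balance"]
    return dict(totals)


# Made-up rows; note the two loans to the same recipient in one year,
# which is why the table has no primary key.
rows = [
    {"borrower_id_rus": "AK0002", "loan_recipient": "Example Co-op",
     "loan_balance": 100.0, "is_loan_guarantee": False},
    {"borrower_id_rus": "AK0002", "loan_recipient": "Example Co-op",
     "loan_balance": 40.0, "is_loan_guarantee": False},
    {"borrower_id_rus": "AK0002", "loan_recipient": "Example Water District",
     "loan_balance": 25.0, "is_loan_guarantee": True},
]
print(balances_by_kind(rows))
```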

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_loans.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

is_loan_guarantee

boolean

Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.

loan_balance

number

The amount of money still owed on a loan at the end of the reporting year.

loan_maturity_date

date

The date on which a loan is scheduled to be fully paid.

loan_original_amount

number

The original amount of a loan.

loan_recipient

string

The organization that received a loan.

report_date

date

Date reported.

core_rus12__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long-term debt and debt service requirements for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section H)

Primary key:

This table has no primary key because some borrowers report multiple debt values from the same entity in a given year.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_long_term_debt.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

debt_description

string

Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.

debt_ending_balance

number

The amount of principal still owed on the debt at the end of the report year.

debt_interest

number

The interest expense on the debt for the report year.

debt_principal

number

The principal paid on the debt during the report year.

debt_total

number

The total amount of debt.

report_date

date

Date reported.

core_rus12__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of annual meeting and board information for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section I)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_meeting_and_board.parquet"
)

Columns#

Field Name

Type

Description

board_members_num

integer

The total number of board members.

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

does_manager_have_written_contract

boolean

Whether or not the RUS borrower’s manager has a written contract.

fees_and_expenses_for_board_members

integer

The total amount of fees and expenses for board members.

last_annual_meeting_date

datetime

The date of the last annual meeting.

members_num

integer

The total number of members.

members_present_at_meeting_num

integer

The number of members present at the last annual meeting.

members_voting_by_proxy_or_mail_num

integer

The number of members voting by mail or by proxy.

report_date

date

Date reported.

was_quorum_present

boolean

Whether or not quorum was met.

core_rus12__yearly_non_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in non-utility plant.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section C)

Primary key:

report_date, borrower_id_rus, non_utility_plant_item

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_non_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

non_utility_plant_item

string

Category describing non-utility plant asset items.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

core_rus12__yearly_plant_costs#

Noticed any irregularities with this table? File a report.

Annual time series of costs of net energy generated by plant.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part F - Section D)

Primary key:

This table has no primary key because one plant (Walter Scott) has duplicate records every year. Based on other RUS-12 tables, these duplicate records are likely the borrowers’ ownership portion of the plant and the total plant, but this table gives no clear indication of which is which.

The primary key of this table otherwise would be: [report_date, borrower_id_rus, plant_name_rus, cost_group, cost_type].
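One way to see where the would-be key breaks down is to count rows per key and list the keys that repeat. The sketch below does this on made-up rows; the cost_group and cost_type values and the borrower ID are hypothetical.

```python
from collections import Counter

# The would-be primary key named in the text above.
WOULD_BE_KEY = ("report_date", "borrower_id_rus", "plant_name_rus",
                "cost_group", "cost_type")


def duplicated_keys(records, key=WOULD_BE_KEY):
    """Return the key tuples that appear on more than one row."""
    counts = Counter(tuple(r[k] for k in key) for r in records)
    return [k for k, n in counts.items() if n > 1]


# Made-up rows: two records share a key but report different costs.
rows = [
    {"report_date": "2020-01-01", "borrower_id_rus": "IA0001",
     "plant_name_rus": "Walter Scott", "cost_group": "fuel",
     "cost_type": "coal", "cost": 10.0},
    {"report_date": "2020-01-01", "borrower_id_rus": "IA0001",
     "plant_name_rus": "Walter Scott", "cost_group": "fuel",
     "cost_type": "coal", "cost": 90.0},
    {"report_date": "2020-01-01", "borrower_id_rus": "IA0001",
     "plant_name_rus": "Other Plant", "cost_group": "fuel",
     "cost_type": "coal", "cost": 5.0},
]
print(duplicated_keys(rows))
```

Rows returned by a check like this are the ones to inspect before aggregating cost by plant.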

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

The cost column in this table is expected to be largely non-null. The cost_per_mwh and cost_per_mmbtu columns apply only to some cost_type values, and in some cases only to some plant_type values, and thus are expected to contain many nulls.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_costs.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

cost

number

Cost value.

cost_group

string

High-level category of cost type.

cost_per_mmbtu

number

Unit cost of energy production in cost per MMBTU

cost_per_mwh

number

Unit cost of energy production in cost per MWh

cost_type

string

Detailed category of cost type.

is_total

boolean

Whether or not this record represents a total.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

core_rus12__yearly_plant_factors_and_maximum_demand#

Noticed any irregularities with this table? File a report.

Annual time series of plant factors and maximum demand for plants owned by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F IC, F CC, G - Section C)

Primary key:

This table has no primary key. The primary key for this table would be report_date, borrower_id_rus, plant_name_rus, and plant_type, but the plant_type column is not reported before 2009, which causes many duplicate records. Because none of the other fields are static year-over-year, there is no reliable way to backfill the plant_type column in instances where there are different plant_types reported for the same plant in later years.

Additional Details#

The plant_type field was only included in the data starting in 2009. We backfill records with only one reported plant_type per borrower/plant name/year. Records with multiple plant_types per borrower/plant name/year are left with a null plant_type because there is no reliable way to determine which plant_type is associated with which record back in time. Plants whose names are reported differently before and after 2009 will also miss being backfilled.
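The backfill rule can be sketched as follows. This is one simplified reading of the description, not PUDL's actual implementation: it assumes a borrower/plant name pair qualifies when exactly one distinct plant_type is ever reported for it, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def backfill_plant_type(records):
    """Fill null plant_type where a borrower/plant pair has exactly one known type."""
    known = defaultdict(set)
    for r in records:
        if r["plant_type"] is not None:
            known[(r["borrower_id_rus"], r["plant_name_rus"])].add(r["plant_type"])

    filled = []
    for r in records:
        types = known[(r["borrower_id_rus"], r["plant_name_rus"])]
        if r["plant_type"] is None and len(types) == 1:
            # Unambiguous: copy the single known type onto a new row.
            r = {**r, "plant_type": next(iter(types))}
        filled.append(r)
    return filled


# Made-up rows: "Alpha" has one known type, "Beta" has two.
rows = [
    {"report_date": "2008-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Alpha", "plant_type": None},
    {"report_date": "2010-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Alpha", "plant_type": "steam"},
    {"report_date": "2008-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Beta", "plant_type": None},
    {"report_date": "2008-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Beta", "plant_type": None},
    {"report_date": "2010-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Beta", "plant_type": "steam"},
    {"report_date": "2010-01-01", "borrower_id_rus": "AA0001", "plant_name_rus": "Beta", "plant_type": "combined_cycle"},
]
print([r["plant_type"] for r in backfill_plant_type(rows)])
```

In this sketch the 2008 "Alpha" row is filled with steam, while both 2008 "Beta" rows stay null because two plant types are reported for that name later on.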

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_factors_and_maximum_demand.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Energy generated over time period / (nameplate capacity * time period), with the time period in hours, years, etc.

capacity_factor_running

number

Fraction of potential generation over the time period a plant was in operation. Energy generated over time period / (nameplate capacity * time period), with the time period in hours, years, etc.

load_factor

number

Fraction of consumption vs demand reported for a plant over a given timeframe. Energy consumed over time period / (peak demand * time period), with the time period in hours, years, etc.

peak_gross_demand_mw

number

The highest average power output recorded over any single 15 minute interval during the reporting period.

peak_gross_demand_nameplate_mw

number

The theoretical or nameplate peak the plant could produce under the best operating conditions during the reporting period.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.
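The capacity factor formulas in the table above divide energy by the product of capacity and elapsed time. The sketch below works through the arithmetic with made-up numbers; the function name is hypothetical, and using 8760 hours for a full year is an assumption that ignores leap years.

```python
HOURS_PER_YEAR = 8760  # non-leap year; an assumption for this example


def capacity_factor(generation_mwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Return generation / (capacity * hours), or None if the denominator is zero."""
    if not capacity_mw or not hours:
        return None
    return generation_mwh / (capacity_mw * hours)


# A made-up 100 MW plant that generated 438,000 MWh in the year.
print(capacity_factor(438_000.0, 100.0))              # 0.5
# The same plant measured only over the 4,380 hours it was running.
print(capacity_factor(438_000.0, 100.0, hours=4380))  # 1.0
```

The second call mirrors capacity_factor_running, where the time period is limited to the hours the plant was in operation.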

core_rus12__yearly_plant_labor#

Noticed any irregularities with this table? File a report.

Annual time series of labor and payroll information for plants owned by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Parts D, E, F, G - Section B)

Primary key:

This table has no primary key. The primary key should be report_date, borrower_id_rus, plant_name_rus, and plant_type, but this table did not report plant_type before 2009 and there are respondents who report multiple rows per plant pre-2009. The data cannot be backfilled because there is no way to distinguish between duplicate rows pre-2009.

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Note the lack of plant_type pre-2009 leading to a lack of reliable primary keys.

Plant-level data from core_rus12__yearly_plant_operations_by_plant can be matched to data from this table, but you must join on the borrower_id_rus column in addition to report_date and plant_name_rus. This is because the core_rus12__yearly_plant_operations_by_plant table includes records from each plant owner. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
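The effect of including borrower_id_rus in the join can be shown on made-up rows for a plant shared by two borrowers. This is a plain-Python sketch with a hypothetical helper; with a dataframe library the equivalent would be an inner merge on the same three columns.

```python
JOIN_KEY = ("report_date", "borrower_id_rus", "plant_name_rus")


def join_on_key(left, right, key=JOIN_KEY):
    """Inner-join two lists of dicts on key; right-side values win on name clashes."""
    index = {}
    for rrow in right:
        index.setdefault(tuple(rrow[k] for k in key), []).append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(tuple(lrow[k] for k in key), [])
    ]


# Made-up rows: two borrowers each report the same shared plant.
labor = [
    {"report_date": "2020-01-01", "borrower_id_rus": "AA0001",
     "plant_name_rus": "Shared Plant", "payroll_operations": 10.0},
    {"report_date": "2020-01-01", "borrower_id_rus": "BB0002",
     "plant_name_rus": "Shared Plant", "payroll_operations": 12.0},
]
operations = [
    {"report_date": "2020-01-01", "borrower_id_rus": "AA0001",
     "plant_name_rus": "Shared Plant", "gross_generation_mwh": 500.0},
    {"report_date": "2020-01-01", "borrower_id_rus": "BB0002",
     "plant_name_rus": "Shared Plant", "gross_generation_mwh": 505.0},
]

matched = join_on_key(labor, operations)
loose = join_on_key(labor, operations, key=("report_date", "plant_name_rus"))
print(len(matched), len(loose))  # 2 4
```

Dropping borrower_id_rus from the key pairs every labor row with every owner's plant record, which is how the duplicate plant totals described in the RUS note end up multiplied.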

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_labor.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

employee_hours_worked_total

number

Total number of hours worked by employees.

employees_full_time_num

integer

Number of employees hired full-time for normal operations of the system.

employees_part_time_num

integer

Number of employees regularly employed on a part-time basis. Exclude employees hired for short periods of time to complete special jobs.

payroll_maintenance

number

The amount of payroll spent on plant maintenance.

payroll_operations

number

The amount of payroll spent on plant operations.

payroll_other_accounts

number

The amount of plant payroll spent on accounts other than maintenance and operations.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

core_rus12__yearly_plant_operations_by_borrower#

Noticed any irregularities with this table? File a report.

Annual time series of borrower portion of plant operational data including fuel consumption and operational hours.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)

Primary key:

This table has no primary key because there are a handful of plants that have duplicate records. The primary key of this table otherwise would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form includes plant operations data for each plant type reported in multiple ways: with records corresponding to the portion of plants that borrowers own as well as the whole plant. We split these records into two tables, core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

There are two boolean columns used to delineate which records are associated with the borrowers’ share vs the whole plant. These are documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years in the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table while all records should end up in the by-borrower portion of the table. Like the post-2009 records, this involves records from the original tables ending up in both of these PUDL tables.
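The pre-2009 assumption can be sketched in a few lines. This is only an illustration of the rule as described, using made-up rows and a hypothetical function name, and it does not cover the post-2009 logic, which also depends on is_partly_owned_by_borrower.

```python
def split_pre_2009(records):
    """Apply the pre-2009 rule: every row goes to by-borrower, and rows with
    is_full_ownership_portion set to True also go to by-plant."""
    by_borrower = list(records)
    by_plant = [r for r in records if r["is_full_ownership_portion"] is True]
    return by_borrower, by_plant


# Made-up pre-2009 rows.
rows = [
    {"plant_name_rus": "Alpha", "is_full_ownership_portion": True},
    {"plant_name_rus": "Beta", "is_full_ownership_portion": False},
    {"plant_name_rus": "Gamma", "is_full_ownership_portion": True},
]
by_borrower, by_plant = split_pre_2009(rows)
print(len(by_borrower), [r["plant_name_rus"] for r in by_plant])  # 3 ['Alpha', 'Gamma']
```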

Also there are two cleaning steps that we performed that alter the original data slightly:

  • One record was dropped for an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rst documentation file in the rus12 archive. Luckily this plant has exactly the same records as the other Wisdom steam plant that year with more expected ownership labels.

  • Replaced two string values for unit_id_rus: There are two instances from 2018 of unit_id_rus values that contain strings: WSL GT 12 and WSL ST 10. Based on pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings, 12 and 10 respectively. This enables us to have an integer type for the unit_id_rus column.
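The unit_id_rus cleanup described above amounts to keeping the number at the end of the string. A minimal sketch of that idea follows; the function name is hypothetical and this is not PUDL's actual cleaning code.

```python
import re


def clean_unit_id(raw):
    """Return the trailing integer in a unit ID string, or None if there is none."""
    match = re.search(r"(\d+)\s*$", str(raw))
    return int(match.group(1)) if match else None


# The two 2018 values named above, plus an ordinary numeric ID.
print([clean_unit_id(v) for v in ["WSL GT 12", "WSL ST 10", "3"]])  # [12, 10, 3]
```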

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_borrower.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

fuel_consumption_coal_lbs

number

Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.

fuel_consumption_gas_cubic_feet

number

Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.

fuel_consumption_oil_gallons

number

Annual gallons of oil consumed for fuel. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.

fuel_consumption_other

number

Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type’s combined_cycle, combined_cycle and steam.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

is_full_ownership_portion

boolean

Whether or not the plant record represents the full plant, regardless of whether it is fully owned by the borrower.

is_partly_owned_by_borrower

boolean

Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.

operating_hours_in_service

number

Number of operating hours in service.

operating_hours_on_standby

number

Number of operating hours on standby.

operating_hours_out_of_service_scheduled

number

Number of operating hours out of service which were scheduled.

operating_hours_out_of_service_unscheduled

number

Number of operating hours out of service which were unscheduled.

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

times_started

number

Number of times the plant was started. This field is only reported for plant_type values steam and nuclear.

unit_id_rus

integer

RUS-assigned unit identification code.

core_rus12__yearly_plant_operations_by_plant#

Noticed any irregularities with this table? File a report.

Annual time series of whole plant operational data including fuel consumption and operational hours.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part D, E, F (CC), F (IC) & G - Section A)

Primary key:

This table has no primary key because a handful of plants have duplicate records. Otherwise, the primary key of this table would be: [report_date, borrower_id_rus, plant_name_rus, unit_id_rus, plant_type, is_full_ownership_portion, is_partly_owned_by_borrower].

Additional Details#

The data in this table comes from five different portions of RUS 12 corresponding to different plant types (steam, hydroelectric, combined_cycle, internal_combustion and nuclear). The original form reports plant operations data for each plant type in two ways: with records corresponding to the portion of each plant that the borrower owns, and with records for the whole plant. We split these records into two tables: core_rus12__yearly_plant_operations_by_borrower, which contains records reported at the borrower level, and core_rus12__yearly_plant_operations_by_plant, which contains records reported at the plant level. Records for plants that are wholly owned by one borrower show up in both core_rus12__yearly_plant_operations_by_borrower and core_rus12__yearly_plant_operations_by_plant.

There are two boolean columns used to delineate which records are associated with the borrower's share vs. the whole plant, as documented in _OR_PowerSupply Plant File Documentation.rtf in the newer years in the RUS 12 archive. One of these two fields, is_partly_owned_by_borrower, was not reported before 2009. For the pre-2009 years, we assume that all records that report TRUE for is_full_ownership_portion should end up in the by-plant table, while all records should end up in the by-borrower table. As with the post-2009 records, this means that some records from the original tables end up in both of these PUDL tables.

Also there are two cleaning steps that we performed that alter the original data slightly:

  • One record dropped because of an unexpected ownership label and duplicate data: There is a Wisdom steam plant record that is labeled as both fully owned by the borrower and partly owned for one year, which is an unexpected combination based on the _OR_PowerSupply Plant File Documentation.rst documentation file in the rus12 archive. Luckily, this plant has exactly the same records as the other Wisdom steam plant record that year, which has the more expected ownership labels.

  • Replaced two string values for unit_id_rus: There are two instances from 2018 where unit_id_rus contains a string value: WSL GT 12 and WSL ST 10. Based on the pre-cleaned data, we were able to clearly identify that we can use just the numeric values in these bad strings (12 and 10 respectively). This enables us to use an integer type for the unit_id_rus column.
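
The unit_id_rus replacement described above can be illustrated with a minimal sketch. This is not the PUDL transform itself: the helper name trailing_int and the toy input values other than WSL GT 12 and WSL ST 10 are assumptions made for illustration only.

```python
import re

import pandas as pd

# Toy input: the two "WSL ..." strings come from the text above,
# the plain numeric strings are made up for this example.
raw_unit_ids = pd.Series(["1", "2", "WSL GT 12", "WSL ST 10"])


def trailing_int(value: str) -> int:
    """Return the run of digits at the end of a unit ID string (hypothetical helper)."""
    match = re.search(r"([0-9]+)\s*$", value)
    if match is None:
        raise ValueError(f"No trailing digits found in {value!r}")
    return int(match.group(1))


# Once every value is numeric, the column can use a nullable integer type.
unit_id_rus = raw_unit_ids.map(trailing_int).astype("Int64")
print(unit_id_rus.tolist())
```

Raising on values without trailing digits, rather than silently producing nulls, keeps unexpected new string formats visible.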

Plant-level data can be matched to the core_rus12__yearly_plant_labor table and the forthcoming core_rus12__yearly_plant_factors_and_maximum_demand table, which report Section B and C data for all plants. The RUS instructions note that:

you should use caution when using total plant data since there are cases where more than one Borrower shares units at the same plant which means that you will be getting duplicate plant total records (and there is no guarantee that the total plant records entered by two borrowers for the same plant will be identical).
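
One way to act on this caution is to flag plants whose whole-plant records were submitted by more than one borrower in the same year before summing anything. The sketch below uses made-up rows and assumes that plant_name_rus is spelled identically by each borrower, which the source does not guarantee.

```python
import pandas as pd

# Made-up rows shaped like core_rus12__yearly_plant_operations_by_plant.
plants = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "borrower_id_rus": ["XX0001", "XX0002", "XX0003"],
        "plant_name_rus": ["Example A", "Example A", "Example B"],
        "plant_type": ["steam", "steam", "hydroelectric"],
        "gross_generation_mwh": [1000.0, 1010.0, 250.0],
    }
)

# Count distinct borrowers reporting each plant in each year.
plant_keys = ["report_date", "plant_name_rus", "plant_type"]
borrowers_per_plant = plants.groupby(plant_keys)["borrower_id_rus"].transform("nunique")

# Rows for plants reported by several borrowers would be double counted if summed.
shared = plants[borrowers_per_plant > 1]
print(shared[["borrower_id_rus", "plant_name_rus", "gross_generation_mwh"]])
```

In this toy data the two records for Example A disagree slightly on gross generation, which mirrors the warning that plant totals entered by two borrowers need not be identical.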

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_plant_operations_by_plant.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

fuel_consumption_coal_lbs

number

Annual pounds of coal consumed for fuel. This field is only reported for plant_type steam.

fuel_consumption_gas_cubic_feet

number

Annual cubic feet of natural gas consumed for fuel. This field is only reported for plant_type values combined_cycle and steam.

fuel_consumption_oil_gallons

number

Annual gallons of oil consumed for fuel. This field is only reported for plant_type values combined_cycle and steam.

fuel_consumption_other

number

Annual other fuel consumed. Neither units nor type of fuel are documented. This field is only reported for plant_type values combined_cycle and steam.

gross_generation_mwh

number

Gross electricity generation for the specified period in megawatt-hours (MWh).

is_partly_owned_by_borrower

boolean

Whether or not the plant record is partially owned by the borrower. This column was not reported before 2009.

operating_hours_in_service

number

Number of operating hours in service.

operating_hours_on_standby

number

Number of operating hours on standby.

operating_hours_out_of_service_scheduled

number

Number of operating hours out of service which were scheduled.

operating_hours_out_of_service_unscheduled

number

Number of operating hours out of service which were unscheduled.

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

plant_type

string

Type of plant.

report_date

date

Date reported.

times_started

number

Number of times the plant was started. This field is only reported for plant_type values steam and nuclear.

unit_id_rus

integer

RUS-assigned unit identification code.

core_rus12__yearly_renewable_plants#

Noticed any irregularities with this table? File a report.

Annual time series of renewable energy plant generation information for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C RE)

Primary key:

report_date, borrower_id_rus, plant_name_rus, prime_mover_id

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_renewable_plants.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

capacity_factor

number

Fraction of potential generation that was actually reported for a plant part. Calculated as energy generated over a time period / (nameplate capacity * length of the time period in hours).

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

comments

string

General comments field.

employees_num

integer

Number of employees.

invested

number

The amount of money invested.

net_generation_mwh

number

Net electricity generation for the specified period in megawatt-hours (MWh).

opex_per_mwh

number

Total production expenses (USD per MWh generated).

ownership_pct

number

Percentage of the plant owned by the respondent.

plant_name_rus

string

Name of the plant as reported to RUS.

power_cost_per_mwh

number

The cost of power per MWh.

primary_renewable_fuel_type

string

Primary renewable fuel type used by the plant.

primary_renewable_fuel_type_id

integer

Unique numeric identifier for each renewable fuel type.

prime_mover_id

integer

Unique numeric identifier for each prime mover type used by RUS borrowers.

prime_mover_type

string

Type of prime mover (e.g. Hydro, Internal Combustion).

renewable_fuel_pct

number

Percentage of renewable fuel used.

report_date

date

Date reported.

rus_funding

number

Amount of funding received from the Rural Utilities Service (RUS).
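
The capacity_factor definition in the table above can be sketched as a small calculation. This is an illustration only: it assumes an annual reporting period of 8,760 hours and uses net generation as the numerator, neither of which is confirmed by the source, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year, annual reporting period


def capacity_factor(
    generation_mwh: float, capacity_mw: float, hours: float = HOURS_PER_YEAR
) -> float:
    """Energy generated divided by the energy the plant could have generated."""
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return generation_mwh / (capacity_mw * hours)


# A 100 MW plant generating 350,400 MWh in a year ran at 40% of its potential.
example = capacity_factor(generation_mwh=350_400.0, capacity_mw=100.0)
print(round(example, 3))
```

Comparing a value computed this way against the reported capacity_factor column is one way to spot records where capacity or generation may be in unexpected units.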

core_rus12__yearly_sources_and_distribution#

Noticed any irregularities with this table? File a report.

Annual time series of MWh and cost of energy sources and distribution by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)

Primary key:

report_date, borrower_id_rus, source_of_energy

Additional Details#

See the sources_and_distribution_by_plant_type table for a breakdown of plant-type-specific cost, capacity, plant_num, and net_energy_received values. Also note that there are several source_of_energy values that don’t have a corresponding cost value.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

cost

number

Cost value.

net_energy_received_mwh

number

The net amount of energy received into the system.

report_date

date

Date reported.

source_of_energy

string

The source of energy (not plant type).

core_rus12__yearly_sources_and_distribution_by_plant_type#

Noticed any irregularities with this table? File a report.

Annual time series of capacity, plant num, MWh, and cost of energy by plant type for RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part C)

Primary key:

report_date, borrower_id_rus, plant_type

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_sources_and_distribution_by_plant_type.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

capacity_mw

number

Total installed (nameplate) capacity, in megawatts.

cost

number

Cost value.

net_energy_received_mwh

number

The net amount of energy received into the system.

plant_num

integer

Number of plants.

plant_type

string

Type of plant.

report_date

date

Date reported.

core_rus12__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part A - Section A)

Primary key:

report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.
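
A minimal sketch of why this matters, using made-up rows. It assumes that the is_total column marks every subtotal and total record, which should be verified against the real data; the opex_group and opex_type values here are placeholders, not real categories.

```python
import pandas as pd

# Made-up rows shaped like core_rus12__yearly_statement_of_operations.
ops = pd.DataFrame(
    {
        "opex_group": ["group_a", "group_a", "group_a"],
        "opex_type": ["item_1", "item_2", "total"],
        "is_total": [False, False, True],
        "opex_ytd": [10.0, 5.0, 15.0],
    }
)

# Summing every row counts the components and their total together.
naive_sum = ops["opex_ytd"].sum()

# Dropping total rows first avoids the double count.
component_sum = ops.loc[~ops["is_total"], "opex_ytd"].sum()

print(naive_sum, component_sum)
```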

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_statement_of_operations.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

opex_group

string

High level section from the statement of operations table. Most of these types have subcomponents broken out in the opex_type column.

opex_report_month

number

Amount of operational expense, cost or income during the report month.

opex_type

string

Type of item from the statement of operations.

opex_ytd

number

The year-to-date amount of operational expense, cost or income.

opex_ytd_budget

number

The year-to-date budget for amount of operational expense, cost or income.

report_date

date

Date reported.

core_rus12__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 12 – Financial and Operating Report: Electric Power Supply (Part H - Section A)

Primary key:

report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

There are a few layers of nesting in this table, which can be distinguished by the categories in the utility_plant_group column:

  • intangible_plant: only has one total utility_plant_item

  • production_plant: has several sub-component utility_plant_item and a standard total

  • transmission_plant: has several sub-component utility_plant_item and a standard total

  • distribution_plant: has several sub-component utility_plant_item and a standard total

  • electric_plant_in_service: has two sub-components and a total which includes those sub-components and the totals of the above categories

  • utility_plant_in_service: has several sub-components and the total is the sum of those sub-components as well as the total of electric_plant_in_service

  • total_utility_plant: has one sub-component and the total is the sum of the sub-component and utility_plant_in_service
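
For the groups with a standard total (production_plant, transmission_plant and distribution_plant), the nesting above suggests a simple consistency check: the sub-components should add up to the reported total. The sketch below uses made-up balances and placeholder utility_plant_item names, and assumes is_total flags the total row in each group.

```python
import pandas as pd

# Made-up rows shaped like core_rus12__yearly_utility_plant_changes.
changes = pd.DataFrame(
    {
        "utility_plant_group": ["production_plant"] * 3 + ["transmission_plant"] * 3,
        "utility_plant_item": ["item_a", "item_b", "total", "item_c", "item_d", "total"],
        "is_total": [False, False, True, False, False, True],
        "ending_balance": [60.0, 40.0, 100.0, 20.0, 5.0, 30.0],
    }
)

simple_groups = ["production_plant", "transmission_plant", "distribution_plant"]
subset = changes[changes["utility_plant_group"].isin(simple_groups)]

# Sum of sub-components per group vs. the reported total per group.
component_sums = (
    subset[~subset["is_total"]].groupby("utility_plant_group")["ending_balance"].sum()
)
reported_totals = subset[subset["is_total"]].set_index("utility_plant_group")[
    "ending_balance"
]

# Flag groups where the two differ by more than a small tolerance.
mismatch = (component_sums - reported_totals).abs() > 0.01
print(mismatch[mismatch].index.tolist())
```

The higher-level groups (electric_plant_in_service, utility_plant_in_service and total_utility_plant) would need the totals of other groups added in, so this check does not apply to them as written.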

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus12__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

utility_plant_group

string

High-level category of utility plant asset type.

utility_plant_item

string

Sub-category of utility_plant_group describing utility plant asset item.

core_rus7__entity_borrowers#

Noticed any irregularities with this table? File a report.

Entity table containing static information about active RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution

Primary key:

borrower_id_rus

Usage Warnings#

  • Borrower name data has been drawn from reported values over multiple years and tables of data which are not always consistent with each other. PUDL chooses the most consistent borrower name to facilitate cross-referencing even if that means some values will differ from the raw sources.

Additional Details#

This table is where canonical values for borrowers are set. It contains values which are expected to remain fixed over time. RUS reports many attributes in many different tables throughout RUS-7 and RUS-12. In order to compile tidy, well-normalized database tables, PUDL collects all instances of these values and chooses a canonical value. By default, PUDL chooses the most consistently reported value of a given attribute as long as it accounts for at least 70% of the reported instances. For borrower_name_rus, PUDL chooses the most consistently reported value regardless of whether it meets this 70% threshold, so that all borrowers will have a name. We chose this because most name changes were insignificant (e.g. “and” changed to “&” or “coop” changed to “cooperative”). All tables downstream of this one inherit the canonical values established here.
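
The selection rule described above can be sketched roughly as follows. This is not the PUDL implementation: the function name, its parameters, the handling of nulls and ties, and the example values are all assumptions made for illustration.

```python
import pandas as pd


def canonical_value(
    values: pd.Series, threshold: float = 0.7, require_threshold: bool = True
):
    """Pick the most frequently reported value, optionally enforcing a minimum share."""
    counts = values.dropna().value_counts()  # sorted from most to least frequent
    if counts.empty:
        return None
    top_value = counts.index[0]
    share = counts.iloc[0] / counts.sum()
    if require_threshold and share < threshold:
        return None  # no value is consistent enough to be canonical
    return top_value


# A value reported in 8 of 10 instances clears the 70% threshold.
reported_states = pd.Series(["TX"] * 8 + ["OK"] * 2)
state = canonical_value(reported_states)

# Names often vary slightly, so the threshold is skipped for them.
reported_names = pd.Series(
    ["Example Coop", "Example Cooperative", "Example Coop", "Example Elec Coop"]
)
strict_name = canonical_value(reported_names)
name = canonical_value(reported_names, require_threshold=False)
print(state, strict_name, name)
```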

This table contains all of the Active Distribution Borrowers as of each report year who were eligible to report to RUS Form 7. If these Borrowers have reported to RUS, they will have records in the enclosed data tables. However, a small number of these Borrowers did not report for various reasons, and these Borrowers will not be represented in any of the other tables.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__entity_borrowers.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

borrower_name_rus

string

The name of the RUS (Rural Utilities Service) borrower.

state

string

Two letter US state or territory abbreviation, or ISO 3166-1 alpha-two code for Micronesia and the Marshall Islands.
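
The borrower_id_rus structure described in the table above (two state characters followed by four digits) lends itself to a quick format check. The sketch below assumes the state characters are uppercase ASCII letters, which the source does not state explicitly; the function name and the example IDs are made up.

```python
import re

# Assumption: two uppercase letters followed by exactly four digits.
BORROWER_ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def looks_like_borrower_id(value: str) -> bool:
    """Return True if the whole string matches the assumed borrower ID format."""
    return BORROWER_ID_PATTERN.fullmatch(value) is not None


for candidate in ["TX0001", "TX001", "Texas1"]:
    print(candidate, looks_like_borrower_id(candidate))
```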

core_rus7__yearly_balance_sheet_assets#

Noticed any irregularities with this table? File a report.

Annual time series of assets and other debts from the balance sheet.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)

Primary key:

report_date, borrower_id_rus, asset_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_assets.parquet"
)

Columns#

Field Name

Type

Description

asset_type

string

Type of asset being reported to the core_rus7__yearly_balance_sheet_assets table.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

core_rus7__yearly_balance_sheet_liabilities#

Noticed any irregularities with this table? File a report.

Annual time series of liabilities and other credits from the balance sheet.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part C)

Primary key:

report_date, borrower_id_rus, liability_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_balance_sheet_liabilities.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

liability_type

string

Type of liability being reported to the core_rus7__yearly_balance_sheet_liabilities table.

report_date

date

Date reported.

core_rus7__yearly_customer_energy_efficiency_and_conservation_loans#

Noticed any irregularities with this table? File a report.

Annual time series of the repayment status of loans made by a borrower to customers for investments in energy efficiency and conservation initiatives.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)

Primary key:

report_date, borrower_id_rus, loan_status

Additional Details#

Energy Resources Conservation (ERC) loans are defined by the RUS as loans made by an RUS borrower to its consumers for the cost of labor and materials for the following energy conservation measures: caulking, weather-stripping, ceiling insulation, wall insulation, floor insulation, duct insulation, pipe insulation, water heater insulation, storm windows, thermal windows, storm or thermal doors, clock thermostats and attic ventilation fans.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_customer_energy_efficiency_and_conservation_loans.parquet"
)

Columns#

Field Name

Type

Description

actual_pct

number

Observed percentage.

anticipated_pct

number

Expected percentage.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

loan_status

string

The repayment status of a loan.

report_date

date

Date reported.

ytd_dollars

number

Balance this current year, in U.S. dollars.

core_rus7__yearly_distribution_services#

Noticed any irregularities with this table? File a report.

Annual time series of distribution services.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)

Primary key:

report_date, borrower_id_rus, service_status

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_distribution_services.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

service_status

string

Status of services (e.g., idle, retired) in report period. Idle services exclude seasonals.

services

number

Number of services in system at end of year.

core_rus7__yearly_employee_statistics#

Noticed any irregularities with this table? File a report.

Annual time series of statistics about employment and payroll.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part H)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_employee_statistics.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

employee_hours_worked_over_time

integer

The number of overtime hours worked by employees.

employee_hours_worked_regular_time

integer

The number of regular (non-overtime) hours worked by employees.

employees_fte_num

integer

The number of full time employees.

payroll_capitalized

integer

The amount of payroll spent that was capitalized.

payroll_expensed

integer

The amount of payroll spent that was expensed.

payroll_other

integer

The amount of payroll spent that was funded by other means - not capitalized or expensed.

report_date

date

Date reported.

core_rus7__yearly_energy_efficiency#

Noticed any irregularities with this table? File a report.

Annual time series of investments in and impacts of energy efficiency programs.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part P)

Primary key:

report_date, borrower_id_rus, customer_class, observation_period

Usage Warnings#

  • The savings_mmbtu column likely contains values with incorrect units.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_efficiency.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

customer_class

string

High level categorization of customer type.

customers_num

number

Number of customers.

invested

number

The amount of money invested.

observation_period

string

The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.

report_date

date

Date reported.

savings_mmbtu

number

The estimated amount of energy savings from energy efficiency programs. Warning: We found values much larger than expected that we have not yet cleaned - this is likely a reporting unit error.
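
Because savings_mmbtu has not been cleaned, it may help to flag implausibly large values before using the column. The sketch below is one possible screen, not a method used by PUDL: it flags values that exceed a multiple of the median of the positive values. The function name, the ratio of 1000 and the sample numbers are all assumptions chosen for illustration.

```python
from statistics import median


def flag_suspicious_savings(values, ratio=1000.0):
    """Flag values more than `ratio` times the median of the positive, non-null values.

    `ratio` is a hypothetical threshold; tune it to your own analysis.
    """
    positive = [v for v in values if v is not None and v > 0]
    if not positive:
        return [False] * len(values)
    cutoff = ratio * median(positive)
    return [v is not None and v > cutoff for v in values]


# Made-up savings_mmbtu values; None stands in for a null.
savings_mmbtu = [120.0, 95.5, None, 310.0, 250_000_000.0]
flags = flag_suspicious_savings(savings_mmbtu)
print(flags)
```

Flagged rows are candidates for review rather than confirmed errors, since the correct unit for any given value is not known from the table alone.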

core_rus7__yearly_energy_purchased#

Noticed any irregularities with this table? File a report.

Annual time series of energy purchased by RUS borrowers.

Most-recent data:

2020

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part K)

Primary key:

This table has no primary key. The primary key would probably be report_date, borrower_id_rus, fuel_type_code, supplier_code_rus, renewable_energy_program if not for certain EIA utilities represented as Miscellaneous (supplier code 700000).

Additional Details#

A number of records report purchases from an unspecified entity. These records all have reported utility names beginning with an *, including *Miscellaneous, *Adjustments, and *Residential Renewable Supplier.
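
The leading * makes these placeholder suppliers easy to separate from named utilities. The following is a minimal sketch on made-up records; the helper name and the non-placeholder utility name are hypothetical, and the field name follows the utility_name_eia column of this table.

```python
def is_unspecified_supplier(utility_name):
    """Return True for placeholder names such as '*Miscellaneous'."""
    return utility_name is not None and utility_name.startswith("*")


# Made-up rows for illustration only.
records = [
    {"utility_name_eia": "*Miscellaneous", "purchased_mwh": 10.0},
    {"utility_name_eia": "Example Power Cooperative", "purchased_mwh": 500.0},
    {"utility_name_eia": "*Adjustments", "purchased_mwh": -2.5},
]

unspecified = [r for r in records if is_unspecified_supplier(r["utility_name_eia"])]
named = [r for r in records if not is_unspecified_supplier(r["utility_name_eia"])]
print(len(unspecified), len(named))
```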

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_energy_purchased.parquet"
)

Columns#

Field Name

Type

Description

average_energy_cost_dollars_per_mwh

number

The average cost of energy per MWh.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

comments

string

General comments field.

fuel_cost_adjustment

number

The variable fuel surcharge component of a distribution cooperative’s wholesale purchased power bill, reflecting pass-through of actual fuel cost fluctuations from the supplying utility, reported separately from base power charges and wheeling costs. Included in the total cost.

fuel_type_code_rus

integer

Unique numeric identifier for RUS fuel types.

is_supplier_eia_respondent

boolean

Whether the utility supplying energy to a RUS borrower is an EIA respondent.

purchased_energy_cost_total

number

The total cost of purchased energy. Includes fuel cost adjustment and wheeling and other charges.

purchased_mwh

number

The total electricity purchased.

report_date

date

Date reported.

supplier_code_rus

string

Unique numeric identifier for the utility supplying energy to a RUS borrower.

utility_name_eia

string

The name of the utility.

wheeling_and_other_charges

number

The cost of wheeling and other charges or credits related to fuel. Included in the total cost.
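
Since both fuel_cost_adjustment and wheeling_and_other_charges are included in purchased_energy_cost_total, the remaining cost can be recovered by subtraction. The sketch below shows that arithmetic on made-up numbers. The function names are hypothetical, and treating the per-MWh cost as total cost divided by purchased_mwh is an assumption about how average_energy_cost_dollars_per_mwh relates to the other columns, not something the source states.

```python
def cost_excluding_adjustments(total_cost, fuel_cost_adjustment, wheeling_and_other_charges):
    """Total purchased energy cost minus the two components it is documented to include."""
    return total_cost - fuel_cost_adjustment - wheeling_and_other_charges


def implied_cost_per_mwh(total_cost, purchased_mwh):
    """Assumed relationship: dollars per MWh = total cost / MWh purchased."""
    if not purchased_mwh:  # avoid dividing by zero or a null
        return None
    return total_cost / purchased_mwh


# Made-up values for a single record.
remainder = cost_excluding_adjustments(52_000.0, 1_500.0, 500.0)
per_mwh = implied_cost_per_mwh(52_000.0, 1_000.0)
print(remainder, per_mwh)
```

Comparing the implied per-MWh figure with the reported column is a quick way to spot records where the components do not reconcile.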

core_rus7__yearly_external_financial_risk_ratio#

Noticed any irregularities with this table? File a report.

Annual time series of the ratio of investments and loan guarantee balances to total utility plant assets.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section III)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_external_financial_risk_ratio.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

external_financial_risk_ratio

number

(total investments + loan guarantee balances) / total utility plant assets. This ratio shows how much a utility is financially exposed to outside entities relative to its own assets.

report_date

date

Date reported.
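
The ratio is the sum of investments and loan guarantee balances divided by total utility plant assets, so the addition has to happen before the division. The following is a minimal sketch of that calculation. The function and parameter names are hypothetical (the inputs are not columns of this table) and the dollar amounts are made up.

```python
def external_financial_risk_ratio(total_investments, loan_guarantee_balances, total_utility_plant):
    """(investments + loan guarantee balances) / total utility plant assets."""
    if not total_utility_plant:  # avoid dividing by zero or a null
        return None
    return (total_investments + loan_guarantee_balances) / total_utility_plant


# Made-up inputs: $2.0M invested, $0.5M guaranteed, $50M of utility plant.
ratio = external_financial_risk_ratio(2_000_000, 500_000, 50_000_000)
print(ratio)
```

Without the parentheses, only the loan guarantee balances would be divided by plant assets, which gives a very different number.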

core_rus7__yearly_investments#

Noticed any irregularities with this table? File a report.

Annual time series of investments, loan guarantees and loans.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Section I)

Primary key:

This table has no primary key. This is a list of all investments or loans in each year and borrowers can have multiple records with the same investment_description.
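
To see why a set of columns cannot serve as a primary key, count how often each combination of values occurs. The sketch below does this with made-up records. The helper name, borrower ID, dates and descriptions are all hypothetical.

```python
from collections import Counter


def duplicated_keys(records, key_columns):
    """Return each key combination that appears more than once, with its count."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: one borrower reports two investments with the same description.
records = [
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "investment_description": "capital credits"},
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "investment_description": "capital credits"},
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "investment_description": "savings account"},
]

dupes = duplicated_keys(records, ["report_date", "borrower_id_rus", "investment_description"])
print(dupes)
```

Any non-empty result means the chosen columns do not uniquely identify a row, so joins on them can multiply records.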

Additional Details#

Reporting of investments is required by 7 CFR 1717, Subpart N. Investment categories reported on this Part correspond to Balance Sheet items in Part C.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_investments.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

excluded_investments

number

Excluded investment.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

included_investments

number

Included investment.

income_or_loss

number

Income or loss from investment.

investment_description

string

Description of investment. This is a free-form text field and thus contains a wide variety of values.

investment_type_code

integer

Investment type code.

report_date

date

Date reported.

core_rus7__yearly_loans#

Noticed any irregularities with this table? File a report.

Annual time series of loans provided by RUS borrowers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part Q - Sections II & IV)

Primary key:

This table has no primary key. Borrowers may receive multiple loans from lending_organizations in a given year.

Additional Details#

This table also includes loan guarantees where the RUS borrower backs a loan from another entity and is therefore liable to pay any remaining balance should the original borrower default.

In 2006, the loan maturity date for borrower ND0051’s loan from ERC - Paulson, David was reported as 2/8/2820. There is no clear way to determine the correct maturity date given that 2006 is the first year of data we have and the same loan does not appear in future years. For this reason we’ve nulled the date.
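
A general screen for this kind of value could null any maturity date that falls implausibly far after the report date. The sketch below illustrates the idea only. It is not the rule PUDL applies (the text above describes a single hand-identified value), and the 100-year limit, the function name and the example report date are assumptions.

```python
from datetime import date

# Hypothetical upper bound on a plausible loan term, in years.
MAX_LOAN_TERM_YEARS = 100


def clean_maturity_date(report_date, maturity_date, max_years=MAX_LOAN_TERM_YEARS):
    """Return maturity_date, or None if it is null or implausibly far in the future."""
    if maturity_date is None:
        return None
    if maturity_date.year - report_date.year > max_years:
        return None
    return maturity_date


# The reported 2/8/2820 value, checked against an assumed 2006 report date.
cleaned = clean_maturity_date(date(2006, 12, 31), date(2820, 2, 8))
kept = clean_maturity_date(date(2006, 12, 31), date(2036, 2, 8))
print(cleaned, kept)
```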

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_loans.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

for_rural_development

boolean

Whether or not the investment or loan is for rural development. This includes investments in any/all types of projects or products that were made to improve the economy and/or quality of life in the specified area.

is_loan_guarantee

boolean

Indicates a third-party loan that the reporting utility (referred to as a borrower) has co-signed, taking on responsibility for repayment if the primary borrower defaults.

loan_balance

number

The amount of money still owed on a loan at the end of the reporting year.

loan_maturity_date

date

The date on which a loan is scheduled to be fully paid.

loan_original_amount

number

The original amount of a loan.

loan_recipient

string

The organization that received a loan.

report_date

date

Date reported.

core_rus7__yearly_long_term_debt#

Noticed any irregularities with this table? File a report.

Annual time series of long term debt and debt service requirements.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part N)

Primary key:

This table has no primary key. It is a list of all debts in each year and borrowers can have multiple records with the same debt_description.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_debt.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

debt_description

string

Description of debt or loan. On the original form, there are nine provided descriptions and a section to add other free-form descriptions.

debt_ending_balance

number

The amount of principal still owed on the debt at the end of the report year.

debt_interest

number

The interest expense on the debt for the report year.

debt_principal

number

The principal paid on the debt during the report year.

debt_total

number

The total amount of debt.

report_date

date

Date reported.

core_rus7__yearly_long_term_leases#

Noticed any irregularities with this table? File a report.

Annual time series of long term leases by property type.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part L)

Primary key:

This table has no primary key. Borrowers may receive multiple leases from lending_organizations in a given year.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_long_term_leases.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

lending_organization

string

The organization that provided a lease or loan.

property_type

string

The type of property leased.

rental_cost_ytd

number

Year-to-date rental cost for leased property.

report_date

date

Date reported.

core_rus7__yearly_materials_and_supplies#

Noticed any irregularities with this table? File a report.

Annual time series of cost of electric vs. other materials that were purchased, salvaged, used, or sold.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part F)

Primary key:

report_date, borrower_id_rus, electric_or_other_materials

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_materials_and_supplies.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

electric_or_other_materials

string

Whether the cost is for electric materials or other materials.

materials_adjustment

number

An adjustment value for the cost of materials and supplies.

materials_ending_balance

number

The balance at the end of the report year for materials and supplies.

materials_purchased

number

The cost of materials and supplies purchased.

materials_salvaged

number

The cost of materials and supplies salvaged.

materials_sold

number

The cost of materials and supplies sold.

materials_used

number

The cost of materials and supplies used.

report_date

date

Date reported.

core_rus7__yearly_meeting_and_board#

Noticed any irregularities with this table? File a report.

Annual time series of governance information about RUS borrowers’ annual member meetings as well as information about their board.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part M)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_meeting_and_board.parquet"
)

Columns#

Field Name

Type

Description

board_members_num

integer

The total number of board members.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

does_manager_have_written_contract

boolean

Whether or not the RUS borrower’s manager has a written contract.

fees_and_expenses_for_board_members

integer

The total amount of fees and expenses for board members.

last_annual_meeting_date

datetime

The date of the last annual meeting.

members_num

integer

The total number of members.

members_present_at_meeting_num

integer

The number of members present at the last annual meeting.

members_voting_by_proxy_or_mail_num

integer

The number of members voting by mail or by proxy.

report_date

date

Date reported.

was_quorum_present

boolean

Whether or not quorum was met.

core_rus7__yearly_owed_by_customers#

Noticed any irregularities with this table? File a report.

Annual time series of debt owed by customers.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part J)

Primary key:

report_date, borrower_id_rus

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_owed_by_customers.parquet"
)

Columns#

Field Name

Type

Description

amount_due_over_60_days

number

Reported amount of dollars due over 60 days from consumers for electric service. Includes both connected and disconnected customers.

amount_written_off_ytd

number

Total charges due from consumers for electric service written off during the current year to Account 144.1, representing the write-off of uncollectible accounts.

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

report_date

date

Date reported.

core_rus7__yearly_patronage_capital#

Noticed any irregularities with this table? File a report.

Annual time series of patronage capital distributed and received.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part I)

Primary key:

report_date, borrower_id_rus, patronage_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_patronage_capital.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

patronage_cumulative

number

Amount of patronage distributed or received cumulatively. Received patronage capital is not reported cumulatively and thus will be null.

patronage_report_year

number

Amount of patronage distributed or received within report year.

patronage_type

string

Type of patronage capital distributed or received.

report_date

date

Date reported.
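
The is_total flag is what makes the subtotal warning above actionable: rows marked as totals should be dropped before summing, otherwise the same amounts are counted twice. The following is a minimal sketch on made-up records. The helper name and the patronage_type values are hypothetical.

```python
def sum_excluding_totals(records, value_column):
    """Sum a column over detail rows only, skipping total rows and nulls."""
    return sum(
        r[value_column]
        for r in records
        if not r["is_total"] and r[value_column] is not None
    )


# Made-up rows for one borrower and report year.
records = [
    {"patronage_type": "type_a", "is_total": False, "patronage_report_year": 100.0},
    {"patronage_type": "type_b", "is_total": False, "patronage_report_year": 50.0},
    {"patronage_type": "total", "is_total": True, "patronage_report_year": 150.0},
]

naive_sum = sum(r["patronage_report_year"] for r in records)
detail_sum = sum_excluding_totals(records, "patronage_report_year")
print(naive_sum, detail_sum)
```

The same filter applies to the other tables in this section that carry an is_total column.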

core_rus7__yearly_power_requirements#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and generation summary.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus

Additional Details#

This table includes totals of electric sales and revenue, which also appear in core_rus7__yearly_power_requirements_electric_sales with a customer_class of total. Because this table holds all other power requirements that are not broken out by customer class, we include these electric totals here as well.
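
Because the totals appear in both tables, they can be cross-checked against each other. The sketch below compares electric_sales_mwh in this table with the sales_mwh of the matching total row in the electric sales table. The function name, the tolerance and all sample values are assumptions, and the records are made up.

```python
def total_sales_mismatches(requirements, sales, tolerance=0.5):
    """Return (report_date, borrower_id_rus) keys whose totals differ by more than tolerance."""
    totals = {
        (r["report_date"], r["borrower_id_rus"]): r["sales_mwh"]
        for r in sales
        if r["customer_class"] == "total"
    }
    mismatches = []
    for row in requirements:
        key = (row["report_date"], row["borrower_id_rus"])
        other = totals.get(key)
        if other is None or row["electric_sales_mwh"] is None:
            continue  # nothing to compare against
        if abs(row["electric_sales_mwh"] - other) > tolerance:
            mismatches.append(key)
    return mismatches


# Made-up rows standing in for the two tables.
requirements = [
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "electric_sales_mwh": 1000.0},
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0002", "electric_sales_mwh": 2000.0},
]
sales = [
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "customer_class": "total", "sales_mwh": 1000.0},
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0001", "customer_class": "residential", "sales_mwh": 600.0},
    {"report_date": "2021-12-31", "borrower_id_rus": "ZZ0002", "customer_class": "total", "sales_mwh": 1990.0},
]

mismatches = total_sales_mismatches(requirements, sales)
print(mismatches)
```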

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

electric_sales_mwh

number

Total MWh of electricity sold. Total of lines 1b through 9b on the original form.

electric_sales_revenue

integer

Total Revenue Received From Sales of Electric Energy. Total of lines 1c thru 9c on the original form.

generated_mwh

number

The total electricity generated.

interchange_mwh

number

The net interchange of electricity. The net amount of electricity exchanged in purchases and sales.

is_peak_coincident

boolean

Whether peak_mw is a coincident peak rather than a non-coincident peak.

other_electric_revenue

integer

Electric revenue other than electric_sales_revenue.

own_use_mwh

number

The electricity in MWh used for the borrower’s own internal use.

peak_mw

number

The peak system demand in MW - the sum of all MW.

purchased_mwh

number

The total electricity purchased.

purchases_and_generation_cost

integer

The cost of purchases and generation of electricity.

report_date

date

Date reported.

transmission_revenue

integer

Transmission revenue.

core_rus7__yearly_power_requirements_electric_customers#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - number of customers served by customer type.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus, customer_class, observation_period

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_customers.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

customer_class

string

High level categorization of customer type.

customers_num

number

Number of customers.

observation_period

string

The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.

report_date

date

Date reported.
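
Since observation_period is part of the primary key, each borrower and customer class can have more than one row per report date. Summing customers_num across every row would therefore mix December counts with averages. The sketch below selects a single period before aggregating. The helper name, the customer_class values and the counts are made up, while the 'december' and 'avg' period labels come from the column description above.

```python
def customers_for_period(records, period):
    """Sum customers_num by customer_class for a single observation_period."""
    totals = {}
    for r in records:
        if r["observation_period"] != period or r["customers_num"] is None:
            continue
        totals[r["customer_class"]] = totals.get(r["customer_class"], 0) + r["customers_num"]
    return totals


# Made-up rows for one borrower and report year.
records = [
    {"customer_class": "residential", "observation_period": "december", "customers_num": 1020},
    {"customer_class": "residential", "observation_period": "avg", "customers_num": 1000},
    {"customer_class": "commercial", "observation_period": "december", "customers_num": 110},
    {"customer_class": "commercial", "observation_period": "avg", "customers_num": 105},
]

december_counts = customers_for_period(records, "december")
print(december_counts)
```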

core_rus7__yearly_power_requirements_electric_sales#

Noticed any irregularities with this table? File a report.

Annual time series of power requirements - revenue and energy sold by customer type.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part O)

Primary key:

report_date, borrower_id_rus, customer_class

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_power_requirements_electric_sales.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

customer_class

string

High level categorization of customer type.

report_date

date

Date reported.

revenue

number

Amount of revenue.

sales_mwh

number

Quantity of electricity sold in MWh.

core_rus7__yearly_service_interruptions#

Noticed any irregularities with this table? File a report.

Annual time series of service interruptions by cause.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part G)

Primary key:

report_date, borrower_id_rus, service_interruption_cause, observation_period

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_service_interruptions.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

observation_period

string

The date range that any given record pertains to. Ex: ‘december’ implies that this record covers the month of December only, while ‘avg’ implies this record pertains to the average of the reporting period.

report_date

date

Date reported.

saidi_minutes

number

Cumulative duration (minutes) of interruption for the average customer during the report year.

service_interruption_cause

string

Source of service interruption.

core_rus7__yearly_statement_of_operations#

Noticed any irregularities with this table? File a report.

Annual time series of opex and cost of electric service for RUS borrowers by time period.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part A)

Primary key:

report_date, borrower_id_rus, opex_group, opex_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_statement_of_operations.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of RUS (Rural Utilities Service) borrower. These IDs are structured as a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

opex_group

string

High level section from the statement of operations table. Most of these types have subcomponents broken out in the opex_type column.

opex_report_month

number

Amount of operational expense, cost or income during the report month.

opex_type

string

Type of item from the statement of operations.

opex_ytd

number

The year-to-date amount of operational expense, cost or income.

opex_ytd_budget

number

The year-to-date budget for amount of operational expense, cost or income.

report_date

date

Date reported.

core_rus7__yearly_transmission_and_distribution_mileage#

Noticed any irregularities with this table? File a report.

Annual time series of miles of transmission and distribution infrastructure.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part B)

Primary key:

report_date, borrower_id_rus, line_type

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Note that according to RUS, ‘underbuild in overhead lines or joint runs in underground installations do not increase the number of line miles except for distribution underbuild on transmission poles. In such cases, distribution pole line miles would be increased by the number of underbuild miles involved.’

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_transmission_and_distribution_mileage.parquet"
)

Columns#

Field Name

Type

Description

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits.

is_total

boolean

Whether or not this record represents a total.

line_type

string

The type of line mileage reported (e.g., transmission, overhead distribution).

miles

number

Line length at the end of the reported period, in miles.

report_date

date

Date reported.

core_rus7__yearly_utility_plant_changes#

Noticed any irregularities with this table? File a report.

Annual time series of changes in utility plant.

Most-recent data:

2021

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

USDA RUS Form 7 – Financial and Operating Report: Electric Distribution (Part E)

Primary key:

report_date, borrower_id_rus, utility_plant_group, utility_plant_item

Usage Warnings#

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus7__yearly_utility_plant_changes.parquet"
)

Columns#

Field Name

Type

Description

additions

number

Cost of acquisition of items classified within the account.

adjustments_and_transfers

number

Amount of adjustments and transfers within a class of assets.

borrower_id_rus

string

Unique identifier of the RUS (Rural Utilities Service) borrower. These IDs consist of a two-character state abbreviation followed by four digits.

ending_balance

number

Account balance at end of year.

is_total

boolean

Whether or not this record represents a total.

report_date

date

Date reported.

retirements

number

Cost of disposal of items classified within the account.

utility_plant_group

string

High-level category of utility plant asset type.

utility_plant_item

string

Sub-category of utility_plant_group describing utility plant asset item.

core_rus__codes_fuel_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for fuel types in RUS.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

RUS – Mix of multiple RUS forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_fuel_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

core_rus__codes_investment_types#

Noticed any irregularities with this table? File a report.

Code table containing descriptions of categorical codes for investment types.

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

RUS – Mix of multiple RUS forms

Primary key:

code

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_rus__codes_investment_types.parquet"
)

Columns#

Field Name

Type

Description

code

string

Originally reported short code.

description

string

Long human-readable description of the meaning of a code/label.

core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between subsidiaries listed in Exhibit 21 and EIA utilities.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

subsidiary_company_id_sec10k

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

An Exhibit 21 subsidiary is considered matched to an EIA utility if their names are identical. Only subsidiaries that don’t file SEC 10-K themselves are included in this table. SEC 10-K filers have much more information available and are matched to EIA utilities using probabilistic record linkage in core_sec10k__assn_sec10k_filers_and_eia_utilities.
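The exact-name rule described above can be sketched as a dictionary lookup. This is only an illustration: the names and IDs are invented, the separator used in the example subsidiary IDs is an assumption, and whether PUDL normalizes names (case, punctuation) before comparing is not stated here, so the sketch compares raw strings.

```python
# Hypothetical EIA utilities, keyed by name -> utility_id_eia.
eia_utilities_by_name = {
    "example power co": 101,
    "sample electric cooperative": 102,
}

# Hypothetical Exhibit 21 subsidiaries: (subsidiary_company_id_sec10k, name).
subsidiaries = [
    ("0000000001_example power co_delaware", "example power co"),
    ("0000000001_unrelated holdings llc_texas", "unrelated holdings llc"),
]

# Keep only subsidiaries whose name is identical to an EIA utility name.
matches = {
    subsidiary_id: eia_utilities_by_name[name]
    for subsidiary_id, name in subsidiaries
    if name in eia_utilities_by_name
}
```

Because subsidiary_company_id_sec10k is not stable across PUDL releases, the IDs in a lookup like this should be derived at run time instead of hard-coded.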

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_eia_utilities.parquet"
)

Columns#

Field Name

Type

Description

subsidiary_company_id_sec10k

string

PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.

utility_id_eia

integer

The EIA Utility Identification number.

core_sec10k__assn_exhibit_21_subsidiaries_and_filers#

Noticed any irregularities with this table? File a report.

Association table providing connections between subsidiaries listed in Exhibit 21 and their SEC central index key, if the subsidiary also files Form 10-K.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

subsidiary_company_id_sec10k

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

Exhibit 21 subsidiaries and SEC 10-K filers are considered matched if they have identical names and the same location of incorporation.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_exhibit_21_subsidiaries_and_filers.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

subsidiary_company_id_sec10k

string

PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.

core_sec10k__assn_sec10k_filers_and_eia_utilities#

Noticed any irregularities with this table? File a report.

Association table providing connections between SEC 10-K filers and EIA utilities.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

central_index_key, utility_id_eia

Usage Warnings#

  • This data has been matched probabilistically using a machine learning model. It is incomplete in places and contains some errors, and should not be treated as ground truth data.

  • PUDL does not currently update its copy of this data.

Additional Details#

SEC central index keys are matched to EIA utility IDs using probabilistic record linkage based on associated company information like company name, business and mailing addresses, and state of incorporation. The match between central_index_key and utility_id_eia is one to one and is not allowed to change over time. In cases where there were multiple candidate matches, the match with the highest probability is selected.
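One way to read the selection rule above is as a greedy pass over candidate pairs, sorted by match probability, that keeps a pair only if neither side has already been matched. The sketch below illustrates that reading with invented candidates. It is an assumption about how a one-to-one constraint could be enforced, not a description of the PUDL record-linkage code, and select_one_to_one is a hypothetical name.

```python
# Hypothetical candidates: (central_index_key, utility_id_eia, probability).
candidates = [
    ("0000000001", 101, 0.95),
    ("0000000001", 102, 0.60),
    ("0000000002", 101, 0.80),
    ("0000000002", 103, 0.70),
]


def select_one_to_one(candidates):
    """Greedily keep the most probable pairs, using each ID at most once."""
    matches = {}
    used_utilities = set()
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    for cik, utility_id, _probability in ranked:
        if cik in matches or utility_id in used_utilities:
            continue  # one side is already matched to a better candidate
        matches[cik] = utility_id
        used_utilities.add(utility_id)
    return matches


selected = select_one_to_one(candidates)
```

Here the second filer's best candidate (utility 101) is already taken by a more probable pair, so it falls back to utility 103.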

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__assn_sec10k_filers_and_eia_utilities.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

utility_id_eia

integer

The EIA Utility Identification number.

core_sec10k__changelog_company_name#

Noticed any irregularities with this table? File a report.

Changelog table tracking changes in the names each SEC 10-K filer has used.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

This table has no primary key.

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

This table is extracted from the same SEC 10-K filing header information as core_sec10k__quarterly_company_information. Each filing reports the full history of name changes associated with a company up to the date of that filing. Because individual companies may appear in multiple filings in the same year, and the same historical name changes will be reported in multiple years, the raw input data contains many duplicate entries, which are deduplicated to create this table. The original name change data only contains the former name and the date of the change.

Roughly 2% of all records describe multiple name changes happening on the same date (they are duplicates on the basis of central_index_key and name_change_date). This may be due to company name reporting inconsistencies or reporting errors in which the old and new company names have been swapped.
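The same-date duplicates described above can be flagged with pandas. The sketch below uses a small invented frame with the table's column names. The company names and dates are placeholders, and the resulting share is not the 2% figure quoted above.

```python
import pandas as pd

# Hypothetical rows shaped like core_sec10k__changelog_company_name.
df = pd.DataFrame(
    {
        "central_index_key": ["0000000001", "0000000001", "0000000002"],
        "company_name_old": ["old name a", "old name b", "old name c"],
        "name_change_date": pd.to_datetime(
            ["2001-05-01", "2001-05-01", "2010-01-15"]
        ),
    }
)

# keep=False marks every row in a duplicated group, not just the repeats.
same_day = df.duplicated(
    subset=["central_index_key", "name_change_date"], keep=False
)
ambiguous_changes = df[same_day]
share_ambiguous = same_day.mean()
```

Rows in ambiguous_changes are the ones where the old and new names may have been swapped or reported inconsistently, so they deserve manual review before being used to build a name history.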

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__changelog_company_name.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

company_name

string

Name of the reporting company.

company_name_old

string

Name of company prior to name change.

name_change_date

date

Date of last name change of the company.

core_sec10k__quarterly_company_information#

Noticed any irregularities with this table? File a report.

Quarterly time series of company and filing information extracted from SEC 10-K filing headers.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k, central_index_key

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

While each SEC 10-K filing is submitted by a single company, it often contains references to many other related companies. Information about these companies and the filing itself is contained in text headers at the beginning of the filing. This table contains data extracted from those headers. Each record in this table represents a single observation of a company in a single filing. The filer_count indicates which referenced company within a filing header the record corresponds to.

Because the same company may be referenced in a number of different filings submitted in the same reporting period or even on the same day, this table contains apparently duplicative records about many companies, which can be distinguished only by the filename of the filing they appeared in and their filer count. Note that references to a particular company may not be perfectly consistent across all the filings in which it appears. The various company names, addresses, and other information associated with the company’s unique and permanent central_index_key are later used as inputs into the probabilistic record linkage process.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_company_information.parquet"
)

Columns#

Field Name

Type

Description

business_city

string

City where the company’s place of business is located.

business_postal_code

string

Non-US postal code of the company’s place of business.

business_state

string

State where the company’s place of business is located.

business_street_address

string

Street address of the company’s place of business.

business_street_address_2

string

Second line of the street address of the company’s place of business.

business_zip_code

string

Zip code of the company’s place of business.

business_zip_code_4

string

Zip code suffix of the company’s place of business.

central_index_key

string

Identifier of the company in SEC database.

company_name

string

Name of the reporting company.

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

filer_count

integer

A counter indicating which observation of company data within an SEC 10-K filing header the record pertains to.

filing_number_sec

string

Filing number used internally by the SEC to track the filing.

film_number

string

Document control number used in the SEC EDGAR database. The first four digits can be used to access scans of the document in the SEC’s Virtual Private Reference Room.

fiscal_year_end

string

The end date of an SEC filing company’s fiscal year, in MMDD format.

incorporation_state

string

Two-letter code of the state where the company is incorporated.

industry_id_sic

string

Four-digit Standard Industrial Classification (SIC) code identifying the company’s primary industry. SIC codes have been replaced by NAICS codes in many applications, but are still used by the SEC. See e.g. https://www.osha.gov/data/sic-manual for code definitions.

industry_name_sic

string

Text description of the Standard Industrial Classification (SIC) code.

mail_city

string

City of the company’s mailing address.

mail_postal_code

string

Non-US postal code of the company’s mailing address.

mail_state

string

State of the company’s mailing address.

mail_street_address

string

Street portion of the company’s mailing address.

mail_street_address_2

string

Second line of the street portion of the company’s mailing address.

mail_zip_code

string

Zip code of the company’s mailing address.

mail_zip_code_4

string

Zip code suffix of the company’s mailing address.

phone_number

string

Phone number for utility contact 1.

sec10k_type

string

Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.

sec_act

string

SEC Act through which the form was enacted, e.g. 1934 act.

taxpayer_id_irs

string

Taxpayer ID of the company with the IRS.
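The fiscal_year_end column above is a four-character MMDD string, not a date. The following is a minimal sketch of turning it into a month and day. parse_fiscal_year_end is a hypothetical helper, and the sketch assumes values are zero-padded to exactly four digits.

```python
from datetime import date


def parse_fiscal_year_end(mmdd: str) -> tuple[int, int]:
    """Split an MMDD string such as "1231" into (month, day)."""
    if len(mmdd) != 4 or not (mmdd.isascii() and mmdd.isdigit()):
        raise ValueError(f"expected four digits in MMDD format, got {mmdd!r}")
    month, day = int(mmdd[:2]), int(mmdd[2:])
    # Validate against a leap year so that "0229" is accepted.
    date(2000, month, day)
    return month, day


month, day = parse_fiscal_year_end("1231")
```

Keeping the result as a (month, day) pair avoids attaching an arbitrary year to a value that does not carry one.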

core_sec10k__quarterly_exhibit_21_company_ownership#

Noticed any irregularities with this table? File a report.

Quarterly time series of subsidiary company ownership data from the SEC 10-K Exhibit 21 attachments.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k, subsidiary_company_id_sec10k

Usage Warnings#

  • This data has been extracted probabilistically using a machine learning model. It is incomplete in places and contains some errors, and should not be treated as ground truth data.

  • PUDL does not currently update its copy of this data.

Additional Details#

Exhibit 21 is an unstructured text or PDF attachment to the main SEC 10-K filing that is used to describe the subsidiaries owned by the filing company. It may or may not provide the percentage of the subsidiary that is owned by the filing company, or the location of the subsidiary.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_exhibit_21_company_ownership.parquet"
)

Columns#

Field Name

Type

Description

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

fraction_owned

number

Proportion of the subsidiary company owned by the filing company, if reported.

subsidiary_company_id_sec10k

string

PUDL-assigned ID for subsidiaries found in SEC 10-K Exhibit 21. The ID is created by concatenating the CIK of the company whose filing the subsidiary was found in, the subsidiary company’s name, and location of incorporation. It is not guaranteed to be stable across different releases of PUDL and so should never be hard-coded in analyses.

subsidiary_company_location

string

Location of subsidiary company. This is the full US state name or country name, or occasionally a two-digit code that was not mapped to a full name during cleaning.

subsidiary_company_name

string

Name of subsidiary company.

core_sec10k__quarterly_filings#

Noticed any irregularities with this table? File a report.

Quarterly time series of metadata describing SEC 10-K filings.

Most-recent data:

2023

Processing:

Data has been cleaned and organized into well-modeled tables that serve as building blocks for downstream wide tables and analyses.

Source:

U.S. Securities and Exchange Commission (SEC) Form 10-K

Primary key:

filename_sec10k

Usage Warnings#

  • PUDL does not currently update its copy of this data.

Additional Details#

Each SEC 10-K filing is submitted by a single company, but may contain information about numerous other companies. This table indicates the company submitting the filing, as well as some information about the overall filing. Each filing is guaranteed to have a unique filename, but ~1% of all filings are one company submitting the same form multiple times on the same day, so the filename is the only available natural primary key.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/core_sec10k__quarterly_filings.parquet"
)

Columns#

Field Name

Type

Description

central_index_key

string

Identifier of the company in SEC database.

company_name

string

Name of the reporting company.

exhibit_21_version

string

Version of exhibit 21 submitted (if applicable).

filename_sec10k

string

Unique portion of the filename associated with the SEC 10-K filing in the EDGAR database. The full source URL can be reconstructed by prepending https://www.sec.gov/Archives/edgar/data/ and adding the .txt file type extension.

filing_date

date

Date on which the filing was submitted.

report_date

date

Date reported.

sec10k_type

string

Specific version of SEC 10-K that was filed. 10-k: the standard annual report. 10-k/a: an amended version of the annual report. 10-k405: filed to report insider trading that was not reported in a timely fashion. 10-k405/a: an amended version of the 10-k405. 10-kt: submitted in lieu of or in addition to a standard 10-K annual report when a company changes the end of its fiscal year (e.g. due to a merger) leaving the company with a longer or shorter reporting period. 10-kt/a: an amended version of the 10-kt. 10-ksb: the annual report for small businesses, also known as penny stocks. 10-ksb/a: an amended version of the 10-ksb.
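The filename_sec10k description above explains how to rebuild the source URL: prepend the EDGAR archive prefix and append the .txt extension. A minimal sketch of that rule follows. sec10k_source_url is a hypothetical helper, and the filename used in the example is a made-up placeholder, not a real filing.

```python
EDGAR_BASE_URL = "https://www.sec.gov/Archives/edgar/data/"


def sec10k_source_url(filename_sec10k: str) -> str:
    """Rebuild the EDGAR URL for a filing from its filename_sec10k value."""
    return f"{EDGAR_BASE_URL}{filename_sec10k}.txt"


# Placeholder value for illustration only; not a real filing.
example_url = sec10k_source_url("1234/0000000000-00-000000")
```

Applied to a DataFrame, the same function can be mapped over the filename_sec10k column to add a URL column for spot-checking records against the original filings.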

_core_eia860__cooling_equipment#

Noticed any irregularities with this table? File a report.

Information about cooling equipment at generation facilities.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 6D)

Primary key:

plant_id_eia, utility_id_eia, cooling_id_eia, report_date

Usage Warnings#

  • This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.

Additional Details#

A single plant can have several collections of cooling equipment, so each line in this table represents one cooling system at a given plant. You can determine how to link each set of cooling equipment to specific boilers in a plant via core_eia860__assn_boiler_cooling.

EIA-860 respondents are required to submit information about cooling equipment for plants over 100 MW, including steam, combined cycle, nuclear, or solar thermal generators.

This table has been cleaned, but not tidied/normalized with other EIA 923 or 860 data. The same variables present in this table may show up in other tables and/or in other years. Once this table has been harvested, it will be converted into a core table.
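Linking cooling systems to boilers through core_eia860__assn_boiler_cooling, as mentioned above, amounts to a join on the shared identifiers. The sketch below uses invented rows. The association table's column names (in particular boiler_id) are an assumption and should be checked against that table's own entry in this dictionary.

```python
import pandas as pd

# Hypothetical rows shaped like _core_eia860__cooling_equipment.
cooling = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "cooling_id_eia": ["C1", "C2"],
        "report_date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
        "cooling_type_1": ["type_a", "type_b"],  # placeholder codes
    }
)

# Hypothetical rows for core_eia860__assn_boiler_cooling. These column
# names are assumed, not confirmed by this section of the dictionary.
boiler_cooling = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "boiler_id": ["B1", "B2", "B3"],
        "cooling_id_eia": ["C1", "C1", "C2"],
        "report_date": pd.to_datetime(["2024-01-01"] * 3),
    }
)

# Each boiler row picks up the attributes of its cooling system.
# validate raises if the cooling side is not unique on the join keys.
boilers_with_cooling = boiler_cooling.merge(
    cooling,
    on=["plant_id_eia", "cooling_id_eia", "report_date"],
    how="left",
    validate="many_to_one",
)
```

The primary key of the cooling table also includes utility_id_eia, so on real data the validate check may fail where a plant reports the same cooling system under more than one utility. In that case utility_id_eia would need to be handled before joining.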

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__cooling_equipment.parquet"
)

Columns#

Field Name

Type

Description

chlorine_equipment_cost

number

Actual installed cost for the existing chlorine discharge control system or the anticipated cost to bring the chlorine discharge control system into commercial operation

chlorine_equipment_operating_date

date

Actual or projected in-service date for chlorine discharge control structures and equipment

cooling_equipment_total_cost

number

Actual installed cost for the existing system or the anticipated cost to bring the total system into commercial operation

cooling_id_eia

string

EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)

cooling_status_code

string

Operating status of cooling system

cooling_system_operating_date

date

The actual or projected in-service date of this cooling system

cooling_type_1

string

Type of cooling system

cooling_type_2

string

Type of cooling system

cooling_type_3

string

Type of cooling system

cooling_type_4

string

Type of cooling system

cooling_water_discharge

string

Name of river, lake, or water source that cooling water is discharged into

cooling_water_source

string

Name of river, lake, or water source that provides cooling water

county

string

County name.

intake_distance_shore_feet

number

Maximum distance from shore to intake

intake_distance_surface_feet

number

Average distance below water surface to intake

intake_rate_100pct_gallons_per_minute

number

Design cooling water flow rate at 100 percent load at intake

outlet_distance_shore_feet

number

Maximum distance from shore to outlet

outlet_distance_surface_feet

number

Average distance below water surface to outlet

percent_dry_cooling

number

Percent of cooling load served by dry cooling components

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_name_eia

string

Plant name.

plant_summer_capacity_mw

number

The plant summer capacity associated with the operating generators at the plant

pond_cost

number

Actual installed cost for the existing cooling ponds or the anticipated cost to bring the cooling ponds into commercial operation

pond_operating_date

date

Cooling ponds actual or projected in-service date

pond_surface_area_acres

number

Total surface area of cooling pond

pond_volume_acre_feet

number

Total volume of water in cooling pond

power_requirement_mw

number

Maximum power requirement for cooling towers at 100 percent load

report_date

date

Date reported.

state

string

Two letter US state abbreviation.

steam_plant_type_code

integer

Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.

tower_cost

number

Actual installed cost for the existing cooling towers or the anticipated cost to bring the cooling towers into commercial operation

tower_operating_date

date

Cooling towers actual or projected in-service date

tower_type_1

string

Types of cooling towers at this plant

tower_type_2

string

Types of cooling towers at this plant

tower_type_3

string

Types of cooling towers at this plant

tower_type_4

string

Types of cooling towers at this plant

tower_water_rate_100pct_gallons_per_minute

number

Maximum design rate of water flow at 100 percent load for the cooling towers

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

water_source

string

Name of water source associated with the plant.

water_source_code

string

Type of cooling water source

water_type_code

string

Type of cooling water

_core_eia860__fgd_equipment#

Noticed any irregularities with this table? File a report.

Information about flue gas desulfurization equipment at generation facilities.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 860 – Annual Electric Generator Report (Schedule 6E)

Primary key:

plant_id_eia, so2_control_id_eia, report_date

Usage Warnings#

  • This table has not been harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia860__fgd_equipment.parquet"
)

Columns#

Field Name

Type

Description

byproduct_recovery

boolean

Is saleable byproduct recovered by the unit?

county

string

County name.

county_id_fips

string

County ID from the Federal Information Processing Standard Publication 6-4.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fgd_manufacturer

string

Name of flue gas desulfurization equipment manufacturer.

fgd_manufacturer_code

string

Code corresponding to name of flue gas desulfurization equipment manufacturer.

fgd_operating_date

date

The actual or projected in-service date of this flue gas desulfurization system.

fgd_operational_status_code

string

Operating status code for flue gas desulfurization equipment.

fgd_other_cost

number

Other actual installed costs for installation of a flue gas desulfurization unit or the anticipated other costs pertaining to the installation of a flue gas desulfurization unit.

fgd_structure_cost

number

Actual installed costs for the existing systems or the anticipated costs of structures and equipment to bring a planned flue gas desulfurization system into commercial operation.

fgd_trains_100pct

number

Total number of flue gas desulfurization unit scrubber trains operated at 100 percent load.

fgd_trains_total

number

Total number of flue gas desulfurization unit scrubber trains.

flue_gas_bypass_fgd

boolean

Indicates whether flue gas can bypass the flue gas desulfurization unit.

flue_gas_entering_fgd_pct_of_total

number

Ratio of the flue gas entering the flue gas desulfurization unit to all flue gas.

flue_gas_exit_rate_cubic_feet_per_minute

number

Actual flue gas exit rate, in cubic feet per minute.

flue_gas_exit_temperature_fahrenheit

number

Flue gas exit temperature, in degrees Fahrenheit.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

plant_summer_capacity_mw

number

The plant summer capacity associated with the operating generators at the plant

pond_landfill_requirements_acre_foot_per_year

number

Annual pond and landfill requirements for flue gas desulfurization equipment.

report_date

date

Date reported.

sludge_disposal_cost

number

Actual installed costs for the existing sludge transport and disposal systems or the anticipated costs of sludge transport and disposal systems to bring a planned system into commercial operation.

sludge_pond

boolean

Indicates if there is a sludge pond associated with this unit.

sludge_pond_lined

boolean

Indicates whether the sludge pond is lined.

so2_control_id_eia

string

Sulfur dioxide control identification number. This ID is not a unique identifier.

so2_emission_rate_lbs_per_hour

number

Sulfur dioxide emission rate when operating at 100 percent load (pounds per hour).

so2_equipment_type_1

string

Type of sulfur dioxide control equipment.

so2_equipment_type_2

string

Type of sulfur dioxide control equipment.

so2_equipment_type_3

string

Type of sulfur dioxide control equipment.

so2_equipment_type_4

string

Type of sulfur dioxide control equipment.

so2_removal_efficiency_design

number

Designed removal efficiency for sulfur dioxide when operating at 100 percent load. Reported at the nearest 0.1 percent by weight of gases removed from the flue gas.

sorbent_type_1

string

Type of sorbent used by this sulfur dioxide control equipment.

sorbent_type_2

string

Type of sorbent used by this sulfur dioxide control equipment.

sorbent_type_3

string

Type of sorbent used by this sulfur dioxide control equipment.

sorbent_type_4

string

Type of sorbent used by this sulfur dioxide control equipment.

specifications_of_coal_ash

number

Design fuel specifications for ash when burning coal or petroleum coke (nearest 0.1 percent by weight).

specifications_of_coal_sulfur

number

Design fuel specifications for sulfur when burning coal or petroleum coke (nearest 0.1 percent by weight).

state

string

Two letter US state abbreviation.

state_id_fips

string

Two digit state FIPS code.

steam_plant_type_code

integer

Code that describes types of steam plants from EIA 860. See steam_plant_types_eia table for more details.

total_fgd_equipment_cost

number

Total actual installed costs for the existing flue gas desulfurization unit or the anticipated costs to bring a planned flue gas desulfurization unit into commercial operation.

utility_id_eia

integer

The EIA Utility Identification number.

utility_name_eia

string

The name of the utility.

water_source

string

Name of water source associated with the plant.

_core_eia923__monthly_cooling_system_information#

Noticed any irregularities with this table? File a report.

Monthly time series of cooling system information at generation facilities - mainly water volumes and temperatures.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 8D)

Primary key:

plant_id_eia, report_date, cooling_id_eia

Usage Warnings#

  • Some years use a slightly different data definition.

Additional Details#

In 2008 and 2009, EIA only reports annual averages, but in later years all data is monthly.

Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.
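Because the 2008 and 2009 records carry annual averages while later records are monthly, it can help to separate the two groups before analysis. The following is a minimal sketch on synthetic data, not the PUDL implementation. It assumes that annual records can be recognized by the year of report_date; the sample values and the January 1 dates for annual rows are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the table; values and dates are illustrative only.
df = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "cooling_id_eia": ["PLANT", "PLANT", "PLANT", "PLANT"],
        "report_date": pd.to_datetime(
            ["2008-01-01", "2009-01-01", "2010-01-01", "2010-02-01"]
        ),
        "annual_average_withdrawal_rate_gallons_per_minute": [100.0, 110.0, None, None],
        "monthly_average_withdrawal_rate_gallons_per_minute": [None, None, 95.0, 105.0],
    }
)

# Years in which, per the note above, only annual averages were reported.
ANNUAL_ONLY_YEARS = [2008, 2009]

is_annual_only = df["report_date"].dt.year.isin(ANNUAL_ONLY_YEARS)
annual_records = df[is_annual_only]
monthly_records = df[~is_annual_only]
```

Keeping the two groups apart avoids mixing one-row-per-year records with one-row-per-month records in the same aggregation.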

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__monthly_cooling_system_information.parquet"
)

Columns#

Field Name

Type

Description

annual_average_consumption_rate_gallons_per_minute

number

Annual average consumption rate of cooling water

annual_average_discharge_rate_gallons_per_minute

number

Annual average discharge rate of cooling water

annual_average_withdrawal_rate_gallons_per_minute

number

Annual average withdrawal rate of cooling water

annual_maximum_intake_summer_temperature_fahrenheit

number

Maximum cooling water temperature at intake during the summer

annual_maximum_intake_winter_temperature_fahrenheit

number

Maximum cooling water temperature at intake in winter

annual_maximum_outlet_summer_temperature_fahrenheit

number

Maximum cooling water temperature at outlet in summer

annual_maximum_outlet_winter_temperature_fahrenheit

number

Maximum cooling water temperature at outlet in winter

annual_total_chlorine_lbs

number

Amount of elemental chlorine added to cooling water annually. May be just the amount of chlorine-containing compound if schedule 9 is filled out.

cooling_id_eia

string

EIA Identification code for cooling system (if multiple cooling systems are not distinguished by separate IDs, the word ‘PLANT’ is listed to encompass the cooling system for the entire plant)

cooling_status_code

string

Operating status of cooling system

cooling_type

string

Type of cooling system

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

flow_rate_method

string

Calculation method for flow rates (actual or method of estimation)

monthly_average_consumption_rate_gallons_per_minute

number

Monthly average consumption rate of cooling water

monthly_average_discharge_rate_gallons_per_minute

number

Monthly average discharge rate of cooling water

monthly_average_discharge_temperature_fahrenheit

number

Average cooling water temperature at discharge point

monthly_average_diversion_rate_gallons_per_minute

number

Monthly average diversion rate of cooling water

monthly_average_intake_temperature_fahrenheit

number

Average cooling water temperature at intake point

monthly_average_withdrawal_rate_gallons_per_minute

number

Monthly average withdrawal rate of cooling water

monthly_maximum_discharge_temperature_fahrenheit

number

Maximum cooling water temperature at discharge

monthly_maximum_intake_temperature_fahrenheit

number

Maximum cooling water temperature at intake

monthly_total_chlorine_lbs

number

Amount of elemental chlorine added to cooling water monthly. May be just the amount of chlorine-containing compound if schedule 9 is filled out.

monthly_total_consumption_volume_gallons

number

Monthly volume of water consumed at consumption point (accurate to 0.1 million gal)

monthly_total_cooling_hours_in_service

integer

Total hours the system operated during the month

monthly_total_discharge_volume_gallons

number

Monthly volume of water discharged at discharge point (accurate to 0.1 million gal)

monthly_total_diversion_volume_gallons

number

Monthly volume of water diverted at diversion point (accurate to 0.1 million gal)

monthly_total_withdrawal_volume_gallons

number

Monthly volume of water withdrawn at withdrawal point (accurate to 0.1 million gal)

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

temperature_method

string

Method for measurement of temperatures

_core_eia923__yearly_byproduct_disposition#

Noticed any irregularities with this table? File a report.

Annual time series of by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 8A)

Primary key:

plant_id_eia, report_date, byproduct_description

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

All by-products other than steam sales are reported in thousand tons and are rounded to whole numbers. Steam sales are reported in MMBTU.
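Since steam sales use a different unit from every other by-product, totals are only meaningful within a single unit. The sketch below groups a synthetic frame by byproduct_units before summing. The by-product names and the literal unit strings ("tons", "MMBTU") are assumptions for illustration and may not match the values stored in the table.

```python
import pandas as pd

# Synthetic rows; descriptions and unit labels are hypothetical.
df = pd.DataFrame(
    {
        "byproduct_description": ["Fly ash", "Bottom ash", "Steam sales"],
        "byproduct_units": ["tons", "tons", "MMBTU"],
        "sold_units": [100.0, 200.0, 5000.0],
    }
)

# Sum within each unit of measure so mass and energy are never added together.
sold_by_unit = df.groupby("byproduct_units")["sold_units"].sum()
```

Inspecting the distinct values of byproduct_units in the real data first is a sensible check before relying on any particular label.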

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_disposition.parquet"
)

Columns#

Field Name

Type

Description

byproduct_description

string

Description of combustion by-product.

byproduct_units

string

Reported unit of measure for combustion byproduct. MMBTU for steam, tons for all other byproducts.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

disposal_landfill_units

number

Disposed by-products in landfill, to the nearest hundred tons or in MMBTU for steam sales.

disposal_offsite_units

number

Disposed by-products offsite, to the nearest hundred tons or in MMBTU for steam sales.

disposal_ponds_units

number

Disposed by-products in ponds, to the nearest hundred tons or in MMBTU for steam sales.

no_byproducts_to_report

string

Whether any combustion by-products were produced by a plant. ‘Y’ indicates no byproducts to report. The ‘Y’ and ‘N’ values do not align with expected values of reported byproducts. This column is messy and requires standardization.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

sold_units

number

Sold by-products, in tons (to the nearest 100 tons) or, for steam sales, in MMBTU.

stored_offsite_units

number

Stored by-products offsite, to the nearest hundred tons or in MMBTU for steam sales.

stored_onsite_units

number

Stored by-products onsite, to the nearest hundred tons or in MMBTU for steam sales.

total_disposal_units

number

Total by-product disposal, to the nearest hundred tons or in MMBTU for steam sales.

used_offsite_units

number

Used offsite by-products, to the nearest hundred tons or in MMBTU for steam sales.

used_onsite_units

number

Used onsite by-products, to the nearest hundred tons or in MMBTU for steam sales.

_core_eia923__yearly_byproduct_expenses_and_revenues#

Noticed any irregularities with this table? File a report.

Annual time series of financial information related to combustion by-products reported by thermoelectric power plants with total steam turbine capacity of 100 megawatts or greater and that produced combustion by-products during the reporting year.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 8B)

Primary key:

plant_id_eia, report_date

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

Cost data must be entered for all entries on Schedule 8A. Financial information includes operational and maintenance expenditures (opex), capital expenditures (capex), and by-products sales revenue.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_byproduct_expenses_and_revenues.parquet"
)

Columns#

Field Name

Type

Description

capex_air_abatement

number

Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate airborne pollutants.

capex_other_abatement

number

Other amortizable expenses and purchases of new structures and/or equipment when such purchases are not allocated to a particular unit or item.

capex_solid_waste

number

Cost of structures or equipment purchased to collect and dispose of objectionable solids or contained liquids.

capex_water_abatement

number

Cost of new structures and/or equipment purchased to reduce, monitor, or eliminate waterborne pollutants, including chlorine, phosphates, acids, bases, hydrocarbons, sewage, and other pollutants.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

opex_bottom_ash_collection

number

Costs of materials and labor associated with the collection of bottom ash from all sources.

opex_bottom_ash_disposal

number

Costs of materials and labor associated with the disposal of bottom ash from all sources.

opex_bottom_ash_other

number

Other costs associated with the collection and disposal of bottom ash.

opex_fgd_byproduct_collection

number

Costs of materials and labor associated with the collection of sulfur by-product (flue gas desulfurization).

opex_fgd_byproduct_disposal

number

Costs of materials and labor associated with the disposal of sulfur by-product (flue gas desulfurization).

opex_fgd_byproduct_other

number

Other costs associated with the collection and disposal of sulfur by-product (flue gas desulfurization).

opex_fly_ash_collection

number

Costs of materials and labor associated with the collection of fly ash from all sources.

opex_fly_ash_disposal

number

Costs of materials and labor associated with the disposal of fly ash from all sources.

opex_fly_ash_other

number

Other costs associated with the collection and disposal of fly ash.

opex_other_abatement_collection

number

Abatement costs of by-product collection that are not allocated to a particular expenditure, e.g., costs of operating an environmental protection office.

opex_other_abatement_disposal

number

Abatement costs of by-product disposal that are not allocated to a particular expenditure.

opex_other_abatement_other

number

Other abatement costs that are not allocated to a particular expenditure.

opex_total_collection_abatement

number

Sum of abatement costs associated with by-product collection.

opex_total_disposal_abatement

number

Sum of abatement costs associated with by-product disposal.

opex_total_other_abatement

number

Sum of other abatement costs associated with the collection and disposal of byproducts.

opex_water_abatement_collection

number

Costs associated with the collection/abatement of water pollution, e.g., equipment operation and maintenance of pumps, pipes, and settling ponds.

opex_water_abatement_disposal

number

Costs associated with the disposal of water pollutants.

opex_water_abatement_other

number

Other abatement costs associated with water pollutants.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

revenues_bottom_ash

number

Revenue from the sale of bottom ash by-product.

revenues_fgd_byproducts

number

Revenue from the sale of flue gas desulfurization by-product.

revenues_fly_ash

number

Revenue from the sale of fly ash by-product.

revenues_fly_bottom_ash_intermingled

number

Revenue from the sale of intermingled fly and bottom ash by-product.

revenues_other_byproducts

number

Revenue from the sale of other by-products.

revenues_total_byproduct

number

Total revenue from the sale of by-products.

_core_eia923__yearly_emissions_control#

Noticed any irregularities with this table? File a report.

Annual time series of actual rate and removal efficiency for air emissions reported by thermoelectric or combined-cycle power plants with a total steam turbine capacity greater than or equal to 10 megawatts.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 8C)

Primary key:

This table has no primary key. This table is not yet normalized, and contains information about several different types of emissions control equipment, each of which has its own ID, and not all of which are present at every plant. As a result there is currently no unique, non-null primary key. Conceptually, the primary key should consist of plant_id_eia, report_date, mercury_control_id_eia, nox_control_id_eia, so2_control_id_eia, and particulate_control_id_eia. In practice many records have NULL values for all of the control equipment IDs.
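One way to see why the conceptual key does not work in practice is to count the records whose control equipment IDs are all NULL and the records that collide on the conceptual key. This is a minimal sketch on synthetic data; the column names come from the description above, and the sample rows are made up.

```python
import pandas as pd

CONTROL_ID_COLS = [
    "mercury_control_id_eia",
    "nox_control_id_eia",
    "so2_control_id_eia",
    "particulate_control_id_eia",
]
KEY_COLS = ["plant_id_eia", "report_date", *CONTROL_ID_COLS]

# Synthetic rows: one record with an SO2 control ID, two with no IDs at all.
df = pd.DataFrame(
    {
        "plant_id_eia": [1, 2, 2],
        "report_date": pd.to_datetime(["2023-01-01"] * 3),
        "mercury_control_id_eia": [None, None, None],
        "nox_control_id_eia": [None, None, None],
        "so2_control_id_eia": ["S1", None, None],
        "particulate_control_id_eia": [None, None, None],
    }
)

# Records where every control equipment ID is missing.
all_ids_null = df[CONTROL_ID_COLS].isna().all(axis=1)

# Records that share the full conceptual key with at least one other record.
# pandas treats missing values as equal to each other when finding duplicates.
key_collisions = df.duplicated(subset=KEY_COLS, keep=False)

n_all_null = int(all_ids_null.sum())
n_collisions = int(key_collisions.sum())
```

On real data, nonzero counts for either check confirm that the conceptual key cannot be enforced as a unique, non-null primary key.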

Usage Warnings#

  • This table is experimental and/or a work in progress and may change in the future.

Additional Details#

In the raw data, the so2_test_date and particulate_test_date columns contained a wide variety of non-standard date formats. They have been standardized to ISO-8601 (YYYY-MM-DD) where possible. Missing days and months have been filled in with 01. A handful of manual spot-fixes have been applied to particularly strange formats. Two-digit years have been assumed to fall between 1950 and the last year observed in the report_date column.
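The two rules described above (fill missing months and days with 01, and place two-digit years between 1950 and the last reported year) can be sketched as follows. This is an illustration under stated assumptions, not the PUDL code: the function names are hypothetical, and raising an error for years that fall outside the window is a choice made for this example.

```python
from datetime import date
from typing import Optional


def expand_two_digit_year(yy: int, last_report_year: int, floor_year: int = 1950) -> int:
    """Place a two-digit year inside the window [floor_year, last_report_year]."""
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    # Try both centuries and keep the candidates that fall inside the window.
    candidates = [
        century + yy
        for century in (1900, 2000)
        if floor_year <= century + yy <= last_report_year
    ]
    if len(candidates) != 1:
        raise ValueError(f"two-digit year {yy:02d} does not fit the window")
    return candidates[0]


def standardize_partial_date(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> str:
    """Return an ISO-8601 date string, filling a missing month or day with 01."""
    return date(year, month or 1, day or 1).isoformat()


# Example: "3/99" with no day, when the last report year is 2024.
iso = standardize_partial_date(expand_two_digit_year(99, 2024), month=3)
```

Because the window is shorter than a century, each two-digit year maps to at most one four-digit year.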

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_emissions_control.parquet"
)

Columns#

Field Name

Type

Description

acid_gas_removal_efficiency

number

Removal efficiency for acid gas emissions. Ranges from 0 to 1.

annual_nox_emission_rate_lb_per_mmbtu

number

Actual controlled (or uncontrolled) nitrogen oxides emission rate. Based on data from CEMS where possible.

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

environmental_equipment_name

string

Type of equipment or strategy for the control of air emissions.

fgd_electricity_consumption_mwh

number

Electric power consumed by the flue gas desulfurization unit (in MWh).

fgd_sorbent_consumption_tons

number

Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons.

hours_in_service

integer

Total hours the emissions control was in service during the reporting year, rounded to the nearest hour.

mercury_control_id_eia

string

Mercury control identification number. This ID is not a unique identifier.

mercury_emission_rate_lb_per_trillion_btu

number

Actual controlled (or uncontrolled) mercury emission rate, based on data from CEMS, where possible.

mercury_removal_efficiency

number

Removal efficiency for mercury emissions. Ranges from 0 to 1.

nox_control_id_eia

string

Nitrogen oxide control identification number. This ID is not a unique identifier.

operational_status

string

The operating status of the asset. For generators this is based on which tab the generator was listed in in the EIA 860 spreadsheet.

ozone_season_nox_emission_rate_lb_per_mmbtu

number

Actual controlled (or uncontrolled) nitrogen oxides emission rate during the ozone season (May to September)

particulate_control_id_eia

string

Particulate matter control identification number. This ID is not a unique identifier.

particulate_emission_rate_lb_per_mmbtu

number

Average annual emission removal rate for particulate matter.

particulate_removal_efficiency_annual

number

Particulate removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.

particulate_removal_efficiency_tested

number

The tested efficiency for the removal of particulate matter at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.

particulate_test_date

date

Date of the latest efficiency test for the removal of particulate matter.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

so2_control_id_eia

string

Sulfur dioxide control identification number. This ID is not a unique identifier.

so2_removal_efficiency_annual

number

Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.

so2_removal_efficiency_tested

number

The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.

so2_test_date

date

Date of most recent test for sulfur dioxide removal efficiency.
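The annual operating factor named in the removal-efficiency column descriptions of this table is a simple ratio. The worked example below only illustrates that definition; the function name and the input numbers are hypothetical, and how EIA respondents apply the factor to arrive at a reported efficiency is not covered here.

```python
def annual_operating_factor(
    annual_fuel_consumption_mmbtu: float,
    design_firing_rate_mmbtu_per_hour: float,
    hours_of_operation_per_year: float,
) -> float:
    """Annual fuel consumption divided by (design firing rate x hours of operation)."""
    denominator = design_firing_rate_mmbtu_per_hour * hours_of_operation_per_year
    if denominator <= 0:
        raise ValueError("firing rate and hours of operation must be positive")
    return annual_fuel_consumption_mmbtu / denominator


# A boiler rated at 1,000 MMBTU per hour that ran 8,000 hours and burned
# 4,000,000 MMBTU operated at half of its design firing rate on average.
factor = annual_operating_factor(4_000_000, 1_000, 8_000)
```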

_core_eia923__yearly_fgd_operation_maintenance#

Noticed any irregularities with this table? File a report.

Annual time series of flue gas desulfurization (FGD) operation and maintenance expenses at generation facilities.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

EIA Form 923 – Power Plant Operations Report (Schedule 8C)

Primary key:

plant_id_eia, report_date, so2_control_id_eia

Usage Warnings#

  • Some years use a slightly different data definition.

Additional Details#

From 2008-2011 this table also reported operational characteristics that are now reported in 923 Air Emissions Control Info.

Note: This table has been cleaned, but not harvested with other EIA 923 or 860 data. The same variables present in this table may show up in other _core tables in other years. Once this table has been harvested, it will be removed from the PUDL database.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_eia923__yearly_fgd_operation_maintenance.parquet"
)

Columns#

Field Name

Type

Description

data_maturity

string

Maturity of the source data published by EIA that is reflected in this record. EIA releases data incrementally over time, including monthly updates, annual year-to-date updates, provisional early releases of annual data, and final annual release data that is not expected to change further. Records sourced from multiple upstream EIA datasets may have no well defined data maturity. Records whose values have been inferred within PUDL will also have no data maturity.

fgd_control_flag

boolean

Indicates whether or not a plant has a flue gas desulfurization control unit.

fgd_electricity_consumption_mwh

number

Electric power consumed by the flue gas desulfurization unit (in MWh).

fgd_hours_in_service

integer

Number of hours the flue gas desulfurization equipment was in operation during the year.

fgd_operational_status_code

string

Operating status code for flue gas desulfurization equipment.

fgd_sorbent_consumption_tons

number

Quantity of flue gas desulfurization sorbent used, to the nearest 100 tons.

opex_fgd_feed_materials_chemical

integer

Annual operation and maintenance expenditures for feed materials and chemicals for flue gas desulfurization equipment, excluding electricity.

opex_fgd_labor_supervision

integer

Annual operation and maintenance expenditures for labor and supervision of flue gas desulfurization equipment, excluding electricity.

opex_fgd_land_acquisition

integer

Annual operation and maintenance expenditures for land acquisition for flue gas desulfurization equipment, excluding electricity.

opex_fgd_maintenance_material_other

integer

Annual operation and maintenance expenditures for maintenance, materials and all other costs of flue gas desulfurization equipment, excluding electricity

opex_fgd_total_cost

integer

Annual total cost of operation and maintenance expenditures on flue gas desulfurization equipment, excluding electricity

opex_fgd_waste_disposal

integer

Annual operation and maintenance expenditures for waste disposal, excluding electricity.

plant_id_eia

integer

The unique six-digit facility identification number, also called an ORISPL, assigned by the Energy Information Administration.

report_date

date

Date reported.

so2_control_id_eia

string

Sulfur dioxide control identification number. This ID is not a unique identifier.

so2_removal_efficiency_annual

number

Sulfur dioxide removal efficiency, based on the annual operating factor, which is defined as annual fuel consumption (MMBTU) divided by the product of the boiler design firing rate (MMBTU per hour) and hours of operation per year. When actual data are not available, estimates are provided based on equipment design performance specifications. Ranges from 0 to 1.

so2_removal_efficiency_tested

number

The tested efficiency for the removal of sulfur dioxide at 100 percent load. If not tested at 100 percent load, then the load at which the test was conducted is included as a comment on Schedule 9. If no test was conducted, the test date and tested efficiency field should be blank. Ranges from 0 to 1.

so2_test_date

date

Date of most recent test for sulfur dioxide removal efficiency.

_core_phmsagas__yearly_distribution_by_install_decade#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by install decade.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 4)

Primary key:

report_id, report_date, operator_id_phmsa, operating_state, install_decade

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Additional Details#

Records with an install_decade value of total_decade are totals. Beware of aggregating these values together with the per-decade records.
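
As a minimal sketch of why the total_decade records need care, the example below sums mains_miles with and without them. The column names match this table, but the rows are made up and the decade labels other than total_decade are placeholders, not the table's actual category values.

```python
import pandas as pd

# Made-up sample: two per-decade records plus the total record for one report.
df = pd.DataFrame(
    {
        "report_id": [1, 1, 1],
        "install_decade": ["decade_a", "decade_b", "total_decade"],
        "mains_miles": [10.0, 15.0, 25.0],
    }
)

is_total = df["install_decade"] == "total_decade"

# Summing every row counts the mileage twice.
naive_sum = df["mains_miles"].sum()

# Dropping the total records first gives the per-decade sum.
detail_sum = df.loc[~is_total, "mains_miles"].sum()

print(naive_sum, detail_sum)  # 50.0 25.0
```

The same filter applies to the services column. Whether the reported totals always equal the sum of the per-decade records is not guaranteed by this sketch and is worth checking on the real data.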

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_install_decade.parquet"
)

Columns#

Field Name

Type

Description

commodity

string

The type of gas delivered by the distribution pipeline.

install_decade

string

The decade the distribution pipeline was installed.

mains_miles

number

The miles of mains distribution pipeline.

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

services

number

Number of services in system at end of year.

_core_phmsagas__yearly_distribution_by_material#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by material for each gas distribution operator.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 1 - General)

Primary key:

This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state and material. However, there are nulls in the operating_state across several years of reporting.
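
One way to see how far the table is from having the expected primary key is to count rows with nulls in the candidate key columns and rows that repeat a key. The sketch below runs on made-up rows (the state and material values are placeholders); on the real table, df would be loaded with one of the access methods below.

```python
import pandas as pd

candidate_key = ["report_id", "operator_id_phmsa", "operating_state", "material"]

# Made-up rows for illustration only.
df = pd.DataFrame(
    {
        "report_id": [100, 100, 101],
        "operator_id_phmsa": [7, 7, 8],
        "operating_state": ["CO", "CO", None],
        "material": ["steel", "steel", "plastic"],
    }
)

# Rows where any candidate key column is null.
null_key_rows = df[df[candidate_key].isna().any(axis=1)]

# Rows that share their candidate key values with at least one other row.
duplicate_key_rows = df[df.duplicated(subset=candidate_key, keep=False)]

print(len(null_key_rows), len(duplicate_key_rows))  # 1 2
```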

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up in two categories before 1984).

  • Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material.parquet"
)

Columns#

Field Name

Type

Description

commodity

string

The type of gas delivered by the distribution pipeline.

mains_miles

number

The miles of mains distribution pipeline.

material

string

The material of the gas distribution pipe. The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up in two categories before 1984).

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

services

number

Number of services in system at end of year.

_core_phmsagas__yearly_distribution_by_material_and_size#

Noticed any irregularities with this table? File a report.

Annual time series of miles of mains and the number of services in operation at the end of the year by material and size of pipe.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B - System Description / Section 3)

Primary key:

This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state, main_size and material. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • The size ranges in main_size have changed slightly over the years (ex: before 1984 they reported 0.5_in_or_less whereas after they reported 1_in_or_less).

  • The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up in two categories before 1984).

  • Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_by_material_and_size.parquet"
)

Columns#

Field Name

Type

Description

commodity

string

The type of gas delivered by the distribution pipeline.

main_other_material_detail

string

A free-form text field containing notes about the other material type. This column should only contain values in it for rows with other as the material type listed.

main_size

string

Size range of mains. The size ranges have changed slightly over the years (ex: before 1984 they reported 0.5_in_or_less whereas after they reported 1_in_or_less).

mains_miles

number

The miles of mains distribution pipeline.

material

string

The material of the gas distribution pipe. The categories of material types have changed slightly over the years (ex: cast and wrought iron were broken up in two categories before 1984).

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

services

number

Number of services in system at end of year.

_core_phmsagas__yearly_distribution_excavation_damages#

Noticed any irregularities with this table? File a report.

Annual time series of excavation damages from various sources.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part D - Excavation Damage)

Primary key:

report_id, damage_type, damage_sub_type

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_excavation_damages.parquet"
)

Columns#

Field Name

Type

Description

commodity

string

The type of gas delivered by the distribution pipeline.

damage_sub_type

string

A sub-category of damage_type of excavation damage causes.

damage_type

string

A high level category of excavation damage causes.

damages

number

Number of instances of excavation damage.

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

_core_phmsagas__yearly_distribution_filings#

Noticed any irregularities with this table? File a report.

Annual time series of filings (aka submissions) from gas distribution system operators.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report

Primary key:

report_id, report_date, operator_id_phmsa

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

Additional Details#

This table contains information about the filer and filing type. This includes information about who filed but also whether this was an original filing or a correction.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_filings.parquet"
)

Columns#

Field Name

Type

Description

data_date

date

When the data source was last updated.

filing_correction_date

date

Date when a correction filing was submitted.

filing_date

date

Date on which the filing was submitted.

form_revision_id

string

PHMSA form revision identifier.

initial_filing_date

date

Initial date when filing was originally submitted.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

preparer_email

string

Email address of representative who filed report.

preparer_fax

string

Fax number of representative who filed report.

preparer_name

string

Name of representative who filed report.

preparer_phone

string

Phone number of representative who filed report.

preparer_title

string

Title of representative who filed report.

report_date

date

Date reported.

report_filing_type

string

Type of report submitted, either Initial or Supplemental.

report_id

integer

Report number of the PHMSA Gas utility submission.

_core_phmsagas__yearly_distribution_leaks#

Noticed any irregularities with this table? File a report.

Annual time series of total and hazardous leaks eliminated or repaired during the report year.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part C)

Primary key:

This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, operating_state, leak_severity and leak_source. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Some columns contain subtotals; use caution when choosing columns to aggregate.

  • Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_leaks.parquet"
)

Columns#

Field Name

Type

Description

commodity

string

The type of gas delivered by the distribution pipeline.

leak_severity

string

Whether the leaks described in this record are all leaks or only hazardous leaks.

leak_source

string

The cause of the leaks.

mains

number

The number of mains distribution pipeline.

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

services

number

Number of services in system at end of year.

_core_phmsagas__yearly_distribution_misc#

Noticed any irregularities with this table? File a report.

Annual time series of miscellaneous distribution information.

Most-recent data:

2024

Processing:

Data has been cleaned but not tidied/normalized. Published only temporarily and may be removed without notice.

Source:

Pipelines and Hazardous Materials Safety Administration (PHMSA) Annual Natural Gas Report (Part B & C)

Primary key:

This table has no primary key. We expect the primary key for this table should be report_id, operator_id_phmsa, and operating_state. There are nulls in the operating_state across several years of reporting.

Usage Warnings#

  • This table has been concatenated across all years and re-organized into a logical structure, but the data has not been fully cleaned. Expect some inconsistent units, data types and values over the years of reported data. Once fully cleaned, this table will be deprecated and replaced with a core table.

  • Beginning in 2004, companies file one report per state. The operating_state column has not been normalized and may contain more than one state in earlier years of data.

Access methods#

import pandas as pd
df = pd.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
      dtype_backend="pyarrow",
)
import polars as pl
df = pl.read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet",
      storage_options={"aws_region": "us-west-2", "aws_skip_signature": "True"},
)
SELECT * FROM 's3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet';
library(arrow)
df <- read_parquet(
      "s3://pudl.catalyst.coop/nightly/_core_phmsagas__yearly_distribution_misc.parquet"
)

Columns#

Field Name

Type

Description

all_known_leaks_scheduled_for_repair

number

The number of known system leaks at the end of the report year scheduled for repair.

all_known_leaks_scheduled_for_repair_main

number

The number of known leaks on main at the end of the report year scheduled for repair.

average_service_length_feet

number

The average system service length in feet.

excavation_tickets

integer

Number of Excavation Tickets received by the operator during the year (i.e., receipt of information by the operator from the notification center).

federal_land_leaks_repaired_or_scheduled

integer

Total number of leaks repaired, eliminated, or scheduled for repair on federal land during the reporting year.

hazardous_leaks_mechanical_joint_failure

number

The total number of hazardous leaks caused by a mechanical joint failure.

operating_state

string

State that the distribution utility is reporting for. Prior to 2004, this may be a list of states.

operator_id_phmsa

integer

PHMSA unique operator ID. A value of zero represents an unknown operator ID.

report_date

date

Date reported.

report_id

integer

Report number of the PHMSA Gas utility submission.

services_efv_in_system

integer

Estimated number of services with Excess Flow Valve in the system at end of reported year related to natural gas distribution.

services_efv_installed

integer

Total number of services with Excess Flow Valve installed during reported year related to natural gas distribution.

services_shutoff_valve_in_system

integer

Estimated number of services with manual service line shut-off valves installed in the system at end of report year related to natural gas distribution.

services_shutoff_valve_installed

integer

Total number of manual service line shut-off valves installed during reported year related to natural gas distribution.

unaccounted_for_gas_fraction

number

Unaccounted for gas as a fraction of total consumption for the 12 months ending June 30 of the reporting year. Calculated as follows: take (purchased gas + produced gas) minus (customer use + company use + appropriate adjustments), then divide by (customer use + company use + appropriate adjustments). Prior to 2017, this field was calculated with a different denominator (purchased gas + produced gas). Using this different calculation method for the time period between 2010-2017 ensured that there were no records with a negative fraction. For all the other reporting years there are known and expected negative values in this column.
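
The calculation described for unaccounted_for_gas_fraction can be written out as a short sketch. The function and argument names are hypothetical and the input volumes are made up. The legacy_denominator flag is an assumption used here to switch to the (purchased gas + produced gas) denominator described for earlier reporting years. It is not a PUDL or PHMSA parameter.

```python
def unaccounted_for_gas_fraction(
    purchased: float,
    produced: float,
    customer_use: float,
    company_use: float,
    adjustments: float,
    legacy_denominator: bool = False,
) -> float:
    """Unaccounted for gas divided by either accounted-for gas or total supply."""
    supply = purchased + produced
    accounted = customer_use + company_use + adjustments
    unaccounted = supply - accounted
    # Earlier reporting years divided by supply rather than accounted-for gas.
    denominator = supply if legacy_denominator else accounted
    return unaccounted / denominator


# Made-up volumes: supply of 1000 and accounted-for gas of 980.
current = unaccounted_for_gas_fraction(900, 100, 950, 20, 10)  # 20 / 980
legacy = unaccounted_for_gas_fraction(
    900, 100, 950, 20, 10, legacy_denominator=True
)  # 20 / 1000

# When accounted-for gas exceeds supply, the fraction is negative.
negative = unaccounted_for_gas_fraction(900, 100, 1000, 20, 10)  # -30 / 1030

print(round(current, 4), round(legacy, 4), round(negative, 4))
```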