Categories
updates

FERC’s Electric Quarterly Reports (EQR) in PUDL!

With support from GridLab we were able to integrate an initial draft of the FERC Electric Quarterly Reports (EQR) into PUDL! EQR has been at the top of our data wishlist for almost half a decade and we’re excited to publish this first draft for beta testing.

Image

What’s in EQR?

We’ve heard from PUDL users that efforts to design cost-effective decarbonization pathways are hindered by limited visibility into the financial flows and market structures that shape the electricity sector. EQR offers an enormous amount of data describing transaction-level data on prices, quantities, and contractual terms between power producers, utilities and financial intermediaries. Yet the dataset remains largely underutilized due to accessibility and quality barriers. 

In its raw form, it is published by FERC as hundreds of gigabytes of nested zip files with little documentation, so querying and analyzing EQR data requires both domain knowledge and some data engineering skills. We hope that our minimally transformed version of EQR in PUDL will reduce barriers to access and get the data into the hands of the people that need it.

The EQR data records U.S. electricity wholesale market activity in great detail–sellers report transactions and their contractual terms on timescales ranging from 5-minutes in real-time spot markets, up to multi-year power purchase agreements (PPAs). With thousands of market participants and dozens of electricity products, the EQR contains billions of records.

EQR data is organized into three main tables: identifying sellers, describing contracts, and enumerating transactions. The seller identity table provides a unique FERC company ID that is used across multiple FERC Forms and associates that ID with the company name. The contracts table records 30 attributes of contracts that were active during the quarter, and whether or not any transactions took place under them. The transactions table contains 26 fields including the unit price, quantity and electricity product being sold. 

Data Access

Tutorial Video and Python Notebook

Here’s a tutorial video showing how to access data and what you can expect to see in the data. In the video, we walk through this Python based interactive notebook which demonstrates how to access and work with the EQR data, including a toy analysis of the contracts and transactions tables. 

Documentation

Review the FERC EQR data source documentation page and the data dictionary page for the 4 EQR tables: 

PUDL Data Viewer

You can browse the data through our PUDL Data Viewer UI and download small subsets of the data.

Parquet Files from Cloud Storage

We publish Apache Parquet files of all EQR data released between 2013 and present to publicly accessible cloud storage. The files are partitioned by quarter. These will be automatically updated quarterly by the 15th of Feburary, May, August and November. You can find the data in S3 under this path:

s3://pudl.catalyst.coop/ferceqr

Each table has its own subdirectory, containing quarterly parquet files:

  • s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/
  • s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_identity/
  • s3://pudl.catalyst.coop/ferceqr/core_ferceqr__quarterly_index_pub/
  • s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/

See here for more information on accessing PUDL data from cloud storage.

Feedback

There’s a bunch of additional data cleaning work that we’d like to take on if we can secure additional funding for the project. This includes tasks like cleaning and standardization of categorical data, finding and fixing invalid dates, and dealing with duplicate values.

We’d love to get some feedback from real users, hear about your use case, and figure out what issues you may have run into. You can reach out to us at [email protected], or if you run into any problems, please feel free to create an issue on GitHub. You can find the issues we’re already tracking here:

Categories
updates

2025 PUDL Annual Report

Read the 2025 PUDL Annual Report to learn about PUDL developments as well as PUDL Sustainer plans for 2026.

2025 was the first year of the PUDL Sustainers Program. We designed the program to ensure the ongoing impact and reliability of PUDL:  data and software are kept up to date, bugs are fixed, and everything gets tested and released on a regular schedule. At the same time,  we sought to foster closer relationships with the organizations that rely heavily on PUDL data for their work, giving them an opportunity to see under the hood and weigh in on important decisions.

Thanks to support from our inaugural sustainers, RMI and GridLab, we were able to launch the program and keep PUDL  up and running!

24,000 unique IPs 
Accessing PUDL data from cloud storage buckets
577 users 
Registered for the PUDL Data Viewer
70+ schools 
Accessing PUDL data
20 archives
Of new data added to Zenodo
14 citations 
In academic journals, pre-prints, and theses (including Applied Energy, The Electricity Journal, and Environmental Research: Energy)
14 new tables 
Including imputed FERC 714 and EIA 930 data, SEC 10-K outputs tracking parent-subsidiary company relationships, and AEO projected energy use data
10 conferences
Attendance and presentation at USAEE, INFORMS, EPRI, TWEEDS and others!
8 terabytes
Of PUDL data distributed through S3 and Zenodo
4 interviews 
In the MIT Technology Review, the New York Times, Salon and Les Echos 
Categories
updates

Open Energy Data For All: Summer Data Lab 2026

Catalyst Cooperative has spent the last year or so developing educational materials for early career energy researchers, with generous support from the Sloan Foundation.

We’ve created a set of interactive Python tutorials about core data and software engineering skills which are useful in energy research. 

This summer we are organizing a 2-day energy data lab from August 11-12 at Georgia Tech. It will be a mix of tutorials, time to apply new skills to your own work, and networking opportunities. We hope this will provide you with some new skills and help you find people to collaborate with in your research career.

We are intentionally holding it immediately before the Macro-Energy Systems (MES) Workshop from August 13-14 so that you can go to both!

The lab is primarily for people who:

  • are graduate students or postdocs looking to use energy data in their research
  • feel nervous about coding, or are running into technical roadblocks
  • feel underrepresented in energy research

We have space for up to 20 attendees.

Breakfast, lunch, and dinner will be provided to all participants. Additionally, Catalyst Cooperative has funding to cover the food, travel, and lodging expenses for up to 15 attendees as a travel grant. This travel grant includes 4 nights of hotel (the lab, plus attendance at the MES workshop) plus domestic transportation (reimbursed via Catalyst Cooperative). We will accept self-funded participants as space allows. 

The tutorial section of the lab will pull from the following pool of topics, which we have been piloting over video during development:

  • working with non-CSV data types: JSON, XML, and Parquet
  • accessing remote data and APIs: the requests library
  • web scraping: the BeautifulSoup library
  • visual data exploration: pandas 201
  • making assumptions about data
  • moving from Jupyter notebooks into robust Python packages: the uv package and project manager
  • automated testing and debugging: pytest and pdb

We’ll be accepting applications through 2026-04-03 EDT and will email those accepted by the end of 2026-04-24 EDT. Note that the abstract submission deadline for the MES Workshop is 2026-03-01. You do not need to present at MES to be eligible for the data lab. All lab attendees may also attend MES as audience members.

If you have any questions about the lab or the application, please contact us at [email protected].

Categories
updates

Capturing the elusive FERC EQR

The Federal Energy Regulatory Commission (FERC) collects a lot of data in its role as the regulator of US electricity markets. One dataset that we’ve had our eyes on for years, but never had the resources to integrate into PUDL (until now!) is the Electric Quarterly Report or EQR.

The EQR is the reporting mechanism FERC uses for public utilities to fulfill their responsibility under section 205(c) of the Federal Power Act (FPA) to have their rates and charges on file in a convenient form and place.

Under this section of the FPA:

Where two or more public utilities are parties to the same rate schedule or tariff, each public utility transmitting or selling electric energy subject to the jurisdiction of this Commission shall post and file such rate schedule

In modern US electricity markets, there’s more than a million of these transactions every day. This makes posting them in a “convenient form and place” non-trivial. FERC provides an online EQR report viewer where you can scroll through a list of thousands of sellers and download their filing for a particular quarter as a summary PDF. But this is no way to do analysis.

Image

They also offer bulk downloads of the raw data as quarterly zipfiles containing CSVs or XML documents, but with more than 4 billion transactions recorded since the modern reporting era started in 2013, these formats are also not very ergonomic for bulk analysis. And even as compressed zipfiles, the full FERC EQR dataset is 100 GB. It’s far too big analyze in bulk locally with tools like pandas, let alone spreadsheets. However, with the development of tools like DuckDB and Polars over the last few years, working with this scale of tabular data has become much easier — even on a laptop!

The data locked up in the FERC EQR is potentially valuable for understanding how renewable energy power purchase agreements and battery storage prices have evolved over time, and what’s really driving rising energy costs. The EQR data is particularly valuable for less organized markets like the US Southeast and Intermountain West, where there is no ISO/RTO market data. We’re hopeful that this data can also make it easier to programmatically identify anti-competitive utility affiliate transactions and uneconomic self-dispatch by regulated monopolies.

Democratizing FERC EQR Access

Image

With additional support from Gigawatt-tier PUDL Sustainer GridLab, Catalyst is going to create an open access, cloud-native version of the FERC EQR dataset, designed for bulk analysis.

  • Timeframe: We expect to be working on the project from now through the first quarter of 2026.
  • Data Coverage: We are initially targeting the 2013Q3 to present data, since it is all in a very similar format, while older EQR data was submitted under a different reporting regime, in a different format, and is also not as rich in detail (though we have already archived the raw 2002-2013 data).
  • Outputs: We’re planning to mirror the structure of the current data, with four main tables: transactions, contracts, filer identities, and index publications, along with a new table that records all FERC company IDs and tracks how their reported name and other attributes change over time.
  • Format: The data will be distributed as a collection of Apache Parquet files, probably partitioned by quarter. All of the data from 2013Q3 to the present will use the same well-defined schema, informed by all of the historical EQR data dictionaries published by FERC.
  • Processing: Our first priority is making the bulk data freely accessible in a format that’s appropriate for analysis. We are not planning on doing any major data processing, beyond enforcing a uniform schema, converting the data to Parquet, and dealing with character encoding and CSV formatting issues that affect ~1% of all records.
  • Versioning: Due to the size of the data and the limited free cloud storage we have as part of the AWS Open Data Registry we won’t be able to provide multiple historical versions of the outputs like we do with other PUDL data. Whenever we update the outputs, the older version will be replaced.
  • Updates: We plan to update the outputs at least once a quarter, and will capture fresh snapshots of the raw EQR data using our automated data archiving scripts.
  • Access: We plan to publish the FERC EQR alongside the rest of our PUDL data in a freely accessible S3 bucket as part of the AWS Open Data Registry. We also intend to make the EQR outputs available for preview and querying through the PUDL Data Viewer, which will allow users to download smaller subsets of the data as CSVs for spreadsheet-based analysis if that’s what they need.

Get Involved!

  • If you’re already familiar with the FERC EQR and want to help test out the new data we’d love to hear from you: [email protected]
  • We’d like to better understand how this data is used in any context, whether it’s research, policy, journalistic, or commercial, so if you’ve got a use case in mind for historical electricity transaction data, let us know what it is. Maybe sign up for office hours so we can chat more, or help you get familiar with Parquet files.
  • What other data does the FERC EQR need be connected to? One of the things we’re hoping this analysis-ready, cloud-native version of EQR will enable is robust record linkage with other datasets. Can we identify the LMP nodes where power is being delivered? Can we identify electricity buyers in other FERC or EIA data, even though all we have is their name? What else should we be thinking about?
  • Stay up to date with the project: Subscribe to our ~monthly newsletter, keep track of the FERC EQR issues in the main PUDL repo, or follow the EQR scoping repo where we’ll be prototyping the new system.

After 5 years of ogling the EQR from afar, and projecting our (data) hopes and dreams into its depths, we’re excited to finally tackle it for real, and hope it’ll be a valuable resource for understanding US energy markets and accelerating the ever more economical decarbonization of the US electricity system.

Categories
updates

Keeping PUDL on the Pulse: Insights from our Annual Ecosystem Survey

Last July we conducted more than 60 interviews with energy data users to kick-off our work for the  NSF POSE grant! While that process was very informative, it was also a huge amount of work, and we also want to have more sustainable ways to understand the needs of energy data users (PUDL or otherwise).

So this winter, we put out our first annual Energy Data Ecosystem Survey—a short, mostly multiple-choice questionnaire built on insights from our conversations. 47 of you responded, and we learned a lot!

Before we dig into the results, we’ve made a few other changes in the wake of talking to you all:

One of our core takeaways from the interviews was how organizations struggle with technical skill building and capacity. This is something Catalyst can address with our consulting services and educational content creation. Thanks to funding from the Sloan Foundation, we’ve developed OpenEnergyData4All, a training series to help future and current energy data users gain more confidence in their technical skills. We’re excited to continue creating content like this to help people engage more readily with the data!

Another important theme was users’ desire for data that is easy to explore. A large portion of technical and non technical users relied heavily on our PUDL browser viewing tool, Datasette. While Datasette was simple for us to set up and enabled users to explore PUDL data on the web and download CSV’s, it isn’t really designed to handle the scale of data that we’re publishing now. We developed a Beta PUDL Viewer for technical and non technical users to interact with and search through the data with greater ease. We’re continuing to improve the tool, so be sure to leave any feedback or ideas here.

Now, the Ecosystem Survey. The TL;DR:

  • It’s difficult to find the right data to answer your questions, and PUDL is no exception. We need to make it easier for folks to find what they’re looking for and parse through our extensive data catalog. We’re excited about the new PUDL data viewer, but there’s more we can do to make PUDL more useful and less overwhelming.
  • We did a good job picking the most important datasets to pull into PUDL! Generally speaking the data that are used most often are the ones we’ve integrated (pats self on the back), though there was undoubtedly some bias in who took the survey.
  • Energy data users have more Python and open source contribution experience than we thought! This combined with the fact that most energy data users have at some point written down quirks about the data tells us that we need to make contributing to PUDL easier and more desirable.

Now here’s a deeper dive on the state of energy data according to you:

Who are you?

You’re mostly academics and educators, but you represent a wide range of organization types including NGOs, for-profits, and hobbyists. A smaller subset of you represent folks working directly in policy, utility regulation, or for utilities themselves. Notably, none of you are journalists.

The primary outputs of your work are peer reviewed publications, data visualizations, and reports.

Image

What are you using energy data for?

You primarily use this data for utility or facility level financial and economic modeling, but you’re also quite involved in developing new policy proposals, energy system models, regulatory interventions, and spreading awareness about the climate and its impact on the grid.

What data are you using the most?

The most sought after datasets tend to be those already integrated into PUDL: EIA 860, EIA 923, EIA 861, NREL ATB, and FERC Form 1. Many of you also rely on data that is not yet (fully) integrated into PUDL: Weather data, EIA AEO, NREL EFS, NREL ResStock, ISO/RTO data, Census, and EPA eGrid data.

Image

How do you choose and obtain data?

72% of you access data by downloading it directly from federal and state agencies.  Unsurprisingly (it’s our survey after all) 60% of you are PUDL users too.  A smaller, but not insubstantial portion (33%) of you endeavor to get data directly from utilities and/or the various RTOs/ISOs.

When choosing a dataset, 72% of you also indicated that clear data processing methodology is important. Other considerations include data sharability, format, and update cadence. These factors were cited more frequently than things like data resolution or completeness. 

How do you work with data? 

The majority of you rely heavily on Python (81%) to process your data. Excel and R are tied, followed by SQL and others. 

CSV, Parquet, and JSON are your most desired data formats, in that order. PDF and XML/XBRL shared a spot at the bottom with 0 votes. 

Most of you have written down quirks about the data you are working with or made visualizations to accompany your work with PUDL data. As noted above, a decent chunk of you have also had to contend with data not yet available through PUDL, thus extracting and transforming it on your own. Very few folks have tried to incorporate this work into PUDL. However, nearly half of you have experience creating GitHub issues or even writing pull requests to change source code.

What are your pain points when using energy data?

Nearly half of you struggle to find the right data to answer your questions, extract data from difficult formats (like PDFs), and connect data from disparate datasets with no common identifiers.

With regard to PUDL specifically, you aptly noted that we publish multiple versions of the same data without enough documentation to let users identify which version is right for their use case. For the amount of information in PUDL, you also noted that our docs are confusing and sometimes insufficient. 

What are we going to do about it?

After reading everyone’s responses and processing the information we collected from the 60 informational interviews, we came up with a list of projects that would improve both PUDL and people’s experience working with energy data in general. The core takeaway is a need for more and better documentation. We’re going to make a concerted effort to put ourselves in your shoes and answer some of the longstanding questions about what’s in PUDL and how to navigate it. 

In a similar vein, we’re going to try and improve the process of contributing to PUDL, whether through code or useful information about the data. Perhaps an energy data wiki? 

Lastly, for those of you that offered, we might reach out for support along the way! We want PUDL to be useful for you, and the best way to ensure that is to involve you in the process. 

Stay tuned on our progress by following along on GitHub, and, as always, reach out if there is something you’d like to see us do.

Live, laugh, clean data,

Austen & the Catalyst Team

Categories
updates

VCE RARE Renewable Generation Profiles in PUDL

Image

Catalyst is helping GridLab, Pattern Energy Group, and Vibrant Clean Energy distribute a new open (CC-BY-4.0 licensed) dataset produced by Vibrant Clean Energy which provides hourly, county-level wind and solar generation profiles based on NOAA’s High Resolution Rapid Refresh (HRRR) weather model. The data release was announced at the ESIG’s Fall Technical Workshop on October 21st, 2024.

The new data is included in PUDL v2024.10.0 in Apache Parquet format. You can explore the dataset in this Jupyter notebook on Kaggle. The original 8760 hourly data as CSVs can be downloaded from Zenodo as well.

To learn more about why this kind of data is vital to the energy transition, check out ESIG’s report on Weather Data for Power System Planning.

Image

VCE RARE Press Release

Categories
updates

Announcing: Open Energy Data for All!

A free training series for energy graduate students

In the course of our work at Catalyst, we’ve been lucky enough to work with energy researchers at many institutions. So often, we see people run into the same set of problems – handling data that’s too big for Excel, reproducibly connecting datasets without shared IDs, and writing code that can be easily re-run and updated when a paper is reviewed or when a new year of data comes in.

Now, thanks to generous funding from the Alfred P. Sloan Foundation Energy & Environment Program, we’re pleased to announce “Open Energy Data for All,” an initiative to support energy graduate students. We’ll offer training in foundational data and software skills that enable faster, more open, and more reproducible energy data analysis.

Here’s what we have planned:

  • We’re hosting a monthly online seminar series addressing key challenges in energy data analysis
  • We’re working to develop a hands-on energy data curriculum with support from The Carpentries
  • We’ll give open energy data tutorials at several conferences around the country
  • Finally, we’ll host a two-day, in-person data lab, bringing together graduate students from across the U.S. to collectively tackle real-world energy data problems.
Kicking off our monthly seminar series

Whether wrestling with APIs or just trying to find the right dataset for your research, it can be hard to know where to start. That’s why, for our first monthly skill-building webinar, we’re starting with an introduction to the US open energy data landscape. Join us!

Intro to the US Open Energy Data Landscape
Oct. 30th, 4:30 – 5:30 PM Eastern (20:30 – 21:30 UTC)

We’ll address questions like:

  • What free and public energy data exists for the U.S., and how can I access it?
  • How have people used that open data in the energy transition?
  • What common challenges are there with using this data?
  • How can I evaluate open datasets I find online?
  • How may data availability affect my research topics?

We’ll be running more webinars monthly – they’re designed to be drop-in, and will be recorded, so don’t worry too much about missing one or another.

Want to learn more?

If you’re interested in hearing more about these projects, sign up for our workshop newsletter and subscribe to our calendar of events! If you’re a faculty member interested in learning more, or in hosting us for a talk or workshop, we’d love to chat – drop us a line at [email protected].

Categories
updates

Insights From 60+ Energy Data User Interviews

To kick off our NSF POSE grant work, over 4 weeks in July and August we interviewed more than 60 energy data users as part of NSF’s Innovation Corps program (I-Corps). I-Corps helps POSE awardees better understand their users and contributors, and the potential for fostering a sustainable open source ecosystem.

Some of our interviewees were already PUDL users, and many of them weren’t. A fair number of the PUDL users were at organizations we’d never encountered before! We talked to academic researchers and advocates working at non-profits, but also people at for-profit companies, and folks working in the public sector. We even had the chance to talk to some utilities. Interviewee technical and energy domain backgrounds were diverse: from spreadsheet-only NGOs to startups working with cloud-based data pipelines and orchestration frameworks, and everything in between. There were software engineers and lawyers that argue at FERC, grass roots advocates and regional electricity planning organizations too.

It was an intense month for our sometimes introverted team, but overall it was a good experience and we learned a lot. So we thought we’d share some of our high-level takeaways, and see if they resonate the broader energy data community.

Categories
updates

Inside the 2024 Catalyst Member Retreat

Image
Feat. our ongoing quest to find a logo for PUDL.

Catalyst is a fully-remote organization with members living all over North America. Our floating cyborg heads are besties, but we’re privy to the power of an analog hang. As such, we host an annual member retreat. This year we chose to meet in Mexico City because it seemed like fun and because it overlapped well with this year’s CSV conf (see Zane’s talk here!)

We booked an Airbnb with a few buffer days so folks could explore the city before getting down to business hashing out Catalyst’s future. We perspired, ate popsicles, and pinched ourselves a few times to make sure we were actually there. Even though we had a lot of work planned for the next few days, we agreed that the retreat would be a success even if all we did was get together.

Categories
updates

Workplace Democracy and Open Source

This year at csv,conf,v8 in Puebla, Mexico I gave a talk on our experience as a democratic worker cooperative creating digital public goods, and why we think co-ops are potentially a good fit for creating public-interest technology. You can watch the recorded talk on YouTube, or read on for a bloggified version of the talk below.