Welcome to the tell Quickstarter!

``tell`` is an open-source Python package for projecting future electricty demand in the United States.

A little about tell

The Total ELectricity Load (TELL) model projects the short- and long-term evoluation of hourly electricity demand (load) in response to future changes in weather and climate. The purpose of tell is to generate end-of-century hourly profiles of electricity demand across the entire Conterminous United States (CONUS) at a spatial resolution adequate for input to a unit commitment/economic dispatch (UC/ED) model while also maintaining consistency with the long-term growth and evolution of annual state-level electricity demand projected by an economically driven human-Earth system model. tell takes as input future projections of the hourly time-series of meteorology and decadal populations and uses the temporal variations in weather to project hourly profiles of total electricity demand. The core predictions in tell are based on a series of multilayer perceptron (MLP) models for individual Balancing Authorities (BAs). Those MLP models are trained on historical observations of weather and electricity demand. Hourly projections from tell are scaled to match the annual state-level total electricity loads projected by the U.S. version of the Global Change Analysis Model (GCAM-USA). GCAM-USA captures the long-term co-evolution of the human-Earth system. Using this unique approach allows tell to reflect both changes in the shape of the load profile due to variations in weather and the long-term evolution of energy demand due to changes in population, technology, and economics. tell is unique from other load forecasting models in that it features an explicit spatial component that allows us to relate projected loads to where they would occur spatially within a grid operations model. The output of tell is a series of hourly projections for future electricity demand at the county, state, and BA scale that are quantitatively and conceptually consistent with one another. More information about how the model works and how it can be applied are available on the Read the Docs site for tell.

Lets get started!

In this quickstarter we will walk through a series of steps for exploring tell, starting with importing the package and ending with visualizing the output. This quickstarter is based on a subset of example forcing data for tell. This allows the user to walk through the entire tell package in a matter of minutes. For the visualizations throughout this notebook, the user can choose whether or not to save these plots by setting the save_images and image_resolution flags in each function.

1. Install tell

tell is available via GitHub repository by using the pip install functionality. tell requires a Python version between 3.8 and 4.0 as well as a pip install to import the package. tell has been tested on Windows and Mac platforms. (Note: For those installing on Windows, tell is supported by GeoPandas functionality. Please see suggestions for installing GeoPandas on Windows here: https://geopandas.org/en/stable/getting_started/install.html)

# Start by importing the TELL package and information about your operating system:
import os
import tell

2. Install the package of data underpinning tell

tell is based on open, publicly accessible data. For convienence, we’ve packaged all of the data underpinning the tell quickstarter notebook into a Zenodo data package. In order to run this notebook, first set the local directory where you would like to store the package data and the run the install_quickstarter_data function below. Note that the quickstarter data package will require ~650 MB of storage and can take several minutes to download. You will also need a dataset with sample forcing data for tell, also available in a Zenodo data package. The sample forcing data package will require ~250 MB of storage.

# Identify the current working directory, the subdirectory where the data will be stored, and the image output subdirectory:
current_dir =  os.path.join(os.path.dirname(os.getcwd()))
tell_data_dir = os.path.join(current_dir, r'tell_data')
tell_image_dir = os.path.join(tell_data_dir, r'visualizations')

# If the "tell_data_dir" subdirectory doesn't exist then create it:
if not os.path.exists(tell_data_dir):
   os.makedirs(tell_data_dir)

# If the "tell_image_dir" subdirectory doesn't exist then create it:
if not os.path.exists(tell_image_dir):
   os.makedirs(tell_image_dir)
# Download the TELL quickstarter data package from Zenodo:
tell.install_quickstarter_data(data_dir = tell_data_dir)
# Download the TELL sample forcing data package from Zenodo:
tell.install_sample_forcing_data(data_dir = tell_data_dir)

3. MLP model training and projection

This section of the notebook takes the data processed in the tell_data_preprocessing.ipynb notebook and trains a multilayer perceptron (MLP) model for each of the 54 BAs in tell. The MLP models use temporal variations in weather to project hourly demand. More information about this approach is in the MLP section of the tell User Guide. We include pre-trained models within the tell repository. If you want to explore the model training aspect you can use the code in Section 3.1 to retrain the MLP models for a single BA or a batch of BAs. Note that since the save_model parameter is set to false by default running these training steps will not overwrite the models included in tell. If you want to skip this step you can move to Section 3.2 to see how tell projects future loads by BA using weather projections.

3.1. MLP training

The first step is to train the MLP models using the historical weather and load datasets. The default settings for the MLP model training steps are included in the mlp_settings.yml file included in the data folder of the tell repository. By default the MLP models are trained on data from 2016-2018 and evaluated using data from 2019. The time windows for training and evaluating the models can be modified by altering the start_time, end_time, and split_datetime parameters when calling the tell.train function. The first code block shows how to train the MLP models for a single BA. We also include a function to do some basic analysis of the trained model’s performance. More extensive evaluation of the tell predictive models is included in the tell_mlp_calibration_evaluation.ipynb notebook.

# For more information about the training of predictive models you can call the help function:
help(tell.train)
# Run the MLP training step for a single BA (i.e., "region"):
prediction_df, validation_df = tell.train(region = 'PJM',
                                          data_dir = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'compiled_historical_data'))

# View the head of the prediction dataframe that contains the time-series of projected load in the evaluation year:
display(prediction_df.head(10))

# View validation dataframe that contains error statistics for the trained model:
validation_df

You can also train multiple BAs at the same time using parallel processing. The example code block below retrains the models for all BAs in tell.

# Generate a list of BA abbreviations to process:
ba_abbrev_list = tell.get_balancing_authority_to_model_dict().keys()

# Run the MLP training step for the list of BAs using parallel processing streams:
prediction_df, validation_df = tell.train_batch(target_region_list = ba_abbrev_list,
                                                data_dir = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'compiled_historical_data'),
                                                n_jobs = -1)

# View the validation dataframe that contains error statistics for the trained models:
validation_df
# Plot the statistical performance (e.g., RMS_ABS, RMS_NORM, MAPE, or R2) of the predictive models across all the BAs in TELL:
tell.plot_mlp_summary_statistics(validation_df,
                                 image_output_dir = tell_image_dir,
                                 image_resolution = 150,
                                 save_images = True)

3.2. MLP model projection

Next we use the trained MLP models to project future loads in each BA using the sample forcing data downloaded in Section 2. The outcomes of this projection step are then used in the forward execution of tell in Section 4. The sample forcing data includes four years of future meteorology for each BA: 2039, 2059, 2079, and 2099. Those are the only valid options for the year variable when calling the prediciton functions.

# Run the MLP prediction step for a single BA (i.e., "region"):
pdf = tell.predict(region = 'ERCO',
                   year = 2039,
                   data_dir = os.path.join(tell_data_dir, r'sample_forcing_data', r'future_weather', r'rcp85hotter_ssp5'),
                   datetime_field_name = 'Time_UTC',
                   save_prediction = True,
                   prediction_output_directory = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'mlp_output', r'rcp85hotter_ssp5'))

# View the prediction dataframe:
pdf
# Generate a list of BA abbreviations to process:
ba_abbrev_list = tell.get_balancing_authority_to_model_dict().keys()

# Run the MLP prediction step for the list of BAs using parallel processing streams:
pdf = tell.predict_batch(target_region_list = ba_abbrev_list,
                         year = 2039,
                         data_dir = os.path.join(tell_data_dir, r'sample_forcing_data', r'future_weather', r'rcp85hotter_ssp5'),
                         datetime_field_name = 'Time_UTC',
                         save_prediction = True,
                         prediction_output_directory = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'mlp_output', r'rcp85hotter_ssp5'),
                         n_jobs = -1)

# View the prediction dataframe:
pdf

4. Model forward execution

This section of the tell workflow takes the .csv files produced by the tell MLP models and distributes the projected load to the counties that each BA operates in. The county-level hourly loads are then summed to the state-level and scaled to match the state-level annual loads produced by GCAM-USA. Four sets of output files are generated: county-level hourly loads, state-level hourly loads, hourly loads for each BA, and a summary file that includes state-level annual loads from TELL and GCAM-USA as well as the scaling factors. Note that since it takes a while to write out the county-level output data this output is optional. To output county-level load projections just set the save_county_data flag to true.

# Run the TELL model forward in time for a given year:
summary_df, ba_time_series_df, state_time_series_df = tell.execute_forward(year_to_process = '2039',
                                                                       gcam_target_year = '2039',
                                                                       scenario_to_process = 'rcp85hotter_ssp5',
                                                                       data_output_dir = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'tell_output'),
                                                                       gcam_usa_input_dir = os.path.join(tell_data_dir, r'sample_forcing_data', r'sample_gcam_usa_data'),
                                                                       map_input_dir = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'ba_service_territory'),
                                                                       mlp_input_dir = os.path.join(tell_data_dir, r'tell_quickstarter_data', r'outputs', r'mlp_output'),
                                                                       pop_input_dir = os.path.join(tell_data_dir, r'sample_forcing_data', r'sample_population_projections'),
                                                                       save_county_data = False)

5. Model visualization

The final section of this quickstarter notebook plots some of the output of tell to give the user a flavor of what the model is doing. Note that the sample output data in the tell quickstarter covers the years 2039, 2059, 2079, and 2099 so those are the only valid values for the year_to_plot variable in each function call.

5.1. Plot the state annual total loads from GCAM-USA and tell

The first visualization plots the annual total loads from both GCAM-USA and tell. The data plotted here are in units of TWh and the tell values are the unscaled projections. The scaled projections tell are by definition equal to those from GCAM-USA.

# Plot the annual total loads from both GCAM-USA and TELL:
tell.plot_state_annual_total_loads(year_to_plot = '2039',
                                   gcam_target_year = '2039',
                                   scenario_to_plot = 'rcp85hotter_ssp5',
                                   data_input_dir = tell_data_dir,
                                   image_output_dir = tell_image_dir,
                                   image_resolution = 150,
                                   save_images = True)

5.2. Plot the time-series of total hourly loads for a given state

Here we plot time-series of the raw (unscaled) and scaled total loads from tell at the state level. The user specifies which state they want to plot using the `state_to_plot” variable in the function call.

# Plot the time-series of raw and scaled loads from TELL at the state level for a user-specified state:
tell.plot_state_load_time_series(state_to_plot = 'Connecticut',
                                 year_to_plot = '2039',
                                 gcam_target_year = '2039',
                                 scenario_to_plot = 'rcp85hotter_ssp5',
                                 data_input_dir = tell_data_dir,
                                 image_output_dir = tell_image_dir,
                                 image_resolution = 150,
                                 save_images = True)

5.3. Plot the load duration curve for a given state

Our last plot at the state level is the load duration curve which shows the frequency at which a given load occurs in a state. The user specifies which state they want to plot using the “state_to_plot” variable in the function call.

# Plot the load duration curve at the state level for a user-specified state:
tell.plot_state_load_duration_curve(state_to_plot = 'North Carolina',
                                    year_to_plot = '2039',
                                    gcam_target_year = '2039',
                                    scenario_to_plot = 'rcp85hotter_ssp5',
                                    data_input_dir = tell_data_dir,
                                    image_output_dir = tell_image_dir,
                                    image_resolution = 150,
                                    save_images = True)

5.4. Plot the time-series of total hourly loads for a given BA

Our final visualization plots the time-series of the raw (unscaled) and scaled total loads from tell at the BA level. The user specifies which BA they want to plot using the “ba_to_plot” variable in the function call.

# Plot the time-series of raw and scaled loads from TELL at the BA level for a user-specified BA (e.g., PJM, CISO, ERCO, etc.):
tell.plot_ba_load_time_series(ba_to_plot = 'NYIS',
                              year_to_plot = '2039',
                              gcam_target_year = '2039',
                              scenario_to_plot = 'rcp85hotter_ssp5',
                              data_input_dir = tell_data_dir,
                              image_output_dir = tell_image_dir,
                              image_resolution = 150,
                              save_images = True)