tell.data_process_compile module

tell.data_process_compile.compile_data(start_year: int, end_year: int, data_input_dir: str)[source]

Merge the load, population, and climate data into a single .csv file for each BA

Parameters:
  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

  • data_input_dir (str) – Top-level data directory for TELL
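
For example, a minimal call over the 2015-2019 historical period (the data directory path is hypothetical):

    from tell.data_process_compile import compile_data

    # Merge the load, population, and climate data into one .csv file
    # per BA for the 2015-2019 historical period.
    compile_data(start_year=2015,
                 end_year=2019,
                 data_input_dir="/path/to/tell_data")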

tell.data_process_eia_930 module

tell.data_process_eia_930.eia_data_subset(file_string: str, data_input_dir: str)[source]

Extract only the columns TELL needs from the EIA-930 Excel files

Parameters:
  • file_string (str) – File name of EIA-930 hourly load data by BA

  • data_input_dir (str) – Top-level data directory for TELL

tell.data_process_eia_930.list_EIA_930_files(data_input_dir: str) list[source]

Make a list of all the file names for the EIA-930 hourly load dataset

Parameters:

data_input_dir (str) – Top-level data directory for TELL

Returns:

List of EIA-930 file names

tell.data_process_eia_930.process_eia_930_data(data_input_dir: str, n_jobs: int)[source]

Read in the list of EIA-930 files, subset the data, and save the output as a .csv file

Parameters:
  • data_input_dir (str) – Top-level data directory for TELL

  • n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=“multiprocessing” or the size of the thread-pool when backend=“threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
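
For example (the data directory path is hypothetical):

    from tell.data_process_eia_930 import process_eia_930_data

    # Subset every EIA-930 Excel file and write the results as .csv files,
    # running on all but one CPU (n_jobs=-2).
    process_eia_930_data(data_input_dir="/path/to/tell_data", n_jobs=-2)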

tell.data_process_population module

tell.data_process_population.ba_pop_sum(map_input_dir: str, pop_input_dir: str, start_year: int, end_year: int) DataFrame[source]

Sum the total population within a BA’s service territory in a given year

Parameters:
  • map_input_dir (str) – Directory where the BA-to-county mapping is stored

  • pop_input_dir (str) – Directory where raw county population data is stored

  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

Returns:

DataFrame of the total population within each BA’s service territory for each year

tell.data_process_population.extract_future_ba_population(year: int, ba_code: str, scenario: str, data_input_dir: str) DataFrame[source]

Calculate the total population living within a BA’s service territory in a given year under a given SSP scenario.

Parameters:
  • year (int) – Year to process; four digit year (e.g., 1990)

  • ba_code (str) – Code for the BA you want to process (e.g., ‘PJM’ or ‘CISO’)

  • scenario (str) – Code for the SSP scenario you want to process (either ‘ssp3’ or ‘ssp5’)

  • data_input_dir (str) – Top-level data directory for TELL

Returns:

Hourly total population living within the BA’s service territory

tell.data_process_population.fips_pop_yearly(pop_input_dir: str, start_year: int, end_year: int) DataFrame[source]

Read in the raw population data, format columns, and return a single dataframe for all years

Parameters:
  • pop_input_dir (str) – Directory where raw county population data is stored

  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

Returns:

DataFrame of formatted county-level population data for all years

tell.data_process_population.merge_mapping_data(map_input_dir: str, pop_input_dir: str, start_year: int, end_year: int) DataFrame[source]

Merge the BA mapping files and historical population data based on FIPS codes

Parameters:
  • map_input_dir (str) – Directory where the BA-to-county mapping is stored

  • pop_input_dir (str) – Directory where raw county population data is stored

  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

Returns:

DataFrame of BA-to-county mappings merged with historical population data

tell.data_process_population.process_ba_population_data(start_year: int, end_year: int, data_input_dir: str)[source]

Calculate a time-series of the total population living within a BA’s service territory

Parameters:
  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

  • data_input_dir (str) – Top-level data directory for TELL
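
For example (the data directory path is hypothetical):

    from tell.data_process_population import process_ba_population_data

    # Build an annual time-series of total population within each BA's
    # service territory for 2015-2019.
    process_ba_population_data(start_year=2015,
                               end_year=2019,
                               data_input_dir="/path/to/tell_data")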

tell.data_spatial_mapping module

tell.data_spatial_mapping.map_ba_service_territory(start_year: int, end_year: int, data_input_dir: str)[source]

Workflow function to run the “process_spatial_mapping” function to map BAs to counties

Parameters:
  • start_year (int) – Year to start process; four digit year (e.g., 1990)

  • end_year (int) – Year to end process; four digit year (e.g., 2019)

  • data_input_dir (str) – Top-level data directory for TELL

tell.data_spatial_mapping.process_spatial_mapping(target_year: int, fips_file: str, service_area_file: str, sales_ult_file: str, bal_auth_file: str, output_dir: str)[source]

Workflow function to execute the mapping of BAs to counties for a given year

Parameters:
  • target_year (int) – Year to process; four digit year (e.g., 1990)

  • fips_file (str) – County FIPS code .csv file

  • service_area_file (str) – Balancing authority service area Excel file

  • sales_ult_file (str) – Balancing authority sales to ultimate customer Excel file

  • bal_auth_file (str) – Balancing authority and ID codes Excel file

  • output_dir (str) – Directory to store the output .csv file

tell.execute_forward module

tell.execute_forward.aggregate_mlp_output_files(list_of_files: list) DataFrame[source]

Aggregates a list of MLP output files into a dataframe.

Parameters:

list_of_files (list) – List of MLP output files

Returns:

DataFrame of all MLP output concatenated together

tell.execute_forward.execute_forward(year_to_process: str, gcam_target_year: str, scenario_to_process: str, data_output_dir: str, gcam_usa_input_dir: str, map_input_dir: str, mlp_input_dir: str, pop_input_dir: str, save_county_data=False)[source]

Takes the .csv files produced by the TELL MLP model and distributes the predicted load to the counties that each balancing authority (BA) operates in. The county-level hourly loads are then summed to the state-level and scaled to match the state-level annual loads produced by GCAM-USA. Three sets of output files are generated: county-level hourly loads, state-level hourly loads, and hourly loads for each BA. There is one additional summary output file that includes state-level annual loads from TELL and GCAM-USA as well as the scaling factors.

Parameters:
  • year_to_process (str) – Year to process

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_process (str) – Scenario to process

  • data_output_dir (str) – Top-level data directory for TELL output

  • gcam_usa_input_dir (str) – Path to where the GCAM-USA data is stored

  • map_input_dir (str) – Path to where the BA-to-county mapping data are stored

  • mlp_input_dir (str) – Path to where the TELL MLP output data are stored

  • pop_input_dir (str) – Path to where the population projection data are stored

  • save_county_data (bool) – Set to True if you want to save the time-series of load for each county

Returns:

[0] DataFrame of summary statistics [1] DataFrame of BA-level total load time-series [2] DataFrame of state-level total load time-series
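
A sketch of a forward run, unpacking the three returned DataFrames (the scenario name and all paths below are hypothetical):

    from tell.execute_forward import execute_forward

    # Distribute the TELL MLP loads to counties, sum to states, and scale
    # the state totals to the GCAM-USA annual loads.
    summary_df, ba_df, state_df = execute_forward(
        year_to_process="2039",
        gcam_target_year="2040",
        scenario_to_process="rcp85hotter_ssp5",
        data_output_dir="/path/to/tell_output",
        gcam_usa_input_dir="/path/to/gcam_usa_data",
        map_input_dir="/path/to/ba_service_territory_data",
        mlp_input_dir="/path/to/mlp_output",
        pop_input_dir="/path/to/population_projections",
        save_county_data=False)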

tell.execute_forward.extract_gcam_usa_loads(scenario_to_process: str, gcam_usa_input_dir: str) DataFrame[source]

Extracts the state-level annual loads from a GCAM-USA output file.

Parameters:
  • scenario_to_process (str) – Scenario to process

  • gcam_usa_input_dir (str) – Path to where the GCAM-USA data are stored

Returns:

DataFrame of state-level annual total electricity loads

tell.execute_forward.output_tell_ba_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame[source]

Writes a file of the time-series of hourly loads for each BA.

Parameters:
  • joint_mlp_df (DataFrame) – DataFrame of processed TELL loads

  • year_to_process (str) – Year to process

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • data_output_dir (str) – Data output directory

Returns:

DataFrame of BA-level total load time-series

tell.execute_forward.output_tell_county_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str)[source]

Writes a file of the time-series of hourly loads for each county.

Parameters:
  • joint_mlp_df (DataFrame) – DataFrame of processed TELL loads

  • year_to_process (str) – Year to process

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • data_output_dir (str) – Data output directory

tell.execute_forward.output_tell_state_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame[source]

Writes a file of the time-series of hourly loads for each state.

Parameters:
  • joint_mlp_df (DataFrame) – DataFrame of processed TELL loads

  • year_to_process (str) – Year to process

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • data_output_dir (str) – Data output directory

Returns:

DataFrame of state-level total load time-series

tell.execute_forward.output_tell_summary_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame[source]

Writes a summary file describing state-level annual total loads from TELL and GCAM-USA.

Parameters:
  • joint_mlp_df (DataFrame) – DataFrame of processed TELL loads

  • year_to_process (str) – Year to process

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • data_output_dir (str) – Data output directory

Returns:

DataFrame of summary statistics

tell.execute_forward.process_population_scenario(scenario_to_process: str, population_data_input_dir: str) DataFrame[source]

Read in a future population file and interpolate the data to an annual resolution.

Parameters:
  • scenario_to_process (str) – Scenario to process

  • population_data_input_dir (str) – Path to where the sample population projections are stored

Returns:

DataFrame of population projections at an annual resolution

tell.install_forcing_data module

class tell.install_forcing_data.InstallForcingSample(data_dir=None)[source]

Bases: object

Download the TELL sample forcing data package from Zenodo that matches the currently installed tell distribution

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.

DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '1.1.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.2.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.2.1': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.3.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1'}
DEFAULT_VERSION = 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1'
fetch_zenodo()[source]

Download the tell sample forcing data package from Zenodo that matches the current tell distribution

tell.install_forcing_data.install_sample_forcing_data(data_dir=None)[source]

Download the tell sample forcing data package from Zenodo

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.
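
For example (the alternate directory path is hypothetical):

    from tell.install_forcing_data import install_sample_forcing_data

    # Download into the package's data directory...
    install_sample_forcing_data()

    # ...or into a directory of your choosing.
    install_sample_forcing_data(data_dir="/path/to/tell_data")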

tell.install_quickstarter_data module

class tell.install_quickstarter_data.InstallQuickstarterData(data_dir=None)[source]

Bases: object

Download the TELL sample output data package from Zenodo that matches the currently installed tell distribution

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.

DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '1.1.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.2.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.2.1': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.3.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1'}
DEFAULT_VERSION = 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1'
fetch_zenodo()[source]

Download the TELL quickstarter data package from Zenodo that matches the currently installed tell distribution

tell.install_quickstarter_data.install_quickstarter_data(data_dir=None)[source]

Download the TELL quickstarter data package from Zenodo

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.

tell.install_raw_data module

class tell.install_raw_data.InstallRawData(data_dir=None)[source]

Bases: object

Download the TELL raw data package from Zenodo that matches the currently installed tell distribution.

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.

DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.1.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.1.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.2.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.2.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.3.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1'}
DEFAULT_VERSION = 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1'
fetch_zenodo()[source]

Download and unpack the Zenodo raw data package for the current TELL distribution.

tell.install_raw_data.install_tell_raw_data(data_dir=None)[source]

Download and unpack the raw TELL data package from Zenodo that matches the currently installed tell distribution.

Parameters:

data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.

tell.metadata_eia module

tell.metadata_eia.metadata_eia(numbers: int) DataFrame[source]

Define the BA short and long names from a given EIA-930 BA number.

Parameters:

numbers (int) – EIA-930 BA number

Returns:

DataFrame with BA short and long name

tell.mlp_predict module

tell.mlp_predict.predict(region: str, year: int, data_dir: str, datetime_field_name: str = 'Time_UTC', save_prediction: bool = False, prediction_output_directory: str | None = None, **kwargs)[source]

Generate predictions from the MLP model for a target region using an input CSV file.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • year (int) – Target year to use in YYYY format.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • save_prediction (bool) – Choice to write predictions to a .csv file

  • prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.

  • datetime_field_name (str) – Name of the datetime field.

  • data_column_rename_dict (Optional[dict[str]]) – Dictionary mapping field names present in the input CSV file (keys) to the names the code expects (values).

  • expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.

  • hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.

  • month_field_name (Optional[str]) – Field name of the month field in the input CSV file.

  • x_variables (Optional[list[str]]) – Feature variable list.

  • add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator variables to the x variables.

  • y_variables (Optional[list[str]]) – Target variable list.

  • day_list (Optional[list[str]]) – List of day abbreviations and their order.

  • seed_value (Optional[int]) – Seed value to reproduce randomization.

  • verbose (bool) – Choice to see logged outputs.

Returns:

Prediction data frame
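
An illustrative call (the paths are hypothetical; ‘PJM’ is one of the BA codes used in the CSV file names):

    from tell.mlp_predict import predict

    prediction_df = predict(region="PJM",
                            year=2039,
                            data_dir="/path/to/forcing_csv_files",
                            save_prediction=True,
                            prediction_output_directory="/path/to/predictions")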

tell.mlp_predict.predict_batch(target_region_list: list, year: int, data_dir: str, n_jobs: int = -1, datetime_field_name: str = 'Time_UTC', save_prediction: bool = False, prediction_output_directory: str | None = None, **kwargs)[source]

Generate predictions from the MLP models in parallel for all regions in the input list, each from its own input CSV file.

Parameters:
  • target_region_list (list) – List of region / balancing authority names to train and test on. Must match the corresponding strings in the CSV files.

  • year (int) – Target year to use in YYYY format.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=“multiprocessing” or the size of the thread-pool when backend=“threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.

  • datetime_field_name (str) – Name of the datetime field.

  • save_prediction (bool) – Choice to write predictions to a .csv file

  • prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.

  • data_column_rename_dict (Optional[dict[str]]) – Dictionary mapping field names present in the input CSV file (keys) to the names the code expects (values).

  • expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.

  • hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.

  • month_field_name (Optional[str]) – Field name of the month field in the input CSV file.

  • x_variables (Optional[list[str]]) – Feature variable list.

  • add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator variables to the x variables.

  • y_variables (Optional[list[str]]) – Target variable list.

  • day_list (Optional[list[str]]) – List of day abbreviations and their order.

  • seed_value (Optional[int]) – Seed value to reproduce randomization.

  • verbose (bool) – Choice to see logged outputs.

Returns:

Prediction data frame
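
An illustrative parallel call (the path is hypothetical; the BA codes are examples):

    from tell.mlp_predict import predict_batch

    # Predict hourly loads for several BAs in parallel using all CPUs.
    prediction_df = predict_batch(target_region_list=["PJM", "CISO", "ERCO"],
                                  year=2039,
                                  data_dir="/path/to/forcing_csv_files",
                                  n_jobs=-1)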

tell.mlp_prepare_data module

class tell.mlp_prepare_data.DatasetPredict(region: str, year: int, data_dir: str, datetime_field_name: str = 'Time_UTC', **kwargs)[source]

Bases: DefaultSettings

Clean and format input weather data for use in predictive models.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • year (int) – Target year to use in YYYY format.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • datetime_field_name (str) – Name of the datetime field.

DATETIME_FIELD = 'Datetime'
HOLIDAY_FIELD = 'Holidays'
NODATA_VALUE = nan
WEEKDAY_FIELD = 'Weekday'
breakout_day_designation(df: DataFrame) DataFrame[source]

Add a field for weekday, each day of the week, and holidays to the data frame.

The weekday field is set to 1 for weekdays (Mon through Fri) and 0 for weekends (Sat and Sun). Each day of the week is given its own field, set to 1 if the record falls on that day and 0 otherwise. The holiday field is set to 1 for US Federal holidays and 0 otherwise.

Parameters:

df (pd.DataFrame) – Data frame for the target region.

Returns:

[0] Formatted data frame [1] List of extended x_variables
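
A minimal pandas sketch of this encoding, using the field names given by the class constants above and pandas’ US Federal holiday calendar (TELL’s actual implementation may differ):

    import pandas as pd
    from pandas.tseries.holiday import USFederalHolidayCalendar

    # One week spanning a weekend and the July 4th holiday.
    df = pd.DataFrame({"Datetime": pd.date_range("2019-06-28", periods=7, freq="D")})

    # Weekday field: 1 for Mon-Fri, 0 for Sat-Sun.
    df["Weekday"] = (df["Datetime"].dt.dayofweek < 5).astype(int)

    # One 0/1 indicator field per day of the week.
    for i, day in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]):
        df[day] = (df["Datetime"].dt.dayofweek == i).astype(int)

    # Holiday field: 1 for US Federal holidays, 0 otherwise.
    holidays = USFederalHolidayCalendar().holidays(start=df["Datetime"].min(),
                                                   end=df["Datetime"].max())
    df["Holidays"] = df["Datetime"].dt.normalize().isin(holidays).astype(int)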

clean_data(df: DataFrame, drop_records: bool = True) DataFrame[source]

Clean data based on criteria for handling NoData and extreme values.

Parameters:
  • df (pd.DataFrame) – Input data frame for the target region.

  • drop_records (bool) – If True, drop records; else, alter records

Returns:

Processed data frame

extract_targets_features(df) DataFrame[source]

Keep datetime, target, and feature fields.

Parameters:

df (pd.DataFrame) – Input data frame for the target region.

fetch_read_file() DataFrame[source]

Get the input file from the data directory matching the region name and year and read it into a pandas data frame.

format_filter_data(df: DataFrame) DataFrame[source]

Format the input data file. Filter data by user provided date range and sort in ascending order by the timestamp.

Parameters:

df (pd.DataFrame) – Data frame for the target region

Returns:

Formatted data frame

generate_data()[source]

Workhorse function to clean and format input data for use in the predictive model.

static update_default_settings(kwargs) dict

Read the default settings YAML file into a dictionary and update any settings passed in via kwargs by the user.

Parameters:

kwargs (dict) – Keyword argument dictionary from user.

Returns:

A dictionary of updated default settings.

update_hyperparameters()

Update hyperparameter values from defaults if the user does not provide them.

class tell.mlp_prepare_data.DatasetTrain(region: str, data_dir: str, **kwargs)[source]

Bases: DefaultSettings

Clean and format input data for use in training predictive models.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

DATETIME_FIELD = 'Datetime'
HOLIDAY_FIELD = 'Holidays'
NODATA_VALUE = nan
WEEKDAY_FIELD = 'Weekday'
breakout_day_designation(df: DataFrame) DataFrame[source]

Add a field for weekday, each day of the week, and holidays to the data frame.

The weekday field is set to 1 for weekdays (Mon through Fri) and 0 for weekends (Sat and Sun). Each day of the week is given its own field, set to 1 if the record falls on that day and 0 otherwise. The holiday field is set to 1 for US Federal holidays and 0 otherwise.

Parameters:

df (pd.DataFrame) – Data frame for the target region.

Returns:

[0] Formatted data frame [1] List of extended x_variables

clean_data(df: DataFrame, drop_records: bool = True, iqr_scale_constant: float = 3.5) DataFrame[source]

Clean data based on criteria for handling NoData and extreme values.

Parameters:
  • df (pd.DataFrame) – Input data frame for the target region.

  • drop_records (bool) – If True, drop records; else, alter records

  • iqr_scale_constant (float) – Scale factor controlling the sensitivity of the IQR to outliers

Returns:

Processed data frame

extract_targets_features(df) DataFrame[source]

Keep datetime, target, and feature fields.

Parameters:

df (pd.DataFrame) – Input data frame for the target region.

fetch_read_file() DataFrame[source]

Get the input file from the data directory matching the region name and read it into a pandas data frame.

format_filter_data(df: DataFrame) DataFrame[source]

Format the input data file. Filter data by user provided date range and sort in ascending order by the timestamp.

Parameters:

df (pd.DataFrame) – Data frame for the target region

Returns:

Formatted data frame

generate_data()[source]

Workhorse function to clean and format input data for use in the predictive model.

iqr_outlier_detection(df: DataFrame, drop_records: bool = True, scale_constant: float = 3.5) DataFrame[source]

Outlier detection using interquartile range (IQR). Drops or adjusts outliers that are outside the acceptable range, NaN, or at or below 0.

Parameters:
  • df (pd.DataFrame) – Input data frame for the target region.

  • drop_records (bool) – If True, drop records; else, alter records

  • scale_constant (float) – Scale factor controlling the sensitivity of the IQR to outliers

Returns:

Processed data frame

split_train_test(df: DataFrame)[source]

Split the data frame into test and training data based on a datetime.

Parameters:

df (pd.DataFrame) – Input data frame for the target region.

Returns:

[0] training data frame [1] testing data frame

static update_default_settings(kwargs) dict

Read the default settings YAML file into a dictionary and update any settings passed in via kwargs by the user.

Parameters:

kwargs (dict) – Keyword argument dictionary from user.

Returns:

A dictionary of updated default settings.

update_hyperparameters()

Update hyperparameter values from defaults if the user does not provide them.

class tell.mlp_prepare_data.DefaultSettings(region: str, data_dir: str, **kwargs)[source]

Bases: object

Default settings for the MLP model. Any settings passed in via kwargs by the user override these defaults.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.

  • mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.

  • data_column_rename_dict (Optional[dict[str]]) – Dictionary mapping field names present in the input CSV file (keys) to the names the code expects (values).

  • expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.

  • hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.

  • month_field_name (Optional[str]) – Field name of the month field in the input CSV file.

  • year_field_name (Optional[str]) – Field name of the year field in the input CSV file.

  • x_variables (Optional[list[str]]) – Feature variable list.

  • add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator variables to the x variables.

  • y_variables (Optional[list[str]]) – Target variable list.

  • day_list (Optional[list[str]]) – List of day abbreviations and their order.

  • start_time (Optional[str]) – Timestamp for the datetime at which the run starts (e.g., 2016-01-01 00:00:00).

  • end_time (Optional[str]) – Timestamp for the datetime at which the run ends (e.g., 2019-12-31 23:00:00).

  • split_datetime (Optional[str]) – Timestamp showing the datetime to split the train and test data by (e.g., 2018-12-31 23:00:00).

  • seed_value (Optional[int]) – Seed value to reproduce randomization.

  • save_model (bool) – Choice to write ML models to a pickled file via joblib.

  • model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.

  • save_prediction (bool) – Choice to write predictions to a .csv file

  • prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.

  • verbose (bool) – Choice to see logged outputs.

DATETIME_FIELD = 'Datetime'
HOLIDAY_FIELD = 'Holidays'
NODATA_VALUE = nan
WEEKDAY_FIELD = 'Weekday'
static update_default_settings(kwargs) dict[source]

Read the default settings YAML file into a dictionary and update any settings passed in via kwargs by the user.

Parameters:

kwargs (dict) – Keyword argument dictionary from user.

Returns:

A dictionary of updated default settings.

update_hyperparameters()[source]

Update hyperparameter values from defaults if the user does not provide them.

tell.mlp_train module

tell.mlp_train.train(region: str, data_dir: str, **kwargs)[source]

Train an MLP model for a target region from an input CSV file and evaluate its predictions over the test set.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.

  • mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.

  • data_column_rename_dict (Optional[dict[str]]) – Dictionary mapping field names present in the input CSV file (keys) to the names the code expects (values).

  • expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.

  • hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.

  • month_field_name (Optional[str]) – Field name of the month field in the input CSV file.

  • x_variables (Optional[list[str]]) – Feature variable list.

  • add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator variables to the x variables.

  • y_variables (Optional[list[str]]) – Target variable list.

  • day_list (Optional[list[str]]) – List of day abbreviations and their order.

  • start_time (Optional[str]) – Timestamp for the datetime at which the run starts (e.g., 2016-01-01 00:00:00).

  • end_time (Optional[str]) – Timestamp for the datetime at which the run ends (e.g., 2019-12-31 23:00:00).

  • split_datetime (Optional[str]) – Timestamp showing the datetime to split the train and test data by (e.g., 2018-12-31 23:00:00).

  • nodata_value (Optional[int]) – No data value in the input CSV file.

  • seed_value (Optional[int]) – Seed value to reproduce randomization.

  • save_model (bool) – Choice to write ML models to a pickled file via joblib.

  • model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.

  • verbose (bool) – Choice to see logged outputs.

Returns:

[0] Predictions as a dataframe [1] Summary statistics as a dataframe
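
An illustrative training call (the paths and split dates are hypothetical; split_datetime separates the training data from the test data):

    from tell.mlp_train import train

    prediction_df, validation_df = train(
        region="PJM",
        data_dir="/path/to/compiled_historical_data",
        start_time="2016-01-01 00:00:00",
        end_time="2019-12-31 23:00:00",
        split_datetime="2018-12-31 23:00:00",
        save_model=True,
        model_output_directory="/path/to/models")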

tell.mlp_train.train_batch(target_region_list: list, data_dir: str, n_jobs: int = -1, **kwargs)[source]

Train MLP models in parallel for all regions in the target list, each from its own input CSV file.

Parameters:
  • target_region_list (list) – List of region / balancing authority names to train and test on. Must match the corresponding strings in the CSV files.

  • data_dir (str) – Full path to the directory that houses the input CSV files.

  • n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=“multiprocessing” or the size of the thread-pool when backend=“threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.

  • mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.

  • mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.

  • data_column_rename_dict (Optional[dict[str]]) – Dictionary mapping field names present in the input CSV file (keys) to the names the code expects (values).

  • expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.

  • hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.

  • month_field_name (Optional[str]) – Field name of the month field in the input CSV file.

  • x_variables (Optional[list[str]]) – Feature variable list.

  • add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator variables to the x variables.

  • y_variables (Optional[list[str]]) – Target variable list.

  • day_list (Optional[list[str]]) – List of day abbreviations and their order.

  • start_time (Optional[str]) – Timestamp for the datetime at which the run starts (e.g., 2016-01-01 00:00:00).

  • end_time (Optional[str]) – Timestamp for the datetime at which the run ends (e.g., 2019-12-31 23:00:00).

  • split_datetime (Optional[str]) – Timestamp showing the datetime to split the train and test data by (e.g., 2018-12-31 23:00:00).

  • nodata_value (Optional[int]) – No data value in the input CSV file.

  • seed_value (Optional[int]) – Seed value to reproduce randomization.

  • save_model (bool) – Choice to write ML models to a pickled file via joblib.

  • model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.

  • verbose (bool) – Choice to see logged outputs.

tell.mlp_train.train_mlp_model(region: str, x_train: ndarray, y_train: ndarray, x_test: ndarray, mlp_hidden_layer_sizes: int, mlp_max_iter: int, mlp_validation_fraction: float, save_model: bool = False, model_output_directory: str | None = None) ndarray[source]

Trains the MLP model.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • x_train (np.ndarray) – Training features

  • y_train (np.ndarray) – Training targets

  • x_test (np.ndarray) – Test features

  • mlp_hidden_layer_sizes (int) – The ith element represents the number of neurons in the ith hidden layer.

  • mlp_max_iter (int) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • mlp_validation_fraction (float) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.

  • save_model (bool) – Choice to write ML models to a pickled file via joblib.

  • model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.

Returns:

y_p (np.ndarray) – Predictions over the test set
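
As a rough sketch of the fit-and-predict step this function wraps, assuming scikit-learn’s MLPRegressor (TELL’s exact model configuration may differ):

    from sklearn.neural_network import MLPRegressor

    def fit_and_predict(x_train, y_train, x_test, mlp_hidden_layer_sizes,
                        mlp_max_iter, mlp_validation_fraction):
        """Fit an MLP on the training data and predict over the test set."""
        mlp = MLPRegressor(hidden_layer_sizes=mlp_hidden_layer_sizes,
                           max_iter=mlp_max_iter,
                           early_stopping=True,  # uses the validation fraction
                           validation_fraction=mlp_validation_fraction)
        mlp.fit(x_train, y_train)

        # y_p: predictions over the test set
        return mlp.predict(x_test)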

tell.mlp_utils module

tell.mlp_utils.denormalize_features(region: str, normalized_dict: dict, y_predicted_normalized: ndarray, y_comparison: ndarray, datetime_arr: ndarray) DataFrame[source]

Function to denormalize the predictions of the model.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • normalized_dict (dict) – Dictionary output from normalization function.

  • y_predicted_normalized (np.ndarray) – Normalized predictions over the test set.

  • y_comparison (np.ndarray) – Testing data to compare predictions to.

  • datetime_arr (np.ndarray) – Array of datetimes corresponding to the predictions.

Returns:

Denormalized predictions

tell.mlp_utils.evaluate(region: str, y_predicted: ndarray, y_comparison: ndarray) DataFrame[source]

Evaluation of model performance using the predicted compared to the test data.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • y_predicted (np.ndarray) – Predicted Y result array.

  • y_comparison (np.ndarray) – Comparison test data for Y array.

Returns:

Data frame of stats.

tell.mlp_utils.get_balancing_authority_to_model_dict()[source]

Return a dictionary mapping balancing authority abbreviations to their predictive models.

tell.mlp_utils.load_model(model_file: str) object[source]

Load a pickled model from file using joblib. The version of scikit-learn is included in the file name, as a compatible version is required to reload the data safely.

Parameters:

model_file (str) – Full path with file name and extension to the joblib pickled model file.

Returns:

Model as an object.

tell.mlp_utils.load_normalization_dict(file: str) dict[source]

Load a pickled normalization dictionary from file using joblib.

Parameters:

file (str) – Full path with file name and extension to the pickled normalization dictionary

Returns:

Normalization dictionary

tell.mlp_utils.load_predictive_models(region: str, model_output_directory: str | None)[source]

Load the predictive models and the normalization dictionary, either from what is stored in the package or from a user-provided directory. The scikit-learn version being used must match the one the model was generated with.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • model_output_directory (Union[str, None]) – Full path to the directory where the model file was written; if None, the models stored in the package are used.

Returns:

[0] MLP model [1] normalization dictionary

tell.mlp_utils.normalize_features(x_train: ndarray, x_test: ndarray, y_train: ndarray, y_test: ndarray) dict[source]

Normalize the features and targets of the model.

Parameters:
  • x_train (np.ndarray) – Training features

  • x_test (np.ndarray) – Test features

  • y_train (np.ndarray) – Training targets

  • y_test (np.ndarray) – Test targets

Returns:

Dictionary of scaled features
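
A minimal sketch of min-max scaling of this kind (the dictionary keys shown are illustrative, not necessarily TELL’s):

    def minmax_normalize(x_train, x_test, y_train, y_test):
        """Scale features and targets to [0, 1] using training-set extremes."""
        x_min, x_max = x_train.min(axis=0), x_train.max(axis=0)
        y_min, y_max = y_train.min(axis=0), y_train.max(axis=0)
        return {"x_train_norm": (x_train - x_min) / (x_max - x_min),
                "x_test_norm": (x_test - x_min) / (x_max - x_min),
                "y_train_norm": (y_train - y_min) / (y_max - y_min),
                "y_test_norm": (y_test - y_min) / (y_max - y_min),
                "min_y_train": y_min,
                "max_y_train": y_max}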

tell.mlp_utils.normalize_prediction_data(data_arr: ndarray, min_train_arr: ndarray, max_train_arr: ndarray) ndarray[source]

Normalize target data using the existing min and max of the training data.

Parameters:
  • data_arr (np.ndarray) – Array of target data

  • min_train_arr (np.ndarray) – Array of minimum values from the training target data

  • max_train_arr (np.ndarray) – Array of maximum values from the training target data

tell.mlp_utils.pickle_model(region: str, model_object: object, model_name: str, model_output_directory: str | None)[source]

Pickle a model to file using joblib. The version of scikit-learn is included in the file name, as a compatible version is required to reload the data safely.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • model_object (object) – scikit-learn model object.

  • model_name (str) – Name of sklearn model.

  • model_output_directory (str) – Full path to output directory where model file will be written.

tell.mlp_utils.pickle_normalization_dict(region: str, normalization_dict: dict, model_output_directory: str | None)[source]

Pickle the normalization dictionary to file using joblib. The version of scikit-learn is included in the file name, as a compatible version is required to reload the data safely.

Parameters:
  • region (str) – Region / balancing authority to train and test on. Must match the corresponding string in the CSV files.

  • normalization_dict (dict) – Dictionary of normalization data

  • model_output_directory (str) – Full path to output directory where model file will be written.

tell.package_data module

tell.package_data.get_ba_abbreviations() list[source]

Get balancing authority abbreviations from the reference YAML file.

Returns:

List of BA abbreviations

tell.package_data.read_yaml(yaml_file: str) dict[source]

Read a YAML file.

Parameters:

yaml_file (str) – Full path with file name and extension to the input YAML file

Returns:

Dictionary

tell.states_fips_function module

tell.states_fips_function.state_metadata_from_state_abbreviation(state_abbreviation: str) tuple[int, str][source]

Define the state FIPS code and state name from a given state abbreviation.

Parameters:

state_abbreviation (str) – state abbreviation

Returns:

[0] state FIPS code [1] state name
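
For example:

    from tell.states_fips_function import state_metadata_from_state_abbreviation

    # Returns the state FIPS code and the full state name.
    state_fips, state_name = state_metadata_from_state_abbreviation("WA")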

tell.visualization module

tell.visualization.plot_ba_load_time_series(ba_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the time series of load for a given Balancing Authority

Parameters:
  • ba_to_plot (str) – Balancing Authority code for the BA you want to plot

  • year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_plot (str) – Scenario you want to plot

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated
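
An illustrative plotting call (the scenario name and paths are hypothetical):

    from tell.visualization import plot_ba_load_time_series

    plot_ba_load_time_series(ba_to_plot="PJM",
                             year_to_plot="2039",
                             gcam_target_year="2040",
                             scenario_to_plot="rcp85hotter_ssp5",
                             data_input_dir="/path/to/tell_data",
                             image_output_dir="/path/to/images",
                             image_resolution=300,
                             save_images=True)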

tell.visualization.plot_ba_service_territory(ba_to_plot: str, year_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot maps of the service territory for a given BA in a given year

Parameters:
  • ba_to_plot (str) – Code for the BA you want to plot

  • year_to_plot (str) – Year you want to plot (valid 2015-2019)

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_ba_variable_correlations(ba_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the correlation matrix between predictive variables and observed demand for individual or all BAs.

Parameters:
  • ba_to_plot (str) – BA code for the BA you want to plot. Set to “All” to plot the average correlation across all BAs.

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_mlp_ba_peak_week(prediction_df, ba_to_plot: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the time-series of load during the peak week of the year for a given BA.

Parameters:
  • prediction_df (pd.DataFrame) – Prediction dataframe produced by the batch training of MLP models for all BAs

  • ba_to_plot (str) – Code for the BA you want to plot

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_mlp_ba_time_series(prediction_df, ba_to_plot: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the performance metrics for an individual BA

Parameters:
  • prediction_df (pd.DataFrame) – Prediction dataframe produced by the batch training of MLP models for all BAs

  • ba_to_plot (str) – Code for the BA you want to plot

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_mlp_errors_vs_load(prediction_df, validation_df, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the summary statistics of the MLP evaluation data as a function of mean load

Parameters:
  • prediction_df (pd.DataFrame) – Prediction dataframe produced by the batch training of MLP models for all BAs

  • validation_df (pd.DataFrame) – Validation dataframe produced by the batch training of MLP models for all BAs

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_mlp_summary_statistics(validation_df, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the summary statistics of the MLP evaluation data across BAs

Parameters:
  • validation_df (pd.DataFrame) – Validation dataframe produced by the batch training of MLP models for all BAs

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_state_annual_total_loads(year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot annual total loads from both GCAM-USA and TELL

Parameters:
  • year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_plot (str) – Scenario you want to plot

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_state_load_duration_curve(state_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the load duration curve for a given state

Parameters:
  • state_to_plot (str) – State you want to plot

  • year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_plot (str) – Scenario you want to plot

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_state_load_time_series(state_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the time series of load for a given state

Parameters:
  • state_to_plot (str) – State you want to plot

  • year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_plot (str) – Scenario you want to plot

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated

tell.visualization.plot_state_scaling_factors(year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]

Plot the scaling factors that force TELL annual total state loads to agree with GCAM-USA

Parameters:
  • year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)

  • gcam_target_year (str) – Year to scale against the GCAM-USA annual loads

  • scenario_to_plot (str) – Scenario you want to plot

  • data_input_dir (str) – Top-level data directory for TELL

  • image_output_dir (str) – Directory to store the images

  • image_resolution (int) – Resolution at which you want to save the images in DPI

  • save_images (bool) – Set to True if you want to save the images after they’re generated