tell.data_process_compile module
- tell.data_process_compile.compile_data(start_year: int, end_year: int, data_input_dir: str)[source]
Merge the load, population, and climate data into a single .csv file for each BA
- Parameters:
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
data_input_dir (str) – Top-level data directory for TELL
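A minimal usage sketch of the compile step. The wrapper function, year range, and directory path are illustrative placeholders; the tell package must be installed and the upstream load, population, and climate processing completed before the call will succeed.

```python
def run_compile(data_input_dir="/path/to/tell_data"):
    """Merge load, population, and climate data into per-BA .csv files.

    The default path is a placeholder; tell must be installed and its
    upstream processing steps already run for this to succeed.
    """
    import tell
    tell.data_process_compile.compile_data(start_year=2015,
                                           end_year=2019,
                                           data_input_dir=data_input_dir)
```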
tell.data_process_eia_930 module
- tell.data_process_eia_930.eia_data_subset(file_string: str, data_input_dir: str)[source]
Extract only the columns TELL needs from the EIA-930 Excel files
- Parameters:
file_string (str) – File name of EIA-930 hourly load data by BA
data_input_dir (str) – Top-level data directory for TELL
- tell.data_process_eia_930.list_EIA_930_files(data_input_dir: str) list [source]
Make a list of all the file names for the EIA-930 hourly load dataset
- Parameters:
data_input_dir (str) – Top-level data directory for TELL
- Returns:
list
- tell.data_process_eia_930.process_eia_930_data(data_input_dir: str, n_jobs: int)[source]
Read in the list of EIA-930 files, subset the data, and save the output as .csv files
- Parameters:
data_input_dir (str) – Top-level data directory for TELL
n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=”multiprocessing” or the size of the thread-pool when backend=”threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
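The n_jobs rules above follow joblib's convention. A small helper sketches the arithmetic they describe (this mirrors the documented rules only, not joblib's actual implementation):

```python
def effective_workers(n_jobs, n_cpus):
    """Number of workers implied by joblib's n_jobs convention.

    Mirrors the rules documented above: -1 uses all CPUs, 1 disables
    parallelism, values below -1 leave (-1 - n_jobs) CPUs unused, and
    None is treated as 1 (sequential) outside a parallel_backend context.
    """
    if n_jobs is None or n_jobs == 1:
        return 1
    if n_jobs < 0:
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

For example, on an 8-CPU machine, n_jobs=-1 yields 8 workers and n_jobs=-2 yields 7.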
tell.data_process_population module
- tell.data_process_population.ba_pop_sum(map_input_dir: str, pop_input_dir: str, start_year: int, end_year: int) DataFrame [source]
Sum the total population within a BA’s service territory in a given year
- Parameters:
map_input_dir (str) – Directory where the BA-to-county mapping is stored
pop_input_dir (str) – Directory where raw county population data is stored
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
- Returns:
DataFrame
- tell.data_process_population.extract_future_ba_population(year: int, ba_code: str, scenario: str, data_input_dir: str) DataFrame [source]
Calculate the total population living within a BA’s service territory in a given year under a given SSP scenario.
- Parameters:
year (int) – Year to process; four digit year (e.g., 1990)
ba_code (str) – Code for the BA you want to process (e.g., ‘PJM’ or ‘CISO’)
scenario (str) – Code for the SSP scenario you want to process (either ‘ssp3’ or ‘ssp5’)
data_input_dir (str) – Top-level data directory for TELL
- Returns:
Hourly total population living within the BA’s service territory
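A usage sketch for the future-population step. The BA code, SSP scenario, year, and path are illustrative placeholders:

```python
def run_future_population(ba_code="CISO", scenario="ssp5"):
    """Fetch the hourly total population within one BA's territory.

    'CISO', 'ssp5', 2050, and the path are illustrative placeholders;
    scenario must be either 'ssp3' or 'ssp5'.
    """
    import tell
    return tell.data_process_population.extract_future_ba_population(
        year=2050, ba_code=ba_code, scenario=scenario,
        data_input_dir="/path/to/tell_data")
```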
- tell.data_process_population.fips_pop_yearly(pop_input_dir: str, start_year: int, end_year: int) DataFrame [source]
Read in the raw population data, format columns, and return a single dataframe for all years
- Parameters:
pop_input_dir (str) – Directory where raw county population data is stored
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
- Returns:
DataFrame
- tell.data_process_population.merge_mapping_data(map_input_dir: str, pop_input_dir: str, start_year: int, end_year: int) DataFrame [source]
Merge the BA mapping files and historical population data based on FIPS codes
- Parameters:
map_input_dir (str) – Directory where the BA-to-county mapping is stored
pop_input_dir (str) – Directory where raw county population data is stored
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
- Returns:
DataFrame
- tell.data_process_population.process_ba_population_data(start_year: int, end_year: int, data_input_dir: str)[source]
Calculate a time-series of the total population living within a BA’s service territory
- Parameters:
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
data_input_dir (str) – Top-level data directory for TELL
tell.data_spatial_mapping module
- tell.data_spatial_mapping.map_ba_service_territory(start_year: int, end_year: int, data_input_dir: str)[source]
Workflow function to run the “process_spatial_mapping” function to map BAs to counties
- Parameters:
start_year (int) – Year to start process; four digit year (e.g., 1990)
end_year (int) – Year to end process; four digit year (e.g., 2019)
data_input_dir (str) – Top-level data directory for TELL
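Because the population functions above read the BA-to-county mapping files, the spatial mapping step must run first. A sketch of that ordering (the wrapper function, years, and path are placeholders; tell is assumed installed):

```python
def run_preprocessing(data_input_dir="/path/to/tell_data"):
    """Run the BA-to-county spatial mapping, then the population
    workflow that depends on it. The default path is a placeholder."""
    import tell
    tell.data_spatial_mapping.map_ba_service_territory(
        start_year=2015, end_year=2019, data_input_dir=data_input_dir)
    tell.data_process_population.process_ba_population_data(
        start_year=2015, end_year=2019, data_input_dir=data_input_dir)
```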
- tell.data_spatial_mapping.process_spatial_mapping(target_year: int, fips_file: str, service_area_file: str, sales_ult_file: str, bal_auth_file: str, output_dir: str)[source]
Workflow function to execute the mapping of BAs to counties for a given year
- Parameters:
target_year (int) – Year to process; four digit year (e.g., 1990)
fips_file (str) – County FIPS code .csv file
service_area_file (str) – Balancing authority service area Excel file
sales_ult_file (str) – Balancing authority sales to ultimate customer Excel file
bal_auth_file (str) – Balancing authority and ID codes Excel file
output_dir (str) – Directory to store the output .csv file
tell.execute_forward module
- tell.execute_forward.aggregate_mlp_output_files(list_of_files: list) DataFrame [source]
Aggregates a list of MLP output files into a dataframe.
- Parameters:
list_of_files (list) – List of MLP output files
- Returns:
DataFrame of all MLP output concatenated together
- tell.execute_forward.execute_forward(year_to_process: str, gcam_target_year: str, scenario_to_process: str, data_output_dir: str, gcam_usa_input_dir: str, map_input_dir: str, mlp_input_dir: str, pop_input_dir: str, save_county_data=False)[source]
Takes the .csv files produced by the TELL MLP model and distributes the predicted load to the counties that each balancing authority (BA) operates in. The county-level hourly loads are then summed to the state-level and scaled to match the state-level annual loads produced by GCAM-USA. Three sets of output files are generated: county-level hourly loads, state-level hourly loads, and hourly loads for each BA. There is one additional summary output file that includes state-level annual loads from TELL and GCAM-USA as well as the scaling factors.
- Parameters:
year_to_process (str) – Year to process
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_process (str) – Scenario to process
data_output_dir (str) – Top-level data directory for TELL output
gcam_usa_input_dir (str) – Path to where the GCAM-USA data is stored
map_input_dir (str) – Path to where the BA-to-county mapping data are stored
mlp_input_dir (str) – Path to where the TELL MLP output data are stored
pop_input_dir (str) – Path to where the population projection data are stored
save_county_data (bool) – Set to True if you want to save the time-series of load for each county
- Returns:
[0] DataFrame of summary statistics [1] DataFrame of BA-level total load time-series [2] DataFrame of state-level total load time-series
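A sketch of a forward-execution call and of unpacking its three return values. The scenario name and every path are illustrative placeholders; the MLP predictions and GCAM-USA scenario files must already exist:

```python
def run_forward(scenario_to_process="rcp85hotter_ssp5"):
    """Distribute MLP loads to counties and scale to GCAM-USA.

    The scenario name and all paths are illustrative placeholders.
    Returns the summary, BA-level, and state-level dataframes.
    """
    import tell
    summary_df, ba_df, state_df = tell.execute_forward.execute_forward(
        year_to_process="2039",
        gcam_target_year="2040",
        scenario_to_process=scenario_to_process,
        data_output_dir="/path/to/tell_output",
        gcam_usa_input_dir="/path/to/gcam_usa",
        map_input_dir="/path/to/ba_mapping",
        mlp_input_dir="/path/to/mlp_output",
        pop_input_dir="/path/to/population",
        save_county_data=False)
    return summary_df, ba_df, state_df
```

Note that year_to_process and gcam_target_year are passed as strings, per the signature above.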
- tell.execute_forward.extract_gcam_usa_loads(scenario_to_process: str, gcam_usa_input_dir: str) DataFrame [source]
Extracts the state-level annual loads from a GCAM-USA output file.
- Parameters:
scenario_to_process (str) – Scenario to process
gcam_usa_input_dir (str) – Path to where the GCAM-USA data are stored
- Returns:
DataFrame of state-level annual total electricity loads
- tell.execute_forward.output_tell_ba_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame [source]
Writes a file of the time-series of hourly loads for each BA.
- Parameters:
joint_mlp_df (DataFrame) – DataFrame of processed TELL loads
year_to_process (str) – Year to process
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
data_output_dir (str) – Data output directory
- Returns:
DataFrame of BA-level total load time-series
- tell.execute_forward.output_tell_county_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str)[source]
Writes a file of the time-series of hourly loads for each county.
- Parameters:
joint_mlp_df (DataFrame) – DataFrame of processed TELL loads
year_to_process (str) – Year to process
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
data_output_dir (str) – Data output directory
- tell.execute_forward.output_tell_state_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame [source]
Writes a file of the time-series of hourly loads for each state.
- Parameters:
joint_mlp_df (DataFrame) – DataFrame of processed TELL loads
year_to_process (str) – Year to process
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
data_output_dir (str) – Data output directory
- Returns:
DataFrame of state-level total load time-series
- tell.execute_forward.output_tell_summary_data(joint_mlp_df: DataFrame, year_to_process: str, gcam_target_year: str, data_output_dir: str) DataFrame [source]
Writes a summary file describing state-level annual total loads from TELL and GCAM-USA.
- Parameters:
joint_mlp_df (DataFrame) – DataFrame of processed TELL loads
year_to_process (str) – Year to process
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
data_output_dir (str) – Data output directory
- Returns:
DataFrame of summary statistics
- tell.execute_forward.process_population_scenario(scenario_to_process: str, population_data_input_dir: str) DataFrame [source]
Read in a future population file and interpolate the data to an annual resolution.
- Parameters:
scenario_to_process (str) – Scenario to process
population_data_input_dir (str) – Path to where the sample population projections are stored
- Returns:
DataFrame of population projections at an annual resolution
tell.install_forcing_data module
- class tell.install_forcing_data.InstallForcingSample(data_dir=None)[source]
Bases:
object
Download the TELL sample forcing data package from Zenodo that matches the currently installed tell distribution.
- Parameters:
data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.
- DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6354665/files/sample_forcing_data.zip?download=1', '1.1.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.2.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.2.1': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1', '1.3.0': 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1'}
- DEFAULT_VERSION = 'https://zenodo.org/records/13344803/files/sample_forcing_data.zip?download=1'
tell.install_quickstarter_data module
- class tell.install_quickstarter_data.InstallQuickstarterData(data_dir=None)[source]
Bases:
object
Download the TELL sample output data package from Zenodo that matches the currently installed tell distribution.
- Parameters:
data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.
- DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6804242/files/tell_quickstarter_data.zip?download=1', '1.1.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.2.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.2.1': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1', '1.3.0': 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1'}
- DEFAULT_VERSION = 'https://zenodo.org/records/13344957/files/tell_quickstarter_data.zip?download=1'
tell.install_raw_data module
- class tell.install_raw_data.InstallRawData(data_dir=None)[source]
Bases:
object
Download the TELL raw data package from Zenodo that matches the currently installed tell distribution.
- Parameters:
data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.
- DATA_VERSION_URLS = {'0.0.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.2': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.3': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.4': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '0.1.5': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.0.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.1.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.1.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.2.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.2.1': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1', '1.3.0': 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1'}
- DEFAULT_VERSION = 'https://zenodo.org/record/6378036/files/tell_raw_data.zip?download=1'
- tell.install_raw_data.install_tell_raw_data(data_dir=None)[source]
Download and unpack the raw TELL data package from Zenodo that matches the currently installed tell distribution.
- Parameters:
data_dir (str) – Optional. Full path to the directory you wish to store the data in. Default is to install it in the data directory of the package.
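A sketch of fetching the raw data package (the wrapper function is illustrative; passing data_dir=None falls back to the package's own data directory, per the parameter description above):

```python
def fetch_raw_data(data_dir=None):
    """Download and unpack the Zenodo raw-data package matching the
    installed tell version; None uses the package data directory."""
    import tell
    tell.install_raw_data.install_tell_raw_data(data_dir=data_dir)
```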
tell.metadata_eia module
tell.mlp_predict module
- tell.mlp_predict.predict(region: str, year: int, data_dir: str, datetime_field_name: str = 'Time_UTC', save_prediction: bool = False, prediction_output_directory: str | None = None, **kwargs)[source]
Generate predictions for MLP model for a target region from an input CSV file.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
year (int) – Target year to use in YYYY format.
data_dir (str) – Full path to the directory that houses the input CSV files.
save_prediction (bool) – Choice to write predictions to a .csv file
prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.
datetime_field_name (str) – Name of the datetime field.
data_column_rename_dict (Optional[dict[str]]) – Dictionary for the field names present in the input CSV file (keys) to what the code expects them to be (values).
expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.
hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.
month_field_name (Optional[str]) – Field name of the month field in the input CSV file.
x_variables (Optional[list[str]]) – Feature (predictor) variable list.
add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator fields to the x variables.
y_variables (Optional[list[str]]) – Target variable list.
day_list (Optional[list[str]]) – List of day abbreviations and their order.
seed_value (Optional[int]) – Seed value to reproduce randomization.
verbose (bool) – Choice to see logged outputs.
- Returns:
Prediction data frame
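A usage sketch for a single-region prediction. The region code, year, and paths are illustrative placeholders; the region string must match the input CSV file naming:

```python
def run_predict(region="PJM", year=2039):
    """Predict hourly load for one BA-year and write it to .csv.

    'PJM', 2039, and both paths are illustrative placeholders; region
    must match the string used in the input CSV file names.
    """
    import tell
    return tell.mlp_predict.predict(
        region=region, year=year,
        data_dir="/path/to/forcing_csvs",
        save_prediction=True,
        prediction_output_directory="/path/to/predictions")
```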
- tell.mlp_predict.predict_batch(target_region_list: list, year: int, data_dir: str, n_jobs: int = -1, datetime_field_name: str = 'Time_UTC', save_prediction: bool = False, prediction_output_directory: str | None = None, **kwargs)[source]
Generate MLP model predictions from input CSV files for all regions in the input list, in parallel.
- Parameters:
target_region_list (list) – List of names indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
year (int) – Target year to use in YYYY format.
data_dir (str) – Full path to the directory that houses the input CSV files.
n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=”multiprocessing” or the size of the thread-pool when backend=”threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
datetime_field_name (str) – Name of the datetime field.
save_prediction (bool) – Choice to write predictions to a .csv file
prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.
data_column_rename_dict (Optional[dict[str]]) – Dictionary for the field names present in the input CSV file (keys) to what the code expects them to be (values).
expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.
hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.
month_field_name (Optional[str]) – Field name of the month field in the input CSV file.
x_variables (Optional[list[str]]) – Feature (predictor) variable list.
add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator fields to the x variables.
y_variables (Optional[list[str]]) – Target variable list.
day_list (Optional[list[str]]) – List of day abbreviations and their order.
seed_value (Optional[int]) – Seed value to reproduce randomization.
verbose (bool) – Choice to see logged outputs.
- Returns:
Prediction data frame
tell.mlp_prepare_data module
- class tell.mlp_prepare_data.DatasetPredict(region: str, year: int, data_dir: str, datetime_field_name: str = 'Time_UTC', **kwargs)[source]
Bases:
DefaultSettings
Clean and format input weather data for use in predictive models.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
year (int) – Target year to use in YYYY format.
data_dir (str) – Full path to the directory that houses the input CSV files.
datetime_field_name (str) – Name of the datetime field.
- DATETIME_FIELD = 'Datetime'
- HOLIDAY_FIELD = 'Holidays'
- NODATA_VALUE = nan
- WEEKDAY_FIELD = 'Weekday'
- breakout_day_designation(df: DataFrame) DataFrame [source]
Add a field for weekday, each day of the week, and holidays to the data frame.
The weekday field is 1 for weekdays (Mon through Fri) and 0 for weekends (Sat and Sun). Each day of the week also gets its own field, set to 1 if the record falls on that day and 0 otherwise. The holiday field is 1 for US Federal holidays and 0 otherwise.
- Parameters:
df (pd.DataFrame) – Data frame for the target region.
- Returns:
[0] Formatted data frame [1] List of extended x_variables
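The 0/1 encoding described above can be sketched with plain pandas. The field names follow the class constants ('Datetime', 'Weekday', 'Holidays'); the implementation is an illustration of the scheme, not TELL's actual code:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_day_fields(df, datetime_col="Datetime"):
    """Add 0/1 Weekday, per-day, and Holidays fields to a copy of df.

    A sketch of the encoding described above; TELL's internal
    implementation and column names may differ.
    """
    out = df.copy()
    dt = pd.to_datetime(out[datetime_col])
    out["Weekday"] = (dt.dt.dayofweek < 5).astype(int)  # Mon=0 .. Sun=6
    for i, day in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]):
        out[day] = (dt.dt.dayofweek == i).astype(int)
    cal = USFederalHolidayCalendar()
    holidays = cal.holidays(start=dt.min().normalize(),
                            end=dt.max().normalize())
    out["Holidays"] = dt.dt.normalize().isin(holidays).astype(int)
    return out
```

For example, a record on 2019-07-04 (a Thursday and a federal holiday) gets Weekday=1, Thu=1, and Holidays=1.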
- clean_data(df: DataFrame, drop_records: bool = True) DataFrame [source]
Clean data based on criteria for handling NoData and extreme values.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
drop_records (bool) – If True, drop records; else, alter records
- Returns:
Processed data frame
- extract_targets_features(df) DataFrame [source]
Keep datetime, target, and feature fields.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
- fetch_read_file() DataFrame [source]
Get the input file from the data directory matching the region name and year and read it into a pandas data frame.
- format_filter_data(df: DataFrame) DataFrame [source]
Format the input data file. Filter data by the user-provided date range and sort in ascending order by the timestamp.
- Parameters:
df (pd.DataFrame) – Data frame for the target region
- Returns:
Formatted data frame
- generate_data()[source]
Workhorse function to clean and format input data for use in the predictive model.
- static update_default_settings(kwargs) dict
Read the default settings YAML file into a dictionary and update any settings the user passes in via kwargs.
- Parameters:
kwargs (dict) – Keyword argument dictionary from user.
- Returns:
A dictionary of updated default settings.
- update_hyperparameters()
Update hyperparameter values from defaults if the user does not provide them.
- class tell.mlp_prepare_data.DatasetTrain(region: str, data_dir: str, **kwargs)[source]
Bases:
DefaultSettings
Clean and format input data for use in training predictive models.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
data_dir (str) – Full path to the directory that houses the input CSV files.
- DATETIME_FIELD = 'Datetime'
- HOLIDAY_FIELD = 'Holidays'
- NODATA_VALUE = nan
- WEEKDAY_FIELD = 'Weekday'
- breakout_day_designation(df: DataFrame) DataFrame [source]
Add a field for weekday, each day of the week, and holidays to the data frame.
The weekday field is 1 for weekdays (Mon through Fri) and 0 for weekends (Sat and Sun). Each day of the week also gets its own field, set to 1 if the record falls on that day and 0 otherwise. The holiday field is 1 for US Federal holidays and 0 otherwise.
- Parameters:
df (pd.DataFrame) – Data frame for the target region.
- Returns:
[0] Formatted data frame [1] List of extended x_variables
- clean_data(df: DataFrame, drop_records: bool = True, iqr_scale_constant: float = 3.5) DataFrame [source]
Clean data based on criteria for handling NoData and extreme values.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
drop_records (bool) – If True, drop records; else, alter records
iqr_scale_constant (float) – Scale factor controlling the sensitivity of the IQR to outliers
- Returns:
Processed data frame
- extract_targets_features(df) DataFrame [source]
Keep datetime, target, and feature fields.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
- fetch_read_file() DataFrame [source]
Get the input file from the data directory matching the region name and read it into a pandas data frame.
- format_filter_data(df: DataFrame) DataFrame [source]
Format the input data file. Filter data by the user-provided date range and sort in ascending order by the timestamp.
- Parameters:
df (pd.DataFrame) – Data frame for the target region
- Returns:
Formatted data frame
- generate_data()[source]
Workhorse function to clean and format input data for use in the predictive model.
- iqr_outlier_detection(df: DataFrame, drop_records: bool = True, scale_constant: float = 3.5) DataFrame [source]
Outlier detection using interquartile range (IQR). Drops or adjusts outliers that are outside the acceptable range, NaN, or at or below 0.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
drop_records (bool) – If True, drop records; else, alter records
scale_constant (float) – Scale factor controlling the sensitivity of the IQR to outliers
- Returns:
Processed data frame
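The IQR rule described above can be sketched in a few lines of pandas (an illustration of the criteria, not TELL's implementation; the helper name is hypothetical):

```python
import pandas as pd


def iqr_filter(values, scale_constant=3.5):
    """Drop values that are NaN, at or below 0, or outside
    [Q1 - c*IQR, Q3 + c*IQR]; a sketch of the rule described above."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - scale_constant * iqr, q3 + scale_constant * iqr
    # NaN comparisons are False, so NaN records drop out of the mask.
    mask = s.notna() & (s > 0) & s.between(lo, hi)
    return s[mask]
```

A larger scale_constant widens the acceptable range, so fewer records are flagged as outliers.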
- split_train_test(df: DataFrame)[source]
Split the data frame into test and training data based on a datetime.
- Parameters:
df (pd.DataFrame) – Input data frame for the target region.
- Returns:
[0] training data frame [1] testing data frame
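A sketch of the datetime-based split (the helper name and column name are assumptions for illustration; records at or before the cutoff go to training, later records to testing):

```python
import pandas as pd


def split_by_datetime(df, split_datetime, datetime_col="Datetime"):
    """Split rows at a timestamp: records at or before split_datetime go
    to training, later records to testing. A sketch of the split above."""
    dt = pd.to_datetime(df[datetime_col])
    cutoff = pd.Timestamp(split_datetime)
    return df[dt <= cutoff], df[dt > cutoff]
```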
- static update_default_settings(kwargs) dict
Read the default settings YAML file into a dictionary and update any settings the user passes in via kwargs.
- Parameters:
kwargs (dict) – Keyword argument dictionary from user.
- Returns:
A dictionary of updated default settings.
- update_hyperparameters()
Update hyperparameter values from defaults if the user does not provide them.
- class tell.mlp_prepare_data.DefaultSettings(region: str, data_dir: str, **kwargs)[source]
Bases:
object
Default settings for the MLP model; updates any settings the user passes in via kwargs.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
data_dir (str) – Full path to the directory that houses the input CSV files.
mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.
mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.
data_column_rename_dict (Optional[dict[str]]) – Dictionary for the field names present in the input CSV file (keys) to what the code expects them to be (values).
expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.
hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.
month_field_name (Optional[str]) – Field name of the month field in the input CSV file.
year_field_name (Optional[str]) – Field name of the year field in the input CSV file.
x_variables (Optional[list[str]]) – Feature (predictor) variable list.
add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator fields to the x variables.
y_variables (Optional[list[str]]) – Target variable list.
day_list (Optional[list[str]]) – List of day abbreviations and their order.
start_time (Optional[str]) – Timestamp for the start of the run (e.g., 2016-01-01 00:00:00).
end_time (Optional[str]) – Timestamp for the end of the run (e.g., 2019-12-31 23:00:00).
split_datetime (Optional[str]) – Timestamp at which to split the train and test data (e.g., 2018-12-31 23:00:00).
seed_value (Optional[int]) – Seed value to reproduce randomization.
save_model (bool) – Choice to write ML models to a pickled file via joblib.
model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.
save_prediction (bool) – Choice to write predictions to a .csv file
prediction_output_directory (Union[str, None]) – Full path to output directory where prediction files will be written.
verbose (bool) – Choice to see logged outputs.
- DATETIME_FIELD = 'Datetime'
- HOLIDAY_FIELD = 'Holidays'
- NODATA_VALUE = nan
- WEEKDAY_FIELD = 'Weekday'
tell.mlp_train module
- tell.mlp_train.train(region: str, data_dir: str, **kwargs)[source]
Train an MLP model for a target region from an input CSV file.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
data_dir (str) – Full path to the directory that houses the input CSV files.
mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.
mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.
data_column_rename_dict (Optional[dict[str]]) – Dictionary for the field names present in the input CSV file (keys) to what the code expects them to be (values).
expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.
hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.
month_field_name (Optional[str]) – Field name of the month field in the input CSV file.
x_variables (Optional[list[str]]) – Feature (predictor) variable list.
add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday indicator fields to the x variables.
y_variables (Optional[list[str]]) – Target variable list.
day_list (Optional[list[str]]) – List of day abbreviations and their order.
start_time (Optional[str]) – Timestamp for the start of the run (e.g., 2016-01-01 00:00:00).
end_time (Optional[str]) – Timestamp for the end of the run (e.g., 2019-12-31 23:00:00).
split_datetime (Optional[str]) – Timestamp at which to split the train and test data (e.g., 2018-12-31 23:00:00).
nodata_value (Optional[int]) – No data value in the input CSV file.
seed_value (Optional[int]) – Seed value to reproduce randomization.
save_model (bool) – Choice to write ML models to a pickled file via joblib.
model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.
verbose (bool) – Choice to see logged outputs.
- Returns:
[0] Predictions as a dataframe [1] Summary statistics as a dataframe
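A usage sketch for training and saving one BA's model. The region code and paths are illustrative placeholders:

```python
def run_train(region="PJM"):
    """Train one BA's MLP model and pickle it via joblib.

    'PJM' and both paths are illustrative placeholders. Returns the
    prediction and summary-statistics dataframes.
    """
    import tell
    return tell.mlp_train.train(
        region=region,
        data_dir="/path/to/compiled_csvs",
        save_model=True,
        model_output_directory="/path/to/models")
```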
- tell.mlp_train.train_batch(target_region_list: list, data_dir: str, n_jobs: int = -1, **kwargs)[source]
Train MLP models for each region in the input list in parallel.
- Parameters:
target_region_list (list) – List of names indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
data_dir (str) – Full path to the directory that houses the input CSV files.
n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=”multiprocessing” or the size of the thread-pool when backend=”threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
mlp_hidden_layer_sizes (Optional[int]) – The ith element represents the number of neurons in the ith hidden layer.
mlp_max_iter (Optional[int]) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
mlp_validation_fraction (Optional[float]) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.
data_column_rename_dict (Optional[dict[str]]) – Dictionary for the field names present in the input CSV file (keys) to what the code expects them to be (values).
expected_datetime_columns (Optional[list[str]]) – Expected names of the date time columns in the input CSV file.
hour_field_name (Optional[str]) – Field name of the hour field in the input CSV file.
month_field_name (Optional[str]) – Field name of the month field in the input CSV file.
x_variables (Optional[list[str]]) – Feature variable list.
add_dayofweek_xvars (Optional[bool]) – True if the user wishes to add weekday and holiday targets to the x variables.
y_variables (Optional[list[str]]) – Target variable list.
day_list (Optional[list[str]]) – List of day abbreviations and their order.
start_time (Optional[str]) – Timestamp showing the datetime for the run to start (e.g., 2016-01-01 00:00:00).
end_time (Optional[str]) – Timestamp showing the datetime for the run to end (e.g., 2019-12-31 23:00:00).
split_datetime (Optional[str]) – Timestamp showing the datetime to split the train and test data by (e.g., 2018-12-31 23:00:00).
nodata_value (Optional[int]) – No data value in the input CSV file.
seed_value (Optional[int]) – Seed value to reproduce randomization.
save_model (bool) – Choice to write ML models to a pickled file via joblib.
model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.
verbose (bool) – Choice to see logged outputs.
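The n_jobs rules quoted above follow joblib's conventions; they can be sketched as a small helper (resolve_n_jobs is a hypothetical name for illustration, not part of TELL or joblib):

```python
import os

def resolve_n_jobs(n_jobs, n_cpus=None):
    """Resolve a joblib-style n_jobs setting to an effective worker count."""
    if n_cpus is None:
        n_cpus = os.cpu_count()
    if n_jobs is None:
        return 1                    # 'unset' -> sequential execution
    if n_jobs < 0:
        return n_cpus + 1 + n_jobs  # -1 -> all CPUs, -2 -> all but one
    return n_jobs
```

So on an 8-core machine, n_jobs=-1 resolves to 8 workers and n_jobs=-2 to 7.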
- tell.mlp_train.train_mlp_model(region: str, x_train: ndarray, y_train: ndarray, x_test: ndarray, mlp_hidden_layer_sizes: int, mlp_max_iter: int, mlp_validation_fraction: float, save_model: bool = False, model_output_directory: str | None = None) ndarray [source]
Trains the MLP model.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
x_train (np.ndarray) – Training features
y_train (np.ndarray) – Training targets
x_test (np.ndarray) – Test features
mlp_hidden_layer_sizes (int) – The ith element represents the number of neurons in the ith hidden layer.
mlp_max_iter (int) – Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
mlp_validation_fraction (float) – The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1.
save_model (bool) – Choice to write ML models to a pickled file via joblib.
model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.
- Returns:
y_p (np.ndarray) – Predictions over the test set.
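A minimal sketch of this training step, assuming scikit-learn's MLPRegressor with toy data (the hyperparameter names mirror the documented arguments; the data and values here are illustrative only):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
x_train = rng.random((200, 2))                       # training features
y_train = x_train @ np.array([1.5, -0.5]) + 0.1      # training targets
x_test = rng.random((20, 2))                         # test features

# mlp_hidden_layer_sizes, mlp_max_iter, and mlp_validation_fraction map
# onto the corresponding MLPRegressor parameters.
mlp = MLPRegressor(hidden_layer_sizes=(64,),
                   max_iter=500,
                   validation_fraction=0.1,
                   early_stopping=True,
                   random_state=0)
mlp.fit(x_train, y_train)
y_p = mlp.predict(x_test)   # predictions over the test set
```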
tell.mlp_utils module
- tell.mlp_utils.denormalize_features(region: str, normalized_dict: dict, y_predicted_normalized: ndarray, y_comparison: ndarray, datetime_arr: ndarray) DataFrame [source]
Function to denormalize the predictions of the model.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
normalized_dict (dict) – Dictionary output from normalization function.
y_predicted_normalized (np.ndarray) – Normalized predictions over the test set.
y_comparison (np.ndarray) – Testing data to compare predictions to.
datetime_arr (np.ndarray) – Array of datetimes corresponding to the predictions.
- Returns:
Denormalized predictions
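Since TELL's normalization is min-max scaling (see normalize_prediction_data below), denormalizing is just the inverse transform; a sketch:

```python
import numpy as np

def denormalize(y_norm, y_min, y_max):
    """Invert min-max scaling: y = y_norm * (max - min) + min."""
    return y_norm * (y_max - y_min) + y_min

y_norm = np.array([0.0, 0.5, 1.0])
denormalize(y_norm, y_min=10.0, y_max=30.0)  # -> [10., 20., 30.]
```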
- tell.mlp_utils.evaluate(region: str, y_predicted: ndarray, y_comparison: ndarray) DataFrame [source]
Evaluate model performance by comparing the predictions to the test data.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
y_predicted (np.ndarray) – Predicted Y result array.
y_comparison (np.ndarray) – Comparison test data for Y array.
- Returns:
Data frame of stats.
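A sketch of what such an evaluation might compute (the specific metric set — MAE, RMSE, R2 — and the column names are assumptions for illustration, not necessarily TELL's exact output):

```python
import numpy as np
import pandas as pd

def evaluate_sketch(region, y_predicted, y_comparison):
    """Return a one-row data frame of error statistics for a region."""
    err = y_predicted - y_comparison
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_comparison - y_comparison.mean()) ** 2)
    return pd.DataFrame({"BA": [region],
                         "MAE": [np.abs(err).mean()],
                         "RMSE": [np.sqrt((err ** 2).mean())],
                         "R2": [1.0 - ss_res / ss_tot]})
```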
- tell.mlp_utils.get_balancing_authority_to_model_dict()[source]
Return a dictionary mapping balancing authority abbreviations to their models.
- tell.mlp_utils.load_model(model_file: str) object [source]
Load a pickled model from file using joblib. The scikit-learn version is included in the file name because a compatible version is required to reload the model safely.
- Parameters:
model_file (str) – Full path, with file name and extension, to the joblib pickled model file.
- Returns:
Model as an object.
- tell.mlp_utils.load_normalization_dict(file: str) dict [source]
Load the pickled normalization dictionary from file using joblib.
- Parameters:
file (str) – Full path with file name and extension to the pickled normalization dictionary
- Returns:
Normalization dictionary
- tell.mlp_utils.load_predictive_models(region: str, model_output_directory: str | None)[source]
Load predictive models and the normalization dictionary, either from what is stored in the package or from a user-provided directory. The scikit-learn version in use must match the one the model was generated with.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
model_output_directory (Union[str, None]) – Full path to output directory where model file will be written.
- Returns:
[0] MLP model [1] normalization dictionary
- tell.mlp_utils.normalize_features(x_train: ndarray, x_test: ndarray, y_train: ndarray, y_test: ndarray) dict [source]
Normalize the features and targets of the model.
- Parameters:
x_train (np.ndarray) – Training features
x_test (np.ndarray) – Test features
y_train (np.ndarray) – Training targets
y_test (np.ndarray) – Test targets
- Returns:
Dictionary of scaled features
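A sketch of min-max normalization keyed the way this function's consumers need it — scale both splits by the training extrema and keep those extrema for later denormalization (the dictionary keys here are illustrative, not TELL's exact names):

```python
import numpy as np

def normalize_features_sketch(x_train, x_test, y_train, y_test):
    """Min-max scale features and targets by the training extrema."""
    x_min, x_max = x_train.min(axis=0), x_train.max(axis=0)
    y_min, y_max = y_train.min(axis=0), y_train.max(axis=0)
    return {"x_train_norm": (x_train - x_min) / (x_max - x_min),
            "x_test_norm": (x_test - x_min) / (x_max - x_min),
            "y_train_norm": (y_train - y_min) / (y_max - y_min),
            "y_test_norm": (y_test - y_min) / (y_max - y_min),
            # retained so predictions can be denormalized later
            "min_y_train": y_min, "max_y_train": y_max}
```

Scaling the test split by the *training* min and max (rather than its own) is what keeps train and test data in the same units for the model.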
- tell.mlp_utils.normalize_prediction_data(data_arr: ndarray, min_train_arr: ndarray, max_train_arr: ndarray) ndarray [source]
Normalize target data using the existing minimum and maximum of the training data.
- Parameters:
data_arr (np.ndarray) – Array of target data
min_train_arr (np.ndarray) – Array of previously trained minimum target data
max_train_arr (np.ndarray) – Array of previously trained maximum target data
- tell.mlp_utils.pickle_model(region: str, model_object: object, model_name: str, model_output_directory: str | None)[source]
Pickle a model to file using joblib. The scikit-learn version is included in the file name because a compatible version is required to reload the model safely.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
model_object (object) – scikit-learn model object.
model_name (str) – Name of sklearn model.
model_output_directory (str) – Full path to output directory where model file will be written.
- tell.mlp_utils.pickle_normalization_dict(region: str, normalization_dict: dict, model_output_directory: str | None)[source]
Pickle the normalization dictionary to file using joblib. The scikit-learn version is included in the file name because a compatible version is required to reload the data safely.
- Parameters:
region (str) – Indicating region / balancing authority we want to train and test on. Must match with string in CSV files.
normalization_dict (dict) – Dictionary of normalization data
model_output_directory (str) – Full path to output directory where model file will be written.
tell.package_data module
tell.states_fips_function module
tell.visualization module
- tell.visualization.plot_ba_load_time_series(ba_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the time series of load for a given Balancing Authority
- Parameters:
ba_to_plot (str) – Balancing Authority code for the BA you want to plot
year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_plot (str) – Scenario you want to plot
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_ba_service_territory(ba_to_plot: str, year_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot maps of the service territory for a given BA in a given year
- Parameters:
ba_to_plot (str) – Code for the BA you want to plot
year_to_plot (str) – Year you want to plot (valid 2015-2019)
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_ba_variable_correlations(ba_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the correlation matrix between predictive variables and observed demand for individual or all BAs.
- Parameters:
ba_to_plot (str) – BA code for the BA you want to plot. Set to “All” to plot the average correlation across all BAs.
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_mlp_ba_peak_week(prediction_df, ba_to_plot: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the time-series of load during the peak week of the year for a given BA.
- Parameters:
prediction_df (df) – Prediction dataframe produced by the batch training of MLP models for all BAs
ba_to_plot (str) – Code for the BA you want to plot
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
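One plausible way to locate the peak week in an hourly load series (assuming "peak week" means the 7-day window with the largest total load; TELL's exact definition may differ) is a rolling-window maximum via a cumulative-sum trick:

```python
import numpy as np

hours_per_week = 24 * 7
load = 2.0 + np.sin(np.arange(8760) / 500.0)   # toy hourly load for one year

# Rolling 7-day window sums: csum[i+w] - csum[i] is the sum of load[i:i+w].
csum = np.concatenate(([0.0], np.cumsum(load)))
window_sums = csum[hours_per_week:] - csum[:-hours_per_week]

start = int(np.argmax(window_sums))            # start hour of the peak week
peak_week = load[start:start + hours_per_week]
```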
- tell.visualization.plot_mlp_ba_time_series(prediction_df, ba_to_plot: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the performance metrics for an individual BA
- Parameters:
prediction_df (df) – Prediction dataframe produced by the batch training of MLP models for all BAs
ba_to_plot (str) – Code for the BA you want to plot
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_mlp_errors_vs_load(prediction_df, validation_df, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the summary statistics of the MLP evaluation data as a function of mean load
- Parameters:
prediction_df (df) – Prediction dataframe produced by the batch training of MLP models for all BAs
validation_df (df) – Validation dataframe produced by the batch training of MLP models for all BAs
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_mlp_summary_statistics(validation_df, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the summary statistics of the MLP evaluation data across BAs
- Parameters:
validation_df – Validation dataframe produced by the batch training of MLP models for all BAs
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_state_annual_total_loads(year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot annual total loads from both GCAM-USA and TELL
- Parameters:
year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_plot (str) – Scenario you want to plot
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_state_load_duration_curve(state_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the load duration curve for a given state
- Parameters:
state_to_plot (str) – State you want to plot
year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_plot (str) – Scenario you want to plot
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
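A load duration curve is simply the hourly loads re-plotted in descending order against the fraction of hours each level is met or exceeded; the underlying computation is a sort:

```python
import numpy as np

load = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 2.0])   # toy hourly loads (MWh)

duration_curve = np.sort(load)[::-1]              # loads, descending
# x-axis: fraction of hours at or above each load level
exceedance = np.arange(1, load.size + 1) / load.size
```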
- tell.visualization.plot_state_load_time_series(state_to_plot: str, year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the time series of load for a given state
- Parameters:
state_to_plot (str) – State you want to plot
year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_plot (str) – Scenario you want to plot
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
- tell.visualization.plot_state_scaling_factors(year_to_plot: str, gcam_target_year: str, scenario_to_plot: str, data_input_dir: str, image_output_dir: str, image_resolution: int, save_images=False)[source]
Plot the scaling factors that force TELL annual total state loads to agree with GCAM-USA
- Parameters:
year_to_plot (str) – Year you want to plot (valid 2039, 2059, 2079, 2099)
gcam_target_year (str) – Year to scale against the GCAM-USA annual loads
scenario_to_plot (str) – Scenario you want to plot
data_input_dir (str) – Top-level data directory for TELL
image_output_dir (str) – Directory to store the images
image_resolution (int) – Resolution at which you want to save the images in DPI
save_images (bool) – Set to True if you want to save the images after they’re generated
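The scaling factor being plotted is, at its core, the ratio of the GCAM-USA annual state total to the TELL annual state total; applying it uniformly to the hourly loads forces the totals to agree while preserving TELL's temporal profile. A sketch with made-up annual totals:

```python
import numpy as np

# Hypothetical annual totals (MWh) for one state
tell_annual_total = 52_000_000.0
gcam_usa_annual_total = 55_900_000.0

# Uniform scaling factor forcing TELL's total onto GCAM-USA's
scaling_factor = gcam_usa_annual_total / tell_annual_total

# Applied to a (flat, toy) hourly profile: the annual sum now matches
tell_hourly = np.full(8760, tell_annual_total / 8760)
scaled_hourly = tell_hourly * scaling_factor
```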