PredictorConverter

class arsa_ml.converters.PredictorConverter(predictor, test_data, df_name, feature_imp_needed = True)

A subclass of the Converter abstract class that transforms a trained AutoGluon TabularPredictor object into a leaderboard and dictionaries that can be used to build the Rashomon Set.

Parameters

predictor : TabularPredictor
Trained TabularPredictor AutoGluon object.
test_data : TabularDataset
Test data for analysis in a TabularDataset format.
df_name : str
The name of the dataset to be used while saving converted results.
feature_imp_needed : bool, default = True
Whether to obtain feature importances from the trained models. Setting
this to True can lengthen the runtime of the .convert() method.

Attributes

predictor : TabularPredictor
predictor parameter
test_data : TabularDataset
test_data parameter
df_name : str
df_name parameter
feature_imp_needed : bool
feature_imp_needed parameter
metrics : list
list of all evaluation metrics extracted from AutoGluon; the multiclass or binary set is chosen based on the predictor's problem_type attribute.
leaderboard : pd.DataFrame
leaderboard created with the create_leaderboard() method consisting only of the selected evaluation metrics.

Methods

create_leaderboard()

Creates a dataframe with all trained models and their evaluation metric values obtained from the predictor.

Returns :
leaderboard : pd.DataFrame

create_predictions_dict()

Creates a dictionary with model names as keys and their class prediction vectors as values.

Returns :
predictions_dict : dict[str, pd.Series]

create_proba_predictions_dict()

Creates a dictionary with model names as keys and their class probability predictions as values.

Returns :
proba_predictions_dict : dict[str, pd.DataFrame(n_observations, n_classes)]

create_feature_importance_dict()

Creates a dictionary where the keys are model names and the values are lists of features sorted in descending order of importance, so that the most important feature appears first in each list.

Returns :
feature_importance_dict : dict[str, list]
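
The shape of feature_importance_dict can be illustrated with a minimal sketch. It only shows the descending ordering described above; how the class actually obtains importance scores is not stated here, so the rank_features helper, the model names and the scores are all hypothetical.

```python
def rank_features(importances):
    """Return feature names ordered from most to least important.

    `importances` maps feature name -> importance score (hypothetical input).
    """
    ordered = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _score in ordered]


# Made-up importance scores for two made-up models.
raw_scores = {
    "ModelA": {"age": 0.12, "income": 0.30, "city": 0.01},
    "ModelB": {"age": 0.25, "income": 0.05, "city": 0.10},
}

# Same structure as the documented return value: dict[str, list].
feature_importance_dict = {
    model: rank_features(scores) for model, scores in raw_scores.items()
}
```

With these inputs the most important feature of each model sits at index 0 of its list.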

extract_target_column()

Extracts the target column from the test dataset using the .label attribute of the TabularPredictor object.

Returns :
y_true : pd.DataFrame

save_results(leaderboard, predictions_dict, proba_predictions_dict, feature_importance_dict, y_true, saving_path)

Saves the created leaderboard, all dictionaries and the target column to disk in .csv and .pickle formats.

Parameters :
leaderboard : pd.DataFrame
created leaderboard to be saved as csv

predictions_dict : dict
created predictions dict to be saved as pickle

proba_predictions_dict : dict
created proba predictions dict to be saved as pickle

feature_importance_dict : dict
created feature importance dict to be saved as pickle

y_true : pd.DataFrame
extracted target column to be saved as a csv

saving_path : Path
path to a directory where results should be saved; if not specified, a new directory named from the timestamp and df_name is created
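
The default-directory behaviour and the pickle output can be sketched as follows. This is not the library's implementation. The resolve_saving_dir and save_dict helpers, the timestamp format and the file name are assumptions made for illustration.

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path


def resolve_saving_dir(df_name, saving_path=None, base_dir=Path("."), now=None):
    """Pick the output directory, falling back to '<timestamp>_<df_name>'.

    The timestamp format below is an assumption; the documentation only says
    that the timestamp and df_name are combined.
    """
    if saving_path is not None:
        target = Path(saving_path)
    else:
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        target = Path(base_dir) / f"{stamp}_{df_name}"
    target.mkdir(parents=True, exist_ok=True)
    return target


def save_dict(obj, directory, name):
    """Pickle one dictionary into the chosen directory and return its path."""
    path = Path(directory) / f"{name}.pickle"
    with path.open("wb") as handle:
        pickle.dump(obj, handle)
    return path


# Round trip inside a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = resolve_saving_dir(
        "credit", base_dir=Path(tmp), now=datetime(2024, 1, 2, 3, 4, 5)
    )
    dir_name = out_dir.name
    saved = save_dict({"ModelA": [0, 1, 1]}, out_dir, "predictions_dict")
    with saved.open("rb") as handle:
        restored = pickle.load(handle)
```

Fixing `now` in the example keeps the directory name deterministic; in normal use it would be left unset.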

convert(saving_path)

Final method that creates leaderboard, predictions_dict, proba_predictions_dict and feature_importance_dict and saves the results using the save_results() method. If the feature_imp_needed parameter is False, feature_importance_dict is not created and NaN is returned in its place.

Parameters :
saving_path : Path
path to a directory where results should be saved; if not specified, a new directory named from the timestamp and df_name is created

Returns :
leaderboard : pd.DataFrame
leaderboard created using the create_leaderboard() method

predictions_dict : dict[str, pd.Series]
predictions dict created using the create_predictions_dict() method

proba_predictions_dict : dict[str, pd.DataFrame]
proba predictions dict created using the create_proba_predictions_dict() method

feature_importance_dict : dict[str, list]
feature importance dict created using the create_feature_importance_dict() method

y_true : pd.DataFrame
target column extracted using the extract_target_column() method
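
A possible end-to-end call is sketched below. It assumes that convert() returns the five documented objects as a tuple in the order listed above, which this page does not state explicitly. The run_conversion and feature_importance_missing helpers and their arguments are hypothetical. The AutoGluon and arsa_ml imports are kept inside the function so the snippet can be loaded without those packages installed.

```python
import math


def feature_importance_missing(value):
    """Return True when NaN was returned in place of feature_importance_dict."""
    return isinstance(value, float) and math.isnan(value)


def run_conversion(train_csv, test_csv, label, df_name, saving_path):
    """Train a predictor and convert it (hypothetical wrapper, not library API)."""
    # Local imports: these packages are only needed when the function is called.
    from autogluon.tabular import TabularDataset, TabularPredictor
    from arsa_ml.converters import PredictorConverter

    train_data = TabularDataset(train_csv)
    test_data = TabularDataset(test_csv)
    predictor = TabularPredictor(label=label).fit(train_data)

    # feature_imp_needed=False skips the feature importance step.
    converter = PredictorConverter(
        predictor, test_data, df_name, feature_imp_needed=False
    )

    # Assumption: results are returned as a tuple in the documented order.
    leaderboard, predictions, probas, importance, y_true = converter.convert(saving_path)

    # Normalise the documented NaN placeholder to None for easier checks.
    if feature_importance_missing(importance):
        importance = None

    return {
        "leaderboard": leaderboard,
        "predictions_dict": predictions,
        "proba_predictions_dict": probas,
        "feature_importance_dict": importance,
        "y_true": y_true,
    }
```

The NaN check matters because a plain truthiness test would treat NaN as a present value.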