PredictorConverter

class arsa_ml.converters.PredictorConverter(predictor, test_data, df_name, feature_imp_needed = True)


A subclass of the Converter abstract class that transforms a trained AutoGluon TabularPredictor object into a leaderboard and dictionaries that can be used to build the Rashomon Set.

Parameters


predictor : TabularPredictor
  Trained TabularPredictor AutoGluon object.
test_data : TabularDataset
  Test data for analysis in a TabularDataset format.
df_name : str
  The name of the dataset to be used while saving converted results.
feature_imp_needed : bool, default = True
  Whether to obtain feature importances from the trained models. Setting this to True can
  lengthen the runtime of the .convert() method.

Attributes


predictor : TabularPredictor
  predictor parameter
test_data : TabularDataset
  test_data parameter
df_name : str
  df_name parameter
feature_imp_needed : bool
  feature_imp_needed parameter
metrics : list
  list of all evaluation metrics extracted from AutoGluon; the multiclass or binary set is chosen based on the predictor's problem_type attribute.
leaderboard : pd.DataFrame
  leaderboard created with the create_leaderboard() method consisting only of the selected evaluation metrics.
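
The dependence of the metrics attribute on problem_type can be pictured with a minimal sketch. The function name select_metrics and the two metric lists are hypothetical; the metric names are ones AutoGluon recognises, but the lists actually used by PredictorConverter are not given here and may differ.

```python
# Hypothetical metric sets; the real lists used by the class may differ.
BINARY_METRICS = ["accuracy", "balanced_accuracy", "f1", "roc_auc"]
MULTICLASS_METRICS = ["accuracy", "balanced_accuracy", "f1_macro", "log_loss"]


def select_metrics(problem_type):
    """Pick a metric list from a predictor's problem_type string."""
    if problem_type == "binary":
        return list(BINARY_METRICS)
    if problem_type == "multiclass":
        return list(MULTICLASS_METRICS)
    # Only classification problem types are covered by this sketch.
    raise ValueError(f"Unsupported problem_type: {problem_type!r}")


binary_metrics = select_metrics("binary")
multiclass_metrics = select_metrics("multiclass")
```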

Methods


create_leaderboard()

  Creates a dataframe with all trained models and the values of their evaluation metrics obtained from the predictor.

Returns :
leaderboard : pd.DataFrame

create_predictions_dict()

  Creates a dictionary with model names as keys and their class prediction vectors as values.

Returns :
predictions_dict : dict[str, pd.Series]


create_proba_predictions_dict()

  Creates a dictionary with model names as keys and their class probabilities predictions as values.

Returns :
proba_predictions_dict : dict[str, pd.DataFrame(n_observations, n_classes)]


create_feature_importance_dict()

  Creates a dictionary where the keys are model names and the values are lists of features sorted in descending order of importance, so that the most important feature appears first in each list.

Returns :
feature_importance_dict : dict[str, list]
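
The shape of the returned dictionary can be illustrated with a small sketch. The helper rank_features and the importance scores are made up for illustration; how the class actually obtains the scores from each model is not shown here.

```python
def rank_features(importances):
    """Return feature names sorted by descending importance score."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _score in ranked]


# Made-up importance scores per model (illustration only).
scores = {
    "LightGBM": {"age": 0.12, "income": 0.30, "tenure": 0.05},
    "CatBoost": {"age": 0.25, "income": 0.10, "tenure": 0.02},
}

# Model name -> list of features, most important first.
feature_importance_dict = {model: rank_features(s) for model, s in scores.items()}
```

With these scores, the most important feature for "LightGBM" is "income", while for "CatBoost" it is "age".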


extract_target_column()

  Extracts the target column from the test dataset using the .label attribute of the TabularPredictor object.

Returns :
y_true : pd.DataFrame


save_results(leaderboard, predictions_dict, proba_predictions_dict, feature_importance_dict, y_true, saving_path)

  Saves the created leaderboard, all dictionaries and the extracted target column to disk in .csv and .pickle formats.

Parameters :
leaderboard : pd.DataFrame
    created leaderboard to be saved as csv

predictions_dict : dict
    created predictions dict to be saved as pickle

proba_predictions_dict : dict
    created proba predictions dict to be saved as pickle

feature_importance_dict : dict
    created feature importance dict to be saved as pickle

y_true : pd.DataFrame
    extracted target column to be saved as a csv

saving_path : Path
    path to a directory where results should be saved; if not specified, a new directory named from the timestamp and df_name is created
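
The saving behaviour described above can be sketched with the standard library alone. Everything in this block is an assumption made for illustration: the helper names, the "<timestamp>_<df_name>" directory format, the file names, and the use of plain lists and dicts in place of pandas objects. It is not the library's implementation.

```python
import csv
import pickle
import tempfile
from datetime import datetime
from pathlib import Path


def default_saving_path(df_name, base_dir="."):
    # Hypothetical naming scheme: "<timestamp>_<df_name>"; the real format may differ.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(base_dir) / f"{stamp}_{df_name}"


def save_results_sketch(leaderboard_rows, predictions_dict, df_name, saving_path=None):
    """Write a tabular result as CSV and a dictionary result as pickle."""
    out_dir = Path(saving_path) if saving_path is not None else default_saving_path(df_name)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Tabular result -> CSV (rows are plain dicts here instead of a DataFrame).
    with open(out_dir / "leaderboard.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(leaderboard_rows[0].keys()))
        writer.writeheader()
        writer.writerows(leaderboard_rows)

    # Dictionary result -> pickle.
    with open(out_dir / "predictions_dict.pickle", "wb") as f:
        pickle.dump(predictions_dict, f)

    return out_dir


# Small demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = save_results_sketch(
        [{"model": "LightGBM", "accuracy": 0.9}],
        {"LightGBM": [0, 1, 1]},
        df_name="demo",
        saving_path=Path(tmp) / "results",
    )
    saved_files = sorted(p.name for p in out_dir.iterdir())
    default_name = default_saving_path("demo", base_dir=tmp).name
```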


convert(saving_path)

  Final method used to create leaderboard, predictions_dict, proba_predictions_dict and feature_importance_dict, extract y_true, and save the results using the save_results() method. If the feature_imp_needed parameter is False, feature_importance_dict is not created and NaN is returned in its place.

Parameters :
saving_path : Path
    path to a directory where results should be saved; if not specified, a new directory named from the timestamp and df_name is created

Returns :
leaderboard : pd.DataFrame
    leaderboard created using create_leaderboard() method

predictions_dict : dict[str, pd.Series]
    predictions dict created using create_predictions_dict() method

proba_predictions_dict : dict[str, pd.DataFrame]
    proba predictions dict created using create_proba_predictions_dict() method

feature_importance_dict : dict[str, list]
    feature importance dict created using create_feature_importance_dict() method

y_true : pd.DataFrame
    extracted target column using extract_target_column() method
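
A typical end-to-end call might look like the sketch below. The wrapper function run_conversion and its argument names are hypothetical. The sketch also assumes that convert() returns the five documented values as a single tuple in the documented order, which is not stated explicitly above. The AutoGluon and arsa_ml imports are deferred into the function body so that the function can be defined without either package installed.

```python
from pathlib import Path


def run_conversion(predictor_dir, test_csv, df_name, out_dir):
    """Load a trained predictor and test data, then run the full conversion."""
    # Deferred imports: these packages are only needed when the function is called.
    from autogluon.tabular import TabularDataset, TabularPredictor
    from arsa_ml.converters import PredictorConverter

    predictor = TabularPredictor.load(predictor_dir)  # previously trained predictor
    test_data = TabularDataset(test_csv)              # test set including the label column

    # Skipping feature importances shortens the runtime of convert().
    converter = PredictorConverter(predictor, test_data, df_name, feature_imp_needed=False)

    # Assumption: the documented return values arrive as one tuple, in this order.
    leaderboard, predictions, probas, feature_importance, y_true = converter.convert(
        saving_path=Path(out_dir)
    )
    return leaderboard, predictions, probas, feature_importance, y_true
```

Because feature_imp_needed is False in this sketch, the fourth returned value would be NaN rather than a dictionary.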