BuildRashomonAutogluon

class arsa_ml.pipelines.pipelines_user_input.BuildRashomonAutogluon(predictor, test_data, df_name, base_metric, delta, feature_imp_needed, converter_results_directory)

A subclass of the abstract Builder class that provides a pipeline for creating and exploring the Rashomon Set from user-provided models trained with the AutoGluon framework.

An example of using this pipeline can be found in demo_notebooks/AutoGluon_pipeline.ipynb.

Parameters

predictor : TabularPredictor
Trained AutoGluon TabularPredictor object containing all models and training results.

test_data : TabularDataset
Test dataset used for evaluation; it must be converted to a TabularDataset object.
Note 1: It is crucial to stratify your data when performing the train-test split, so that the test data contains no classes that were absent from the training set.
Note 2 (binary classification only): It is crucial to encode the binary target labels as 0 for the negative class and 1 for the positive class. Otherwise, some evaluation metrics may not be calculated correctly.
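Both notes can be satisfied before the data reaches the pipeline. The sketch below uses only the Python standard library and is not part of arsa_ml; `stratified_split` and `encode_binary_labels` are hypothetical helpers. In practice, scikit-learn's `train_test_split(..., stratify=y)` performs the stratified split, and the resulting DataFrames can then be wrapped in `TabularDataset`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices so each class keeps roughly the same share in train and test.

    At least one example of every class stays in the training split, so the
    test split never contains a class that is unseen during training.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = min(round(len(idx) * test_fraction), len(idx) - 1)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def encode_binary_labels(labels, positive_label):
    """Map a binary target column to 1 (positive class) and 0 (negative class)."""
    classes = set(labels)
    if positive_label not in classes or len(classes) > 2:
        raise ValueError("expected a binary target that contains the positive label")
    return [1 if y == positive_label else 0 for y in labels]


labels = ["yes"] * 8 + ["no"] * 12
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
y = encode_binary_labels(labels, positive_label="yes")
```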

df_name : str
Name of the dataset, used when saving the output generated by PredictorConverter.

base_metric : str
Evaluation metric used as the primary criterion for ranking models and constructing the Rashomon Set.

delta : float
Delta parameter for probabilistic ambiguity and discrepancy (used only for binary classification tasks).
If not specified, the default value of 0.1 is used.
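For intuition about delta: in the predictive-multiplicity literature, probabilistic ambiguity is commonly defined as the share of test examples for which at least one model in the Rashomon Set changes the predicted positive-class probability by at least delta relative to a reference (typically the best) model, and probabilistic discrepancy as the largest such share attained by any single model. The sketch below implements those definitions in plain Python. It is an illustration under that assumption, not arsa_ml's implementation, which may differ in details such as the choice of reference model or whether the inequality is strict.

```python
def probabilistic_ambiguity(ref_probs, rashomon_probs, delta=0.1):
    """Share of examples where some Rashomon model deviates from the reference by >= delta.

    ref_probs: positive-class probabilities of the reference model, one per example.
    rashomon_probs: one such probability list per model in the Rashomon Set.
    """
    n = len(ref_probs)
    flagged = sum(
        any(abs(probs[i] - ref_probs[i]) >= delta for probs in rashomon_probs)
        for i in range(n)
    )
    return flagged / n


def probabilistic_discrepancy(ref_probs, rashomon_probs, delta=0.1):
    """Largest share of examples on which a single Rashomon model deviates by >= delta."""
    n = len(ref_probs)
    return max(
        sum(abs(p - r) >= delta for p, r in zip(probs, ref_probs)) / n
        for probs in rashomon_probs
    )


ref = [0.90, 0.40, 0.10, 0.55]
others = [
    [0.85, 0.55, 0.10, 0.50],  # deviates by >= 0.1 only at index 1
    [0.70, 0.42, 0.25, 0.56],  # deviates by >= 0.1 at indices 0 and 2
]
print(probabilistic_ambiguity(ref, others))    # 0.75
print(probabilistic_discrepancy(ref, others))  # 0.5
```

Ambiguity counts an example as soon as any model flags it, so it is always at least as large as discrepancy, which looks at one model at a time.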

feature_imp_needed : bool
Boolean value specifying whether feature importance computation is required.
Defaults to True; set to False to skip feature importance computation for faster execution.

converter_results_directory : Path
Path to the directory where the converter outputs will be saved.
If None, a default directory current_working_directory/df_name_timestamp is created automatically, where timestamp is the creation time of the converted outputs.
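The default location can be pictured as follows. This is a hypothetical sketch of the naming scheme described above: the helper name `default_results_directory` and the timestamp format `%Y%m%d_%H%M%S` are assumptions, and arsa_ml may format the timestamp differently or create the directory at a different point.

```python
from datetime import datetime
from pathlib import Path


def default_results_directory(df_name, converter_results_directory=None, now=None):
    """Resolve the converter output directory (hypothetical helper).

    Uses the given path if provided; otherwise builds cwd/<df_name>_<timestamp>.
    The directory is created if it does not exist yet.
    """
    if converter_results_directory is not None:
        path = Path(converter_results_directory)
    else:
        now = now or datetime.now()
        # Assumed timestamp format; the library's actual format may differ.
        path = Path.cwd() / f"{df_name}_{now.strftime('%Y%m%d_%H%M%S')}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```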

BuildRashomonAutogluon Pipeline – Key Steps

Initialization (__init__) – Convert the AutoGluon TabularPredictor object and the test data into the format required for Rashomon Set analysis.
Preview Rashomon (preview_rashomon()) – Visualize the leaderboard and Rashomon Set sizes to guide epsilon selection.
Set Epsilon (set_epsilon()) – Set the epsilon parameter value to be used when creating the Rashomon Set.
Build Pipeline (build()) – Create the RashomonSet and Visualizer objects and launch the Streamlit dashboard.
Interactive Analysis – Explore plots via the Visualizer object or the dashboard.
Close Dashboard (dashboard_close()) – Stop the Streamlit processes.
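Put together, a typical session might look like the sketch below. Only the class, its constructor parameters, and the methods listed above come from this page; `TabularDataset` and `TabularPredictor(label=...).fit(...)` are AutoGluon's public API. The file paths, label column, dataset name, the metric string "accuracy", the epsilon value, and the assumption that build() returns the RashomonSet and Visualizer as a tuple in that order are illustrative, so check them against the demo notebook. The imports sit inside the function so the module loads even where AutoGluon or arsa_ml is not installed.

```python
def run_rashomon_session(train_csv, test_csv, label, epsilon=0.05):
    """Hypothetical session: train AutoGluon models, then build and explore their Rashomon Set.

    Returns the builder so that dashboard_close() can be called once the analysis is done.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    from autogluon.tabular import TabularDataset, TabularPredictor
    from arsa_ml.pipelines.pipelines_user_input import BuildRashomonAutogluon

    train_data = TabularDataset(train_csv)
    test_data = TabularDataset(test_csv)  # test data must be a TabularDataset

    # Train the candidate models with AutoGluon.
    predictor = TabularPredictor(label=label).fit(train_data)

    # Initialization: the predictor and test data are converted for Rashomon Set analysis.
    builder = BuildRashomonAutogluon(
        predictor=predictor,
        test_data=test_data,
        df_name="my_dataset",              # illustrative name
        base_metric="accuracy",            # assumed metric name; check the supported values
        delta=0.1,
        feature_imp_needed=True,
        converter_results_directory=None,  # default cwd/df_name_timestamp directory
    )

    builder.preview_rashomon()    # leaderboard and Rashomon Set sizes per epsilon
    builder.set_epsilon(epsilon)  # must be called before build()

    # Assumed to return (RashomonSet, Visualizer) as a tuple, in that order.
    rashomon_set, visualizer = builder.build(launch_dashboard=True)
    return builder, rashomon_set, visualizer
```

After exploring the plots, call `builder.dashboard_close()` to stop the Streamlit processes.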

Pipeline Initialization

During initialization, the pipeline converts the user's input models into the internal format defined by the PredictorConverter class.
All processed results are stored in the specified directory or in a default directory (see the converter_results_directory parameter).

Methods

preview_rashomon()

Displays the leaderboard and a plot of the Rashomon Set size for every possible epsilon value.
Call this method to guide the selection of an appropriate epsilon threshold.
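Conceptually, the Rashomon Set size is a step function of epsilon that changes only at the gaps between the best score and each model's score. The sketch below computes those breakpoints from a hypothetical leaderboard, assuming a higher-is-better base metric and an absolute threshold (a model is included when best_score - score <= epsilon). It is not arsa_ml's code, and the library may use a different convention, for example a relative threshold.

```python
def rashomon_size_by_epsilon(scores):
    """Return (epsilon, set size) pairs at every epsilon where the size changes.

    scores: mapping of model name -> base-metric score, higher is better.
    A model belongs to the set when best_score - score <= epsilon.
    """
    best = max(scores.values())
    gaps = sorted(best - s for s in scores.values())
    steps = []
    for size, gap in enumerate(gaps, start=1):
        if steps and steps[-1][0] == gap:
            steps[-1] = (gap, size)  # tied models join at the same epsilon
        else:
            steps.append((gap, size))
    return steps


leaderboard = {  # hypothetical scores
    "WeightedEnsemble_L2": 0.91,
    "LightGBM": 0.90,
    "CatBoost": 0.90,
    "RandomForest": 0.86,
}
for eps, size in rashomon_size_by_epsilon(leaderboard):
    print(f"epsilon >= {eps:.2f}: Rashomon Set size {size}")
```

Because the size only changes at these breakpoints, scanning them is enough to see every distinct Rashomon Set the leaderboard can produce.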

visualize_rashomon_set_volume()

Visualizes the Rashomon Set size as a function of epsilon.

set_epsilon(epsilon)

Sets the epsilon value to be used when constructing the Rashomon Set object.
Epsilon must be set before calling the build() method.

Parameters :
epsilon : float
Epsilon threshold used when constructing the Rashomon Set.
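Epsilon acts as a performance threshold relative to the best model. As a sketch only, assuming a higher-is-better base metric and an absolute threshold (a model is kept when best_score - score <= epsilon), the members of the set could be selected as below; the guard mirrors the documented requirement that epsilon be set before build() is called. The class `RashomonSelector` is hypothetical and not part of arsa_ml.

```python
class RashomonSelector:
    """Hypothetical stand-in illustrating the set_epsilon() -> build() ordering."""

    def __init__(self, scores):
        self.scores = scores  # model name -> base-metric score (higher is better)
        self.epsilon = None

    def set_epsilon(self, epsilon):
        self.epsilon = epsilon

    def build(self):
        if self.epsilon is None:
            raise RuntimeError("call set_epsilon() before build()")
        best = max(self.scores.values())
        # Keep every model within epsilon of the best score, best first.
        members = [m for m, s in self.scores.items() if best - s <= self.epsilon]
        return sorted(members, key=self.scores.get, reverse=True)


selector = RashomonSelector({"LightGBM": 0.90, "CatBoost": 0.88, "KNeighbors": 0.80})
selector.set_epsilon(0.05)
print(selector.build())  # ['LightGBM', 'CatBoost']
```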

build(launch_dashboard)

Builds the Rashomon Set pipeline from the AutoGluon output.
Creates the RashomonSet and Visualizer objects from the user's input and, if launch_dashboard is True, launches a Streamlit dashboard in a subprocess for interactive visualization.

This method performs the following steps:

Validates that the epsilon threshold has been set using the set_epsilon() method.

Creates the RashomonSet object based on the leaderboard, predictions, probability predictions, feature importances, base metric, and epsilon threshold.
All inputs required to build the Rashomon Set object are extracted from the user's data, which is processed and converted during the pipeline initialization.

Initializes the Visualizer object for interactive analysis of the Rashomon Set.
Individual plots can later be generated directly from the Visualizer object.

If launch_dashboard is set to True (default), it generates plots for analysis depending on the task type (binary or multiclass) and temporarily stores them, together with their descriptions, for the Streamlit dashboard. It also closes any previous Streamlit processes to avoid conflicts.

If launch_dashboard is set to True (default), it launches the Streamlit dashboard in a subprocess on the local machine (localhost), allowing interactive exploration of the Rashomon Set properties without blocking the main workflow.

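Parameters :
launch_dashboard : bool
Whether to generate the dashboard plots and launch the Streamlit dashboard. Defaults to True.

Returns :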
rashomon_set : RashomonSet
visualizer : Visualizer

dashboard_close()

Stops all Streamlit processes and closes the dashboard.
Note: Always call this method after finishing the analysis to ensure the dashboard is properly closed.
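The non-blocking launch in build() and the cleanup in dashboard_close() follow a common pattern: start the server with `subprocess.Popen`, which returns immediately, and terminate it later. The sketch below shows that pattern with the standard library; the function names are hypothetical, and arsa_ml's own process management (for example, how it finds and stops earlier Streamlit processes) may differ. A real dashboard command would look like `[sys.executable, "-m", "streamlit", "run", "app.py"]`, where `app.py` is a placeholder script name.

```python
import subprocess
import sys


def launch_in_background(command):
    """Start a long-running process without blocking the caller; returns the Popen handle."""
    return subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def close_process(proc, timeout=5.0):
    """Ask the process to stop, and kill it if it does not exit within `timeout` seconds."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


# Stand-in for the dashboard server: a Python process that would otherwise run for a minute.
server = launch_in_background([sys.executable, "-c", "import time; time.sleep(60)"])
print("dashboard running:", server.poll() is None)
close_process(server)
print("dashboard running:", server.poll() is None)
```

Falling back to `kill()` after a timeout keeps the cleanup from hanging if the server ignores the polite termination request.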