BuildRashomonH2O
This is a subclass of the Builder abstract class. It provides a pipeline for creating and exploring the Rashomon Set from user-provided models trained with the H2O framework.
Example usage of this pipeline can be found at demo_notebooks/H2O_pipeline.ipynb.
models_directory : Path
Path to a folder where all models saved from H2O output are stored.
test_data : h2o.H2OFrame
Test dataset for evaluation; must be converted to an h2o.H2OFrame object.
Note 1: It is crucial to stratify your data when performing the train-test split, so that the test data contains no classes that were absent from the training set.
Note 2 (for binary classification task): It is crucial to convert binary target column labels into 0 for negative class and 1 for positive class. Otherwise, some evaluation metrics may not be calculated correctly.
target_column : str
Name of the target column for the classification task. Used to determine whether the task type is binary or multiclass.
df_name : str
Name of the dataset, used when saving the output generated by H2O_Converter.
base_metric : str
Evaluation metric to be used as a primary value for sorting model performances and constructing the Rashomon Set.
delta : float
Delta parameter for probabilistic ambiguity and discrepancy (used only for binary task type).
If not specified, the default value of 0.1 is used.
feature_imp_needed : bool
Boolean value specifying whether feature importance computation is required.
Defaults to True; set to False to skip feature importance computation for faster execution.
converter_results_directory : Path
Path to the directory where the converter outputs will be saved.
If None, a default directory converter_results/df_name_timestamp will be automatically created in the same directory as models_directory, where timestamp corresponds to the creation time of the converted outputs.
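Notes 1 and 2 on test_data can be checked before the data is converted to an H2OFrame. The following is a minimal sketch in plain Python, not part of the pipeline itself; the helper names `encode_binary_labels` and `unseen_test_classes` and the sample labels are hypothetical.

```python
def encode_binary_labels(labels, positive_label):
    """Map the positive class to 1 and every other label to 0."""
    return [1 if label == positive_label else 0 for label in labels]


def unseen_test_classes(train_labels, test_labels):
    """Return classes that occur in the test labels but not in the training labels."""
    return sorted(set(test_labels) - set(train_labels))


# Hypothetical target column values after a train-test split.
train = ["yes", "no", "no", "yes", "no"]
test = ["no", "yes", "no"]

train_encoded = encode_binary_labels(train, positive_label="yes")
test_encoded = encode_binary_labels(test, positive_label="yes")

# An empty list means every test class was also present in training.
print(test_encoded)
print(unseen_test_classes(train_encoded, test_encoded))
```

A non-empty result from `unseen_test_classes` suggests the split was not stratified and should be redone.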
- Initialization (__init__) – Convert H2O models to the format used for Rashomon Set analysis.
- Preview Rashomon (preview_rashomon()) – Visualize the leaderboard and Rashomon Set sizes to guide epsilon selection.
- Set Epsilon (set_epsilon()) – Set the epsilon parameter value to be used when creating the Rashomon Set.
- Build Pipeline (build()) – Create the RashomonSet and Visualizer objects and launch the Streamlit dashboard.
- Interactive Analysis – Explore plots via the Visualizer object or the dashboard.
- Close Dashboard (dashboard_close()) – Stop Streamlit processes.
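The order of these steps can be sketched as a small driver function. This is an illustration under stated assumptions rather than the package's own code: the builder class is passed in as an argument because the import path is not given here, and the sketch assumes that build() accepts launch_dashboard as a keyword and returns the pair (rashomon_set, visualizer). The name `run_rashomon_workflow` is hypothetical.

```python
def run_rashomon_workflow(builder_cls, epsilon, **init_kwargs):
    """Drive a builder through the documented steps, without the dashboard.

    builder_cls is expected to be BuildRashomonH2O; init_kwargs are the
    constructor parameters documented above (models_directory, test_data,
    target_column, df_name, base_metric, ...).
    """
    builder = builder_cls(**init_kwargs)  # 1. convert the H2O models
    builder.preview_rashomon()            # 2. inspect leaderboard and set sizes
    builder.set_epsilon(epsilon)          # 3. must happen before build()
    # 4. Assumption: build() takes launch_dashboard as a keyword argument and
    #    returns (rashomon_set, visualizer), as listed under "Returns".
    rashomon_set, visualizer = builder.build(launch_dashboard=False)
    return builder, rashomon_set, visualizer
```

When the dashboard is launched instead, dashboard_close() should be called on the returned builder once the analysis is finished.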
During initialization, the pipeline converts the user's input models into the internal format defined by the H2O_Converter class.
All processed results are stored in the specified directory or in a default directory created based on the dataset name and timestamp (see converter_results_directory parameter).
Displays the leaderboard and a plot of the Rashomon Set size for every possible epsilon value.
Call this method first to guide the selection of an appropriate epsilon threshold.
Method for visualising Rashomon set size depending on different epsilon values.
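To make the relation between epsilon and Rashomon Set size concrete, here is a minimal sketch in plain Python. It assumes the common definition, in which a model belongs to the Rashomon Set when its base-metric score is within epsilon of the best model's score; the exact rule used by the pipeline is not stated here. The function name and the leaderboard values are hypothetical.

```python
def rashomon_set_sizes(scores, higher_is_better=True):
    """Return (epsilon, set size) pairs, one per distinct gap to the best score.

    Assumption: a model is in the Rashomon Set when its score differs from
    the best score by at most epsilon.
    """
    best = max(scores) if higher_is_better else min(scores)
    # Round the gaps so that floating-point noise does not create
    # spurious, nearly identical epsilon candidates.
    gaps = [round(abs(best - score), 10) for score in scores]
    return [
        (epsilon, sum(1 for gap in gaps if gap <= epsilon))
        for epsilon in sorted(set(gaps))
    ]


# Hypothetical leaderboard: model name -> base metric value (e.g. accuracy).
leaderboard = {"gbm_1": 0.91, "drf_1": 0.90, "glm_1": 0.90, "xgb_1": 0.86}

for epsilon, size in rashomon_set_sizes(list(leaderboard.values())):
    print(f"epsilon={epsilon:.2f}: {size} model(s)")
```

Each distinct gap to the best score is a point where the set grows, which is why only a finite list of epsilon values needs to be considered.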
set_epsilon(epsilon)
Sets the epsilon parameter value to be used when constructing the Rashomon Set object.
Epsilon value must be set before calling build() method.
Parameters :
epsilon : float
build(launch_dashboard)
Builds the Rashomon Set pipeline from H2O models.
Creates Rashomon Set object and Visualizer object from user's input and launches a Streamlit dashboard in a subprocess for interactive visualization.
This method performs the following steps:
- Validates that the epsilon threshold has been set using the set_epsilon() method.
- Creates the RashomonSet object based on the leaderboard, predictions, probability predictions, feature importances, base metric, and epsilon threshold. All inputs required to build the Rashomon Set object are extracted from the user's data, which is processed and converted during the pipeline initialization.
- Initializes the Visualizer object for interactive analysis of the Rashomon Set. Individual plots can later be generated directly from the Visualizer object.
- If launch_dashboard is set to True (default), generates plots for analysis depending on the task type (binary or multiclass), and stores them temporarily with their descriptions for the Streamlit dashboard. It closes any previous Streamlit processes to avoid conflicts.
- If launch_dashboard is set to True (default), launches the Streamlit dashboard in a subprocess on the local machine (localhost), allowing interactive exploration of the Rashomon Set properties without blocking the main workflow.
Returns :
rashomon_set : RashomonSet
visualizer : Visualizer
Method for stopping all Streamlit processes and closing the dashboard.
Note: Always call this method after finishing the analysis to ensure the dashboard is properly closed.
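The non-blocking launch described for build() and the shutdown performed by dashboard_close() can be pictured with the standard library alone. This is a sketch of the general technique, not the pipeline's implementation: the function names are hypothetical, the app path is supplied by the caller, and the sketch stops only the one process it started, whereas dashboard_close() is documented as stopping all Streamlit processes.

```python
import subprocess
import sys


def streamlit_command(app_path, port=8501):
    """Build the command line for serving a Streamlit app on localhost."""
    return [
        sys.executable, "-m", "streamlit", "run", str(app_path),
        "--server.port", str(port),
        "--server.headless", "true",  # do not open a browser window
    ]


def launch_dashboard(app_path, port=8501):
    """Start Streamlit in a child process and return immediately."""
    return subprocess.Popen(streamlit_command(app_path, port))


def close_dashboard(process, timeout=5):
    """Ask the child process to stop; kill it if it does not exit in time."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
```

Because `subprocess.Popen` returns without waiting for the child process, the calling script or notebook stays responsive while the dashboard runs. Keeping the returned handle is what makes a clean shutdown possible later.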