BuildRashomonIntersectionH2O
This is a subclass of the abstract Builder class. It provides a pipeline for creating and exploring the Rashomon Intersection from user-provided models trained with the H2O framework.
Example usage of this pipeline can be found at demo_notebooks/H2O_pipeline.ipynb.
models_directory : Path
Path to a folder where all models saved from H2O output are stored.
test_data : h2o.H2OFrame
Test dataset for evaluation; it must be converted to an h2o.H2OFrame object.
Note 1: It is crucial to stratify your data while performing the train test split, so that there are no classes in test data that were not present in the train set.
Note 2 (for binary classification task): It is crucial to convert binary target column labels into 0 for negative class and 1 for positive class. Otherwise, some evaluation metrics may not be calculated correctly.
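The two notes above can be checked on plain Python label lists before the data is converted to an h2o.H2OFrame. The following is a minimal sketch; the helper names `encode_binary_labels` and `unseen_test_classes` are hypothetical and are not part of this library.

```python
def encode_binary_labels(labels, positive_label):
    """Map the positive class to 1 and the other class to 0."""
    unique = set(labels)
    if len(unique) != 2:
        raise ValueError(f"expected exactly 2 classes, found {len(unique)}")
    if positive_label not in unique:
        raise ValueError(f"{positive_label!r} does not occur in the labels")
    return [1 if label == positive_label else 0 for label in labels]


def unseen_test_classes(train_labels, test_labels):
    """Return classes that occur in the test labels but not in the train labels."""
    return set(test_labels) - set(train_labels)


# Note 2: encode a binary target as 0 (negative) and 1 (positive).
encoded = encode_binary_labels(["yes", "no", "no", "yes"], positive_label="yes")

# Note 1: an empty set means every test class was seen during training.
missing = unseen_test_classes(["a", "b", "c"], ["a", "c"])
```

An empty result from `unseen_test_classes` only confirms the condition in Note 1 after the split; it does not replace stratifying the split itself.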
target_column : str
Name of the target column for the classification task. Used to determine whether the task type is binary or multiclass.
df_name : str
Name of the dataset, used when saving the outputs generated by H2O_Converter.
metrics : list
List of two metrics to be used in the intersection calculation.
weighted_sum_method : str
Specifies the method for selecting the base model for the Rashomon Intersection. Options are:
- None or 'entropy' (default): selects the base model using an entropy-based method.
- 'critic': selects the base model using a critic-based method.
- 'custom_weights': selects the base model using weights provided by the user.
When weighted_sum_method is set to 'custom_weights', the user must specify the weights as a 2-element list.
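To illustrate what a weighted-sum selection with user-provided weights could look like, here is a minimal sketch. It assumes that both metrics are "higher is better", that they are on comparable scales, and that the base model is the one with the largest weighted sum; the function `select_base_model` and the leaderboard layout are hypothetical, and the library's own computation may differ (for example, by normalizing the metrics first).

```python
def select_base_model(leaderboard, metrics, weights):
    """Return the name of the model with the highest weighted sum of two metrics.

    Assumption: both metrics are 'higher is better' and comparably scaled.
    """
    if len(metrics) != 2 or len(weights) != 2:
        raise ValueError("exactly two metrics and two weights are expected")

    def score(row):
        return sum(weight * row[metric] for metric, weight in zip(metrics, weights))

    return max(leaderboard, key=lambda name: score(leaderboard[name]))


# Illustrative values only.
leaderboard = {
    "model_a": {"accuracy": 0.90, "f1": 0.80},
    "model_b": {"accuracy": 0.85, "f1": 0.88},
    "model_c": {"accuracy": 0.70, "f1": 0.75},
}

# Weighting f1 more heavily favours model_b; weighting accuracy favours model_a.
base_f1_heavy = select_base_model(leaderboard, ["accuracy", "f1"], [0.3, 0.7])
base_acc_heavy = select_base_model(leaderboard, ["accuracy", "f1"], [0.7, 0.3])
```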
delta : float
Delta parameter for probabilistic ambiguity and discrepancy (used only for the binary task type).
If not specified, the default value of 0.1 is used.
feature_imp_needed : bool
Specifies whether feature importance computation is required.
Defaults to True; set to False to skip feature importance computation for faster execution.
converter_results_directory : Path
Path to the directory where the converter outputs will be saved.
If None, a default directory converter_results/df_name_timestamp is created automatically in the same directory as models_directory, where timestamp corresponds to the creation time of the converted outputs.
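The default location can be pictured with the sketch below. It is not the library's code: the helper `default_results_directory` is hypothetical, the timestamp format is an assumption, and "in the same directory as models_directory" is read here as "next to the models folder".

```python
from datetime import datetime
from pathlib import Path


def default_results_directory(models_directory, df_name, now=None):
    """Build converter_results/<df_name>_<timestamp> next to models_directory."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")  # assumed format, for illustration
    return Path(models_directory).parent / "converter_results" / f"{df_name}_{timestamp}"


# A fixed datetime keeps the example reproducible.
example = default_results_directory(
    Path("project/models"), "credit", now=datetime(2024, 1, 2, 3, 4, 5)
)
```

Passing an explicit converter_results_directory avoids depending on any generated name.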
- Initialization (__init__) – Convert H2O models to the format used for Rashomon Intersection analysis.
- Preview Rashomon (preview_rashomon()) – Visualize the leaderboard and Rashomon Intersection sizes to guide epsilon selection.
- Set Epsilon (set_epsilon()) – Set the epsilon parameter value to be used when creating the Rashomon Intersection.
- Build Pipeline (build()) – Create the RashomonIntersection and IntersectionVisualizer objects and launch the Streamlit dashboard.
- Interactive Analysis – Explore plots via the IntersectionVisualizer object or the dashboard.
- Close Dashboard (dashboard_close()) – Stop Streamlit processes.
During initialization, the pipeline converts the user's input models into the internal format specified by the H2O_Converter class.
All processed results are stored in the specified directory or in a default directory created based on the dataset name and timestamp (see converter_results_directory parameter).
Displays the leaderboard and a plot of the Rashomon Intersection size for each possible epsilon value.
Call this method to guide the selection of an appropriate epsilon threshold.
Method for visualising Rashomon Intersection size depending on different epsilon values.
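The relationship between epsilon and the intersection size can be sketched as follows. The membership rule is an assumption made for illustration: a model is kept when, for each of the two metrics, its score is at most epsilon below the base model's score ("higher is better" metrics). The function names and the leaderboard layout are hypothetical; the library's definition may differ.

```python
def rashomon_intersection(leaderboard, metrics, base_model, epsilon):
    """Return models within epsilon of the base model on every metric (assumed rule)."""
    base = leaderboard[base_model]
    return sorted(
        name
        for name, row in leaderboard.items()
        if all(base[metric] - row[metric] <= epsilon for metric in metrics)
    )


def intersection_sizes(leaderboard, metrics, base_model, epsilons):
    """Map each epsilon to the number of models in the intersection."""
    return {
        eps: len(rashomon_intersection(leaderboard, metrics, base_model, eps))
        for eps in epsilons
    }


# Illustrative values only.
leaderboard = {
    "model_a": {"accuracy": 0.90, "f1": 0.80},
    "model_b": {"accuracy": 0.85, "f1": 0.88},
    "model_c": {"accuracy": 0.70, "f1": 0.75},
}
sizes = intersection_sizes(leaderboard, ["accuracy", "f1"], "model_b", [0.0, 0.05, 0.1, 0.2])
```

Under this rule the size can only grow as epsilon increases, which is why a size-versus-epsilon plot helps in picking a threshold.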
set_epsilon(epsilon)
Sets the epsilon parameter value to be used when constructing the Rashomon Intersection object.
The epsilon value must be set before calling the build() method.
Parameters :
epsilon : float
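The ordering requirement between set_epsilon() and build() can be illustrated with a small stand-in class. `PipelineSketch` is hypothetical and mirrors only the call-order rule; the non-negativity check and the exception types are assumptions, not documented behaviour of this class.

```python
class PipelineSketch:
    """Hypothetical stand-in that mirrors only the set_epsilon/build ordering."""

    def __init__(self):
        self.epsilon = None  # not set until set_epsilon() is called

    def set_epsilon(self, epsilon):
        if epsilon < 0:
            # Assumed constraint: a negative tolerance is not meaningful here.
            raise ValueError("epsilon must be non-negative")
        self.epsilon = float(epsilon)

    def build(self):
        if self.epsilon is None:
            raise RuntimeError("call set_epsilon() before build()")
        return f"built with epsilon={self.epsilon}"


pipeline = PipelineSketch()
try:
    pipeline.build()  # fails: epsilon has not been set yet
    failed_early = False
except RuntimeError:
    failed_early = True

pipeline.set_epsilon(0.05)
result = pipeline.build()
```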
build(launch_dashboard)
Builds the Rashomon Intersection pipeline from H2O models.
Creates a RashomonIntersection object and an IntersectionVisualizer object from the user's input and launches a Streamlit dashboard in a subprocess for interactive visualization.
This method performs the following steps:
- Validates that the epsilon threshold has been set using the set_epsilon() method.
- Creates the RashomonIntersection object based on the leaderboard, predictions, probability predictions, feature importances, metrics, method for selecting the base model, and epsilon threshold. All inputs required to build the Rashomon Intersection object are extracted from the user's data, which is processed and converted during pipeline initialization.
- Initializes the IntersectionVisualizer object for interactive analysis of the Rashomon Intersection. Individual plots can later be generated directly from the IntersectionVisualizer object.
- If launch_dashboard is set to True (default), generates plots for analysis depending on the task type (binary or multiclass) and stores them temporarily with their descriptions for the Streamlit dashboard. Any previous Streamlit processes are closed to avoid conflicts.
- If launch_dashboard is set to True (default), launches the Streamlit dashboard in a subprocess on the local machine (localhost), allowing interactive exploration of the Rashomon Intersection properties without blocking the main workflow.
Returns :
rashomon_set : RashomonIntersection
visualizer : IntersectionVisualizer
Method for stopping all Streamlit processes and closing the dashboard.
Note: Always call this method after finishing the analysis to ensure the dashboard is properly closed.
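The subprocess lifecycle described for build() and dashboard_close() can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and a short-lived Python child process stands in for the dashboard so the example runs without Streamlit installed.

```python
import subprocess
import sys


def streamlit_command(script_path, port=8501):
    """Build a command line that would start a Streamlit app on the given port."""
    return [sys.executable, "-m", "streamlit", "run", str(script_path),
            "--server.port", str(port)]


def launch_in_background(command):
    """Start the command without waiting for it, so the caller is not blocked."""
    return subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def stop(process, timeout=5):
    """Ask the process to terminate, and kill it if it does not exit in time."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()


# Stand-in child process instead of a real dashboard.
demo = launch_in_background([sys.executable, "-c", "import time; time.sleep(30)"])
still_running = demo.poll() is None  # control returned while the child is alive
stop(demo)
stopped = demo.poll() is not None
```

Leaving such a child process running keeps its port occupied, which is one reason to close the dashboard explicitly once the analysis is finished.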