BuildRashomonIntersectionH2O

class arsa_ml.pipelines.pipelines_user_input.BuildRashomonIntersectionH2O(models_directory, test_data, target_column, df_name, metrics, custom_weights, weighted_sum_method, delta, feature_imp_needed, converter_results_directory)


A subclass of the abstract Builder class that provides a pipeline for creating and exploring the Rashomon Intersection from user-provided models trained with the H2O framework.

An example of using this pipeline can be found in demo_notebooks/H2O_pipeline.ipynb.


Parameters


models_directory : Path
  Path to the folder containing all models saved from H2O.

test_data : h2o.H2OFrame
  Test dataset for evaluation; must be converted to an h2o.H2OFrame object.
  Note 1: Stratify your data when performing the train-test split, so that the test data contains no classes that were absent from the training set.
  Note 2 (binary classification only): Encode the binary target column as 0 for the negative class and 1 for the positive class. Otherwise, some evaluation metrics may be calculated incorrectly.

target_column : str
  Name of the target column for the classification task. Used to determine whether the task type is binary or multiclass.

df_name : str
  Name of the dataset, used when saving the output generated by H2O_Converter.

metrics : list
  List of two metrics to be used in the intersection calculation.

weighted_sum_method : str
  Specifies the method for selecting the base model for the Rashomon Intersection. Options are:

  • None or 'entropy' (default): selects the base model using an entropy-based method.
  • 'critic': selects the base model using a critic-based method.
  • 'custom_weights': selects the base model using weights provided by the user.

custom_weights : list
  Specifies the weights for base model selection when weighted_sum_method is set to 'custom_weights'. The weights must be given as a 2-element list.

delta : float
  Delta parameter for probabilistic ambiguity and discrepancy (used only for the binary task type).
  If not specified, the default value of 0.1 is used.

feature_imp_needed : bool
  Boolean value specifying whether feature importance computation is required.
  Defaults to True; set to False to skip feature importance computation for faster execution.

converter_results_directory : Path
  Path to the directory where the converter outputs will be saved.
  If None, a default directory converter_results/df_name_timestamp will be automatically created in the same directory as models_directory, where timestamp corresponds to the creation time of the converted outputs.
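Notes 1 and 2 on test_data can be checked before the pipeline is constructed. The following is a minimal sketch in plain Python, not part of the library: the helper names encode_binary_labels and unseen_test_classes and the sample labels are hypothetical, and the checks operate on ordinary lists rather than on an h2o.H2OFrame.

```python
def encode_binary_labels(labels, positive_label):
    """Map the positive class to 1 and every other label to 0."""
    return [1 if label == positive_label else 0 for label in labels]


def unseen_test_classes(train_labels, test_labels):
    """Return classes that occur in the test labels but not in the train labels."""
    return set(test_labels) - set(train_labels)


# Hypothetical target columns, for illustration only.
train_labels = encode_binary_labels(["yes", "no", "no", "yes"], positive_label="yes")
test_labels = encode_binary_labels(["no", "yes"], positive_label="yes")

print(train_labels)  # [1, 0, 0, 1]
print(unseen_test_classes(train_labels, test_labels))  # set()
```

An empty set from the second check means every class in the test data was also present in the training data, which is what a stratified split is meant to guarantee.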


BuildRashomonIntersectionH2O Pipeline – Key Steps


  1. Initialization (__init__) – Convert H2O models to the format required for Rashomon Intersection analysis.
  2. Preview Rashomon (preview_rashomon()) – Visualize the leaderboard and Rashomon Intersection sizes to guide epsilon selection.
  3. Set Epsilon (set_epsilon()) – Set the epsilon value to be used when creating the Rashomon Intersection.
  4. Build Pipeline (build()) – Create the RashomonIntersection and IntersectionVisualizer objects and launch the Streamlit dashboard.
  5. Interactive Analysis – Explore plots via the IntersectionVisualizer object or the dashboard.
  6. Close Dashboard (dashboard_close()) – Stop the Streamlit processes.
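Steps 1 to 4 can be sketched as a single function. This is an illustration under stated assumptions rather than a reference implementation: the wrapper name run_h2o_rashomon_pipeline is hypothetical, the keyword names are copied from the class signature above, passing custom_weights=None with the 'entropy' method is assumed to be accepted, and build() is assumed to return its two documented objects as a tuple. In interactive use, epsilon would normally be chosen after inspecting the output of preview_rashomon().

```python
def run_h2o_rashomon_pipeline(models_directory, test_data, target_column,
                              df_name, metrics, epsilon):
    """Sketch of steps 1-4; returns the builder and the objects from build()."""
    # Imported lazily so the sketch can be defined without arsa_ml installed.
    from arsa_ml.pipelines.pipelines_user_input import BuildRashomonIntersectionH2O

    # Step 1: initialization converts the saved H2O models.
    builder = BuildRashomonIntersectionH2O(
        models_directory=models_directory,
        test_data=test_data,              # must already be an h2o.H2OFrame
        target_column=target_column,
        df_name=df_name,
        metrics=metrics,                  # list of two metrics
        custom_weights=None,              # assumption: unused with 'entropy'
        weighted_sum_method="entropy",
        delta=0.1,
        feature_imp_needed=True,
        converter_results_directory=None,  # fall back to the default directory
    )

    # Step 2: show the leaderboard and intersection sizes per epsilon.
    builder.preview_rashomon()

    # Step 3: epsilon must be set before build() is called.
    builder.set_epsilon(epsilon)

    # Step 4: create both objects and launch the dashboard.
    rashomon_set, visualizer = builder.build(launch_dashboard=True)
    return builder, rashomon_set, visualizer
```

The builder is returned alongside the two objects so that the caller can still invoke dashboard_close() once the analysis is finished.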


Pipeline Initialization


  During initialization, the pipeline converts the user's input models into the internal format defined by the H2O_Converter class.
  All processed results are stored in the specified directory or in a default directory created based on the dataset name and timestamp (see converter_results_directory parameter).
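To make the default location concrete, the sketch below builds a path of the form converter_results/df_name_timestamp next to models_directory. It is an illustration only: the helper name is hypothetical, "next to" is read here as "inside the parent folder of models_directory", and the timestamp layout is an assumption, since the format actually used by the library is not specified in this page.

```python
from datetime import datetime
from pathlib import Path


def default_converter_results_directory(models_directory, df_name, now=None):
    """Build <parent of models_directory>/converter_results/<df_name>_<timestamp>."""
    now = now or datetime.now()
    # Assumed timestamp layout; the library's real format may differ.
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(models_directory).parent / "converter_results" / f"{df_name}_{timestamp}"


# Hypothetical folder and dataset name, with a fixed time for reproducibility.
path = default_converter_results_directory(
    "experiments/h2o_models", "credit", now=datetime(2024, 1, 31, 12, 0, 0)
)
print(path.as_posix())  # experiments/converter_results/credit_20240131_120000
```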

Methods


preview_rashomon()

  Displays the leaderboard and a plot of the Rashomon Intersection size for every possible epsilon value.
  Call this method to guide the selection of an appropriate epsilon threshold.

visualize_rashomon_set_volume()

  Visualizes how the size of the Rashomon Intersection depends on the epsilon value.
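The relationship between epsilon and the intersection size can be illustrated with a small, self-contained sketch. It does not reproduce the library's implementation. It assumes one common formulation: a base model is chosen by a weighted sum of the two metrics (as with 'custom_weights'), and a model belongs to the intersection when it is within epsilon of the base model on both metrics. The function names, model names and scores are hypothetical, and both metrics are assumed to be "higher is better".

```python
def select_base_model(scores, weights):
    """Pick the model with the highest weighted sum of its two metric scores."""
    return max(
        scores,
        key=lambda name: sum(w * s for w, s in zip(weights, scores[name])),
    )


def intersection_for_epsilon(scores, base_model, epsilon):
    """Return models within epsilon of the base model's score on both metrics."""
    base = scores[base_model]
    return [
        name
        for name, values in scores.items()
        if all(value >= b - epsilon for value, b in zip(values, base))
    ]


# Hypothetical leaderboard: model name -> (metric 1, metric 2).
scores = {
    "gbm_1": (0.91, 0.88),
    "drf_1": (0.90, 0.87),
    "glm_1": (0.87, 0.87),
    "xgb_1": (0.83, 0.79),
}

base_model = select_base_model(scores, weights=[0.5, 0.5])
sizes = {
    epsilon: len(intersection_for_epsilon(scores, base_model, epsilon))
    for epsilon in (0.0, 0.02, 0.05, 0.10)
}
print(base_model)  # gbm_1
print(sizes)       # {0.0: 1, 0.02: 2, 0.05: 3, 0.1: 4}
```

A larger epsilon admits more models, which is why a plot of size against epsilon is useful for picking a threshold that keeps the set informative without letting in clearly weaker models.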

set_epsilon(epsilon)

  Sets the epsilon parameter value to be used when constructing the Rashomon Intersection object.
  The epsilon value must be set before calling the build() method.

Parameters :
epsilon : float
  Epsilon threshold used when constructing the Rashomon Intersection.

build(launch_dashboard)

  Builds the Rashomon Intersection pipeline from H2O models.
  Creates the RashomonIntersection and IntersectionVisualizer objects from the user's input and, if launch_dashboard is True, launches a Streamlit dashboard in a subprocess for interactive visualization.

  This method performs the following steps:

  • Validates that the epsilon threshold has been set using the set_epsilon() method.

  • Creates the RashomonIntersection object from the leaderboard, predictions, probability predictions, feature importances, metrics, base model selection method, and epsilon threshold.
    All inputs required to build the RashomonIntersection object are extracted from the user's data, which is processed and converted during pipeline initialization.

  • Initializes the IntersectionVisualizer object for interactive analysis of the Rashomon Intersection.
    Individual plots can later be generated directly from the IntersectionVisualizer object.

  • If launch_dashboard is set to True (default), generates the analysis plots appropriate to the task type (binary or multiclass) and stores them temporarily, together with their descriptions, for the Streamlit dashboard. Any previous Streamlit processes are closed to avoid conflicts.

  • If launch_dashboard is set to True (default), launches the Streamlit dashboard in a subprocess on the local machine (localhost), allowing interactive exploration of the Rashomon Intersection properties without blocking the main workflow.


Returns :
rashomon_set : RashomonIntersection
visualizer : IntersectionVisualizer

dashboard_close()

  Method for stopping all Streamlit processes and closing the dashboard.
  Note: Always call this method after finishing the analysis to ensure the dashboard is properly closed.
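One way to make sure dashboard_close() always runs is to wrap build() in a context manager. The following is a sketch, not part of the library: the helper name dashboard_session is hypothetical, and it relies only on the build(launch_dashboard) and dashboard_close() methods documented above.

```python
from contextlib import contextmanager


@contextmanager
def dashboard_session(builder, launch_dashboard=True):
    """Yield the result of builder.build() and always close the dashboard."""
    try:
        yield builder.build(launch_dashboard)
    finally:
        # Runs even if the analysis inside the with-block raises an error.
        builder.dashboard_close()


# Usage, assuming `builder` is a pipeline on which set_epsilon() was called
# and that build() returns the two documented objects as a tuple:
#
# with dashboard_session(builder) as (rashomon_set, visualizer):
#     ...  # explore plots here
```

With this pattern the Streamlit processes are stopped when the with-block exits, whether the analysis finished normally or was interrupted by an exception.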