RashomonIntersection

class arsa_ml.rashomon_intersection.RashomonIntersection(leaderboard, predictions, proba_predictions, feature_importances, metrics, epsilon, custom_weights, weighted_sum_method)

This class computes the Rashomon Set Intersection of two Rashomon Sets, each built from a different metric. It inherits from RashomonSet, so all metrics and attributes implemented in the RashomonSet class are available; it only modifies how the Rashomon Set and the base model are selected.

Parameters

leaderboard : pd.DataFrame
Leaderboard in a DataFrame format consisting of models and their evaluation metrics scores. (Returned by converters)
predictions : dict
Dictionary with model names as keys and class prediction vectors as values. (Returned by converters)
proba_predictions : dict
Dictionary with model names as keys and class probabilities prediction DataFrames as values. (Returned by converters)
metrics : list
List containing two distinct evaluation metrics used to compute Rashomon Sets and determine their intersection. Evaluation metrics are used as the primary value for sorting model performances. Allowed metrics are specified in the METRICS attribute.
epsilon : float
Epsilon parameter specifying the allowable deviation from the best score for each evaluation metric, within which models are included in their respective Rashomon Sets and used to compute their intersection.
custom_weights : list
List with two elements specifying custom weights used when searching for the base model. If weighted_sum_method is 'custom_weights', the user must pass a 2-element list whose elements sum to 1; otherwise leave it at the default value, None.
weighted_sum_method : str
Parameter that specifies the method or weights for selecting the base model. Options are:

None or 'entropy' (default): Uses an entropy-based weighting method to select the base model.
'critic': Uses the CRITIC method to determine weights for selecting the base model.
'custom_weights': Allows manual specification of weights as a 2-element list whose values sum to 1.

Attributes

base_model : str
Name of the model selected based on the weighted sum of the two evaluation metrics.
all_rashomon_sets : dict
A dictionary where each key is a metric and each value is the corresponding Rashomon set for that metric.
rashomon_set : list
Names of the models that are included in the Rashomon Set Intersection for the given parameters. Obtained from the get_rashomon_set() method. If the set contains fewer than 2 models, the constructor raises a ValueError and asks the user to specify different parameters, such as metrics and epsilon.
rashomon_predictions : dict
A subset of predictions dict parameter containing only models that are included in the Rashomon Set Intersection.
rashomon_proba_predictions : dict
A subset of proba_predictions dict parameter containing only models that are included in the Rashomon Set Intersection.
metrics : list
List of evaluation metrics used to compute the intersection of two Rashomon Sets, each calculated based on one of the metrics.
weights : pd.Series
Pandas Series with weights either computed by selected method or passed by user as init parameter.
leaderboard : pd.DataFrame
User parameter value.
predictions : dict
User parameter value.
proba_predictions : dict
User parameter value.
metrics : list
User parameter value.
epsilon : float
User parameter value.
custom_weights : list
User parameter value.
weighted_sum_method : str
User parameter value.

Methods

get_rashomon_set_for_metric(metric, epsilon)

Method for getting Rashomon set for a given metric. Parameter metric specifies metric for which Rashomon set is to be calculated. If the value of epsilon is not specified, uses self.epsilon value. Returns a list of models that are in the Rashomon set for the given metric.

Parameters :
metric: str
epsilon: float, default = None

Returns :
rashomon_models_names : list
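
The selection rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `rashomon_set_for_metric`, the list-of-dicts leaderboard, and the additive "higher is better" threshold (best score minus epsilon) are all assumptions made for illustration.

```python
def rashomon_set_for_metric(leaderboard, metric, epsilon):
    """Return names of models whose score is within epsilon of the best score.

    Assumptions (not confirmed by the documentation): higher scores are
    better and the allowed deviation is additive, i.e. best - epsilon.
    """
    best = max(row[metric] for row in leaderboard)
    return [row["model"] for row in leaderboard if row[metric] >= best - epsilon]


# Hypothetical leaderboard with two metrics per model.
leaderboard = [
    {"model": "rf", "accuracy": 0.91, "f1": 0.88},
    {"model": "xgb", "accuracy": 0.90, "f1": 0.90},
    {"model": "logreg", "accuracy": 0.85, "f1": 0.89},
]

accuracy_set = rashomon_set_for_metric(leaderboard, "accuracy", epsilon=0.02)
f1_set = rashomon_set_for_metric(leaderboard, "f1", epsilon=0.015)
```

With this data, `accuracy_set` keeps `rf` and `xgb`, while `f1_set` keeps `xgb` and `logreg`.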

find_rashomon_intersection(epsilon)

Method for calculating the intersection of Rashomon sets by identifying models that are present in the Rashomon set for each metric and the given epsilon. If the value of epsilon is not specified, uses self.epsilon value. Returns a list of model names present in the Rashomon Intersection. If the intersection is empty, returns an empty list.

Parameters :
epsilon: float, default = None

Returns :
rashomon_models_names : list
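
As a rough illustration of the intersection step, the sketch below keeps only the models that appear in every per-metric Rashomon set. The helper name `rashomon_intersection`, the dict-of-lists input, and the choice to return names in leaderboard order are assumptions, not the library's actual code.

```python
def rashomon_intersection(sets_by_metric, leaderboard_order):
    """Return models present in the Rashomon set of every metric.

    Names are returned in leaderboard order (an assumption made here so the
    result is deterministic); an empty intersection yields an empty list.
    """
    common = set.intersection(*(set(names) for names in sets_by_metric.values()))
    return [name for name in leaderboard_order if name in common]


order = ["rf", "xgb", "logreg"]  # hypothetical leaderboard order

intersection = rashomon_intersection(
    {"accuracy": ["rf", "xgb"], "f1": ["xgb", "logreg"]}, order
)
empty = rashomon_intersection({"accuracy": ["rf"], "f1": ["logreg"]}, order)
```

Here `intersection` contains only `xgb`, and `empty` is an empty list because the two sets share no model.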

get_rashomon_set(epsilon)

Method that returns the names of models included in the Rashomon Intersection for a specified epsilon and two metrics. If the value of epsilon is not specified, uses self.epsilon value. This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.

Parameters :
epsilon: float, default = None

Returns :
rashomon_intersection : list

get_rashomon_predictions()

This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models. Returns predictions and probability predictions for the models present in the Rashomon Intersection.

Returns :
rashomon_predictions : dict
rashomon_proba_predictions : dict

get_rashomon_feature_importances()

This method overrides the implementation from the RashomonSet class to select feature importance information only for models present in the Rashomon Intersection.

Returns :
rashomon_importances : dict

find_base_model(weight1, weight2)

Method for finding base_model based on user-passed weight values. Returns the name of the model that maximizes weight1 * metrics[0] + weight2 * metrics[1]. If several models have the same sum, the base model is the first of them in the leaderboard. This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.
Note: Weight parameters must sum to 1.

Parameters :
weight1: float
weight2: float

Returns :
best_model : str
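
The weighted-sum selection and its tie-breaking rule can be sketched as follows. This is an illustrative stand-in rather than the library's method: `pick_base_model` and the list-of-dicts representation of the intersection are hypothetical, and the rows are assumed to already be in leaderboard order.

```python
def pick_base_model(rows, metrics, weight1, weight2):
    """Return the model maximizing weight1 * metrics[0] + weight2 * metrics[1].

    `rows` is assumed to be in leaderboard order, so the strict comparison
    below keeps the first model when several share the same weighted sum.
    """
    if abs(weight1 + weight2 - 1.0) > 1e-9:
        raise ValueError("weight1 and weight2 must sum to 1")
    best_name, best_score = None, float("-inf")
    for row in rows:
        score = weight1 * row[metrics[0]] + weight2 * row[metrics[1]]
        if score > best_score:  # strict '>' means ties keep the earlier row
            best_name, best_score = row["model"], score
    return best_name


# Hypothetical intersection models with scores for two metrics.
rows = [
    {"model": "a", "accuracy": 0.75, "f1": 0.5},
    {"model": "b", "accuracy": 0.5, "f1": 0.75},
    {"model": "c", "accuracy": 0.5, "f1": 0.5},
]

tied = pick_base_model(rows, ["accuracy", "f1"], 0.5, 0.5)     # "a" and "b" tie
f1_heavy = pick_base_model(rows, ["accuracy", "f1"], 0.25, 0.75)
```

With equal weights, models `a` and `b` share the same sum and `a` wins because it comes first; shifting weight toward the second metric selects `b`.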

find_worst_model()

Method for finding the worst model based on the weight values. Returns the name of the model that minimizes weight1 * metrics[0] + weight2 * metrics[1]. If several models have the same sum, the worst model is the first of them in the leaderboard. This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.

Returns :
worst_model : str

find_same_score_as_base()

Method used to find models in the Rashomon Intersection set that produce the same weighted sum of metrics values as base model. Returns number of models found with the same score as base model and list of their names. This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.

Returns :
same_scores_count : int
same_scores_models : list

find_weights_entropy_based()

Method for finding weights based on the entropy method. Returns weights as a pandas Series with metric names as the index and weights as the values. The weights satisfy the constraint that they sum to 1. Entropy-based weights are defined as:

\[ { w_i := \frac{1 - e(x_i)}{\sum_{j=1}^{2}\left(1 - e(x_j)\right)}} \]

where

\[ { e(x_i) := - \frac{\sum_{k=1}^{m} f_{ik} \ln(f_{ik})}{\ln(m)}} \]

In the context of calculating the weights of two base_metrics, \(f_{ik}\) is the scaled value of metric \(i\) for model \(k\) and \(m\) is the number of models in the Rashomon Intersection. The entropy \(e(x_i)\) is calculated for both base_metrics, where \(x_1\) is the first metric and \(x_2\) the second one.
Read more about the entropy method: Effectiveness of Entropy Weight Method in Decision-Making

Returns :
weights : pd.Series
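
A minimal sketch of the entropy weight computation is given below, using plain dictionaries instead of a pandas Series. Several details are assumptions, since the documentation does not specify them: the scaling step (each metric's values divided by their sum across models), the convention that terms with a zero share contribute nothing, and the equal-weight fallback when no metric shows any divergence. The name `entropy_weights` is hypothetical.

```python
import math


def entropy_weights(scores_by_metric):
    """Compute entropy-based weights for each metric.

    `scores_by_metric` maps a metric name to its scores over the m models
    in the intersection (m >= 2). Scaling by the column sum is an assumed
    normalization; the library may scale differently.
    """
    divergences = {}
    for metric, values in scores_by_metric.items():
        m = len(values)
        total = sum(values)
        shares = [v / total for v in values]
        # Terms with a zero share are skipped (0 * ln 0 is treated as 0).
        entropy = -sum(f * math.log(f) for f in shares if f > 0) / math.log(m)
        divergences[metric] = 1.0 - entropy
    norm = sum(divergences.values())
    if abs(norm) < 1e-12:
        # Assumed fallback: no metric discriminates, so use equal weights.
        return {metric: 1.0 / len(divergences) for metric in divergences}
    return {metric: d / norm for metric, d in divergences.items()}


# Hypothetical scores: accuracy is identical across models, f1 varies.
weights = entropy_weights(
    {"accuracy": [0.90, 0.90, 0.90], "f1": [0.95, 0.80, 0.60]}
)
```

Because accuracy does not vary across the three models, its entropy is maximal and nearly all of the weight goes to f1, which is the behaviour the formula is designed to produce.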

find_base_model_entropy_based()

Method for finding base_model based on entropy method. Finds the base model by computing the weighted sum of the two metrics using entropy-based weights and selecting the model with the highest score. If many models have the same sum value, base model is the first one in the leaderboard.

Returns :
best_model : str

find_weights_critic_method()

Method for finding weights based on the CRITIC (Criteria Importance Through Intercriteria Correlation) method. Returns weights as a pandas Series with index as metric names and values as weights. The weights should satisfy the constraint that they sum to 1. CRITIC based weights are defined as:

\[ \displaystyle {w_j = \frac{c_j}{\sum_{k=1}^{n} c_k}, \quad j = 1,\dots,n} \]

where

\[ \displaystyle {c_j = \sigma_j \sum_{k=1}^{n} (1 - \rho_{jk}), \quad j = 1,\dots,n} \]

\(\sigma_j\) denotes the standard deviation of the normalized values of criterion \(j\) across all alternatives and \(\rho_{jk}\) is the Pearson correlation coefficient between criteria \(j\) and \(k\). When finding weights based on two base metrics, \(n=2\).

Note: This method involves computing the correlation matrix and standard deviations of the metrics. If CRITIC cannot be applied due to insufficient variation caused by identical metric values, or other numerical issues, it falls back to the entropy-based method.

Returns :
weights : pd.Series
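
The CRITIC formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: min-max normalization and the sample standard deviation are assumed, the name `critic_weights` and its helpers are hypothetical, and a `ValueError` stands in for the point where the documented fallback to the entropy-based method would be triggered.

```python
import math


def _min_max(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        # Identical values: CRITIC cannot be applied to this metric.
        raise ValueError("metric has no variation")
    return [(v - lo) / (hi - lo) for v in values]


def _std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def _pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def critic_weights(scores_by_metric):
    """Compute w_j = c_j / sum_k c_k with c_j = sigma_j * sum_k (1 - rho_jk)."""
    normalized = {name: _min_max(vals) for name, vals in scores_by_metric.items()}
    information = {}
    for name, xj in normalized.items():
        conflict = sum(1.0 - _pearson(xj, xk) for xk in normalized.values())
        information[name] = _std(xj) * conflict
    total = sum(information.values())
    if total < 1e-12:
        # E.g. perfectly correlated metrics: signal that a fallback is needed.
        raise ValueError("CRITIC weights are undefined for these scores")
    return {name: c / total for name, c in information.items()}


# Hypothetical scores for three models in the intersection.
weights = critic_weights(
    {"accuracy": [0.90, 0.88, 0.86], "f1": [0.80, 0.90, 0.85]}
)
```

In this example both metrics have the same spread after normalization, so each receives a weight of about 0.5. With only two metrics the conflict term is identical for both, which means the weights reduce to the ratio of the standard deviations.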

find_base_model_critic_based()

Method for finding base_model based on CRITIC method. Finds the base model by computing the weighted sum of the two metrics using CRITIC-based weights and selecting the model with the highest score. If many models have the same sum value, base model is the first one in the leaderboard.

Returns :
best_model : str

get_rashomon_metrics(delta)

Method for getting all Rashomon Intersection metrics in a dictionary format. Delta parameter is a threshold for probabilistic ambiguity and discrepancy. Returns all numeric metrics and information on Rashomon intersection (not including VPRs and Agreement Rates). This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.

Parameters :
delta: float, default = 0.1

Returns :
metrics : dict

summarize_rashomon(delta)

Method for printing all calculated metrics for Rashomon Intersection in a structured format. This method overrides the implementation from the RashomonSet class to use intersection models instead of single-metric models.

Parameters :
delta: float, default = 0.1