An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning
Sebastian Müller, Vanessa Toborek, Katharina Beckh, Matthias Jakobs, Christian Bauckhage, Pascal Welke

TL;DR
This paper empirically investigates the Rashomon Effect in explainable machine learning, revealing how multiple models with similar performance can have diverse explanations, influenced by hyperparameters and metric choices.
Contribution
It offers a comprehensive empirical analysis of the Rashomon Effect's impact on explainability, highlighting the importance of hyperparameter tuning and metric selection.
Findings
Hyperparameter tuning influences explanation diversity
Metric choice significantly affects explanation comparability
Multiple models can produce similar performance but different explanations
Abstract
The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.
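The phenomenon described in the abstract can be illustrated with a minimal sketch. This is a toy construction and not the paper's experimental setup: the data, the two weight vectors, the "weight times input" attribution, and the two comparison metrics are all assumptions chosen for illustration. Two linear models rely on different copies of a duplicated feature, so their predictions are identical while their attributions barely overlap.

```python
import math

# Toy data (assumed): feature 1 is an exact copy of feature 0,
# feature 2 is a small additional signal.
X = [[1.0, 1.0, 0.2], [2.0, 2.0, -0.1], [-1.0, -1.0, 0.3], [0.5, 0.5, 0.0]]

# Two hypothetical linear models that use different copies of the duplicated feature.
w_a = [2.0, 0.0, 1.0]
w_b = [0.0, 2.0, 1.0]


def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def attribution(w, x):
    """Per-feature 'weight times input' attribution for a linear model."""
    return [wi * xi for wi, xi in zip(w, x)]


def cosine_similarity(a, b):
    """Cosine similarity of two attribution vectors (0.0 if either is all zeros)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_feature(a):
    """Index of the feature with the largest absolute attribution."""
    return max(range(len(a)), key=lambda i: abs(a[i]))


# Both models make the same prediction on every sample.
same_predictions = all(
    math.isclose(predict(w_a, x), predict(w_b, x)) for x in X
)

# Compare the two models' explanations sample by sample with two metrics.
pairs = [(attribution(w_a, x), attribution(w_b, x)) for x in X]
cosines = [cosine_similarity(a, b) for a, b in pairs]
top1_agreement = sum(top_feature(a) == top_feature(b) for a, b in pairs) / len(X)

print("identical predictions:", same_predictions)
print("cosine similarities:", [round(c, 3) for c in cosines])
print("top-1 feature agreement:", top1_agreement)
```

In this sketch the comparison is summarized by two numbers, a cosine similarity and a top-1 feature agreement rate. Other metrics would summarize the same pair of explanations differently, which is one reason the choice of metric matters when comparing explanations.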
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Forecasting Techniques and Applications · Stock Market Forecasting Methods
