Hypothesis Class Determines Explanation: Why Accurate Models Disagree on Feature Attribution
Thackshanaramana B

TL;DR
This paper demonstrates that models with identical predictions can produce significantly different feature attributions depending on their hypothesis class, challenging assumptions in explainable AI and introducing a diagnostic tool for explanation stability.
Contribution
The paper identifies hypothesis class as a structural driver of explanation disagreement and introduces the Explanation Reliability Score R(x) to assess explanation stability.
Findings
Models within the same hypothesis class agree on explanations
Cross-class models show near-random explanation agreement
Hypothesis class determines explanation variability
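One way to make the "explanation agreement" in the findings above concrete is to compare the feature-importance rankings that two models assign to the same input. The sketch below is illustrative only: it assumes agreement is measured by Spearman rank correlation and top-k overlap on absolute attributions, which may differ from the paper's metric, and the attribution vectors are made-up numbers, not results from the paper. The function names are hypothetical.

```python
import numpy as np


def rank(values):
    """Return 0-based ascending ranks (assumes no ties, for simplicity)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_agreement(attr_a, attr_b):
    """Spearman rank correlation between absolute attribution vectors."""
    ranks_a = rank(np.abs(attr_a))
    ranks_b = rank(np.abs(attr_b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


def top_k_overlap(attr_a, attr_b, k=2):
    """Fraction of the k most important features shared by both models."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(attr_b))[:k].tolist())
    return len(top_a & top_b) / k


# Hypothetical per-feature attributions for one input x (illustrative values).
attr_tree_1 = np.array([0.50, 0.30, 0.10, 0.05, 0.02])  # e.g. a tree ensemble
attr_tree_2 = np.array([0.45, 0.35, 0.08, 0.06, 0.01])  # another tree ensemble
attr_linear = np.array([0.02, 0.10, 0.40, 0.05, 0.30])  # e.g. a linear model

within_class = spearman_agreement(attr_tree_1, attr_tree_2)
cross_class = spearman_agreement(attr_tree_1, attr_linear)

print("within-class Spearman:", within_class)
print("cross-class Spearman:", cross_class)
print("cross-class top-2 overlap:", top_k_overlap(attr_tree_1, attr_linear))
```

In this toy setup the two tree-style vectors rank the features identically, while the linear-style vector ranks them quite differently, which mirrors the within-class versus cross-class pattern described in the findings. In practice the attribution vectors would come from an attribution method applied to trained models that make the same predictions.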
Abstract
The assumption that prediction-equivalent models produce equivalent explanations underlies many practices in explainable AI, including model selection, auditing, and regulatory evaluation. In this work, we show that this assumption does not hold. Through a large-scale empirical study across 24 datasets and multiple model classes, we find that models with identical predictive behavior can produce substantially different feature attributions. This disagreement is highly structured: models within the same hypothesis class exhibit strong agreement, while cross-class pairs (e.g., tree-based vs. linear) trained on identical data splits show substantially reduced agreement, consistently near or below the lottery threshold. We identify hypothesis class as the structural driver of this phenomenon, which we term the Explanation Lottery. We theoretically show that the resulting Agreement Gap…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
