Hypothesis Class Determines Explanation: Why Accurate Models Disagree on Feature Attribution

Thackshanaramana B

arXiv:2603.15821·cs.LG·March 18, 2026

Open Access

TL;DR

This paper shows that models with identical predictions can produce substantially different feature attributions depending on their hypothesis class. The result challenges a common assumption in explainable AI, and the paper introduces a diagnostic tool for explanation stability.

Contribution

The paper identifies hypothesis class as the structural driver of explanation disagreement and introduces the Explanation Reliability Score R(x) to assess explanation stability.

Findings

01. Models within the same hypothesis class agree on explanations

02. Cross-class models show near-random explanation agreement

03. Hypothesis class determines explanation variability

Abstract

The assumption that prediction-equivalent models produce equivalent explanations underlies many practices in explainable AI, including model selection, auditing, and regulatory evaluation. In this work, we show that this assumption does not hold. Through a large-scale empirical study across 24 datasets and multiple model classes, we find that models with identical predictive behavior can produce substantially different feature attributions. This disagreement is highly structured: models within the same hypothesis class exhibit strong agreement, while cross-class pairs (e.g., tree-based vs. linear) trained on identical data splits show substantially reduced agreement, consistently near or below the lottery threshold. We identify hypothesis class as the structural driver of this phenomenon, which we term the Explanation Lottery. We theoretically show that the resulting Agreement Gap…
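
The within-class versus cross-class comparison described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the excerpt does not say which attribution method or agreement metric the paper uses, so Spearman rank correlation is assumed here as the agreement measure, and the attribution vectors, model names, and the `mean_agreement` helper are all hypothetical. The within-minus-cross difference printed at the end is an illustrative quantity, not the paper's definition of the Agreement Gap.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_a * var_b) ** 0.5


def mean_agreement(attributions, classes, same_class):
    """Mean pairwise agreement over same-class or cross-class model pairs."""
    scores = [
        spearman(attributions[m1], attributions[m2])
        for m1, m2 in combinations(sorted(attributions), 2)
        if (classes[m1] == classes[m2]) == same_class
    ]
    return mean(scores)


# Hypothetical attribution vectors for a single instance (made-up numbers,
# not results from the paper). Each list holds one score per feature.
attributions = {
    "tree_a": [0.50, 0.30, 0.15, 0.05],
    "tree_b": [0.45, 0.35, 0.12, 0.08],
    "linear_a": [0.10, 0.05, 0.40, 0.45],
    "linear_b": [0.12, 0.08, 0.38, 0.42],
}
classes = {
    "tree_a": "tree",
    "tree_b": "tree",
    "linear_a": "linear",
    "linear_b": "linear",
}

within = mean_agreement(attributions, classes, same_class=True)
cross = mean_agreement(attributions, classes, same_class=False)
print(f"within-class agreement: {within:.2f}")
print(f"cross-class agreement:  {cross:.2f}")
print(f"within minus cross:     {within - cross:.2f}")
```

On this toy input the two tree models rank the features identically, as do the two linear models, while every tree/linear pair ranks them nearly in reverse. Real attributions would come from an attribution method applied to trained models; the structure of the comparison (group model pairs by hypothesis class, then average an agreement score per group) is the part this sketch is meant to show.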

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education