EXAGREE: Mitigating Explanation Disagreement with Stakeholder-Aligned Models
Sichao Li, Tommy Liu, Quanling Deng, Amanda S. Barnard

TL;DR
EXAGREE is a framework that mitigates explanation disagreement in machine learning by selecting, from a set of similar-performing models, the one whose explanations best align with stakeholders, improving faithfulness, plausibility, and fairness while preserving task accuracy.
Contribution
EXAGREE introduces a two-stage method that combines a differentiable mask-based attribution network (DMAN) with monotone differentiable sorting to select a model whose explanations align with stakeholder expectations, addressing conflicts between explanations.
Findings
Improves faithfulness, plausibility, and fairness over baselines
Maintains task accuracy while enhancing explanation quality
Demonstrates robustness across six real-world datasets
Abstract
Conflicting explanations, arising from different attribution methods or model internals, limit the adoption of machine learning models in safety-critical domains. We turn this disagreement into an advantage and introduce EXplanation AGREEment (EXAGREE), a two-stage framework that selects a Stakeholder-Aligned Explanation Model (SAEM) from a set of similar-performing models. The selection maximizes Stakeholder-Machine Agreement (SMA), a single metric that unifies faithfulness and plausibility. EXAGREE couples a differentiable mask-based attribution network (DMAN) with monotone differentiable sorting, enabling gradient-based search inside the constrained model space. Experiments on six real-world datasets demonstrate simultaneous gains of faithfulness, plausibility, and fairness over baselines, while preserving task accuracy. Extensive ablation studies, significance tests, and case…
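The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it replaces the DMAN and differentiable sorting with precomputed attribution vectors and an exhaustive comparison, and it uses plain Spearman rank correlation as a stand-in for SMA. The function names, the `epsilon` tolerance, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def ranks(values):
    """Return 0-based ranks (0 = largest value); ties broken by index order."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(len(order))
    return r

def spearman(a, b):
    """Spearman rank correlation between two score vectors (no tie correction)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = np.sum((ra - rb) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def select_aligned_model(losses, attributions, stakeholder_scores, epsilon):
    """Pick, among models within `epsilon` of the best loss, the one whose
    feature-attribution ranking agrees most with the stakeholder ranking.

    Hypothetical helper: Spearman correlation stands in for the paper's SMA.
    """
    losses = np.asarray(losses, dtype=float)
    # Candidate set: models whose loss is close to the best observed loss.
    eligible = np.flatnonzero(losses <= losses.min() + epsilon)
    agreement = [spearman(attributions[i], stakeholder_scores) for i in eligible]
    best = eligible[int(np.argmax(agreement))]
    return int(best), float(max(agreement))

# Toy example: three models, three features (all numbers are made up).
losses = [0.10, 0.11, 0.30]
attributions = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.2, 0.3, 0.5]]
stakeholder_scores = [1, 2, 3]  # stakeholder considers feature 2 most important
idx, score = select_aligned_model(losses, attributions, stakeholder_scores, epsilon=0.02)
```

In this toy setup the third model matches the stakeholder ranking but is excluded because its loss falls outside the tolerance, which mirrors the constraint of searching only among similar-performing models.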
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management · Machine Learning in Healthcare
Methods: Sparse Evolutionary Training
