Rethinking Explanation Evaluation under the Retraining Scheme
Yi Cai, Thibaud Ardoin, Mayank Gulati, Gerhard Wunder

TL;DR
This paper critically examines retraining-based methods for evaluating explanations, identifies key issues that undermine their reliability, and proposes improved variants that make the evaluation more accurate and more efficient.
Contribution
It identifies the sign issue as a core problem in retraining-based evaluation and introduces new variants that better align empirical results with theoretical expectations.
Findings
Revealed that the sign issue leaves residual information in the evaluation.
Proposed variants improve evaluation efficiency and reliability.
Empirical results provide deeper insights into explainer performance.
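The sketch below is a toy illustration of one way attribution signs can matter when features are removed, written for this page and not taken from the paper. It assumes a hypothetical per-sample attribution vector and compares removing features in order of signed attribution with removing them in order of absolute attribution. Under signed ranking, a strongly negative feature (evidence against the predicted class) survives the removal step, so more attributed information remains in the input. Whether this matches the paper's precise definition of the sign issue is an assumption; the function names are hypothetical.

```python
def removal_order(attr, use_abs):
    """Feature indices in the order they would be removed (highest score first)."""
    score = (lambda j: abs(attr[j])) if use_abs else (lambda j: attr[j])
    return sorted(range(len(attr)), key=score, reverse=True)


def residual_magnitude(attr, removed):
    """Total |attribution| of the features that survive removal."""
    return sum(abs(a) for j, a in enumerate(attr) if j not in removed)


# Hypothetical per-sample attribution; index 1 is strong evidence AGAINST the predicted class.
attr = [0.9, -0.8, 0.1, -0.05, 0.3]

signed_top2 = removal_order(attr, use_abs=False)[:2]  # [0, 4]: the -0.8 feature survives
abs_top2 = removal_order(attr, use_abs=True)[:2]      # [0, 1]
print(residual_magnitude(attr, signed_top2), residual_magnitude(attr, abs_top2))
```

With the same removal budget, the signed ranking leaves roughly twice as much attribution magnitude in the input as the absolute ranking does in this example.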
Abstract
Feature attribution has gained prominence as a tool for explaining model decisions, yet evaluating explanation quality remains challenging due to the absence of ground-truth explanations. To circumvent this, explanation-guided input manipulation has emerged as an indirect evaluation strategy, measuring explanation effectiveness through the impact of input modifications on model outcomes during inference. Despite their widespread use, a major concern with inference-based schemes is the distribution shift caused by such manipulations, which undermines the reliability of their assessments. The retraining-based scheme ROAR overcomes this issue by adapting the model to the altered data distribution. However, its evaluation results often contradict the theoretical foundations of widely accepted explainers. This work investigates this misalignment between empirical observations and theoretical…
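To make the retraining scheme concrete, here is a minimal, self-contained sketch of a ROAR-style evaluation loop, assuming a toy tabular task and a logistic-regression model in place of the image classifiers usually used. The steps are: train a model, attribute each sample, replace that sample's top-k attributed features with an uninformative value (here the training-set mean, an assumption), retrain from scratch on the degraded data, and compare test accuracy. The attribution method (gradient times input, which for a linear logit reduces to w_j * x_j), the ranking by absolute attribution, and all function names are illustrative assumptions, not the paper's setup.

```python
import math
import random


def make_data(n, d=10, informative=3, seed=0):
    """Toy binary task: label is 1 iff the first `informative` features sum to a positive value."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if sum(x[:informative]) > 0 else 0 for x in X]
    return X, y


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=150, lr=0.5):
    """Train logistic regression from scratch with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def gradient_x_input(w, X):
    # for a linear logit z = w.x + b, dz/dx_j = w_j, so gradient x input is w_j * x_j
    return [[wj * xj for wj, xj in zip(w, x)] for x in X]


def remove_top_k(X, attributions, k, fill):
    """Replace each sample's k highest-|attribution| features with a per-feature fill value."""
    out = []
    for x, a in zip(X, attributions):
        top = set(sorted(range(len(x)), key=lambda j: abs(a[j]), reverse=True)[:k])
        out.append([fill[j] if j in top else x[j] for j in range(len(x))])
    return out


def roar(X_tr, y_tr, X_te, y_te, k):
    w, b = train_logreg(X_tr, y_tr)  # the model being explained
    base = accuracy(w, b, X_te, y_te)
    d = len(X_tr[0])
    fill = [sum(x[j] for x in X_tr) / len(X_tr) for j in range(d)]  # training-set mean per feature
    X_tr_m = remove_top_k(X_tr, gradient_x_input(w, X_tr), k, fill)
    X_te_m = remove_top_k(X_te, gradient_x_input(w, X_te), k, fill)
    w2, b2 = train_logreg(X_tr_m, y_tr)  # retrain from scratch on the degraded data
    return base, accuracy(w2, b2, X_te_m, y_te)


X_tr, y_tr = make_data(400, seed=0)
X_te, y_te = make_data(200, seed=1)
base, degraded = roar(X_tr, y_tr, X_te, y_te, k=3)
print(f"test accuracy: original {base:.2f}, after ROAR retraining {degraded:.2f}")
```

Retraining on the degraded data is what distinguishes this scheme from inference-based evaluation: the second model is fitted to the altered distribution, so an accuracy drop is attributed to the removed information rather than to distribution shift. A larger drop is read as evidence that the explainer ranked truly informative features highly.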
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
