ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs
Abhinaba Basu, Pavan Chakraborty

TL;DR
This paper introduces ICE, a statistical framework for evaluating explanation faithfulness in large language models across multiple languages and tasks, revealing operator-dependent results and the limitations of existing benchmarks.
Contribution
We propose ICE, a novel statistical testing framework for explanation evaluation, and provide a comprehensive multilingual benchmark revealing complex model and language interactions.
Findings
Faithfulness varies significantly across intervention operators.
Randomized baselines often show anti-faithfulness, challenging existing metrics.
Faithfulness does not correlate with human plausibility.
Abstract
Evaluating whether explanations faithfully reflect a model's reasoning remains an open problem. Existing benchmarks use single interventions without statistical testing, making it impossible to distinguish genuine faithfulness from chance-level performance. We introduce ICE (Intervention-Consistent Explanation), a framework that compares explanations against matched random baselines via randomization tests under multiple intervention operators, yielding win rates with confidence intervals. Evaluating 7 LLMs across 4 English tasks, 6 non-English languages, and 2 attribution methods, we find that faithfulness is operator-dependent: operator gaps reach up to 44 percentage points, with deletion typically inflating estimates on short text but the pattern reversing on long text, suggesting that faithfulness should be interpreted comparatively across intervention operators rather than as a…
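The comparison described in the abstract (explanations scored against matched random baselines, summarized as a win rate with a confidence interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tie handling, the percentile bootstrap, and the add-one Monte Carlo p-value are all choices made here for illustration, and the input scores are assumed to be per-example effects of an intervention (higher meaning the removed tokens mattered more).

```python
import random


def win_rate(expl_scores, rand_scores):
    """Fraction of matched random baselines that the explanation beats.

    expl_scores: one score per example for the explanation's token set.
    rand_scores: for each example, a list of scores for random token sets
                 of matched size (assumed layout). Ties count as half a win.
    """
    wins, total = 0.0, 0
    for e, baselines in zip(expl_scores, rand_scores):
        for r in baselines:
            wins += 1.0 if e > r else 0.5 if e == r else 0.0
            total += 1
    return wins / total


def bootstrap_ci(expl_scores, rand_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the win rate, resampling examples."""
    rng = random.Random(seed)
    n = len(expl_scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(win_rate([expl_scores[i] for i in idx],
                              [rand_scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def randomization_p_value(expl_score, baselines):
    """One-sided Monte Carlo p-value for a single example.

    Counts how many random baselines score at least as high as the
    explanation, with the usual add-one correction.
    """
    at_least = sum(r >= expl_score for r in baselines)
    return (1 + at_least) / (1 + len(baselines))


if __name__ == "__main__":
    # Toy numbers, invented purely to exercise the functions.
    expl = [0.8, 0.3, 0.6]
    rand = [[0.1, 0.4, 0.2], [0.5, 0.3, 0.1], [0.2, 0.7, 0.4]]
    print("win rate:", win_rate(expl, rand))
    print("95% CI:", bootstrap_ci(expl, rand))
    print("p (example 0):", randomization_p_value(expl[0], rand[0]))
```

Running the same computation once per intervention operator (for example deletion versus another operator) and differencing the resulting win rates would give an operator gap of the kind the abstract reports in percentage points.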
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
