Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance

Bar Alon; Itamar Zimerman; Lior Wolf

arXiv:2604.14325 · cs.CL · April 17, 2026

TL;DR

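This paper introduces a training-free approach that improves the faithfulness of LLM explanations by guiding their generation with attention-level interventions derived from attribution heatmaps. The goal is to close the gap between subjective faithfulness (explanations that sound faithful) and epistemic faithfulness (explanations that reflect the evidence the model actually used).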
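To make the idea of an "attention-level intervention based on an attribution heatmap" concrete, here is a minimal sketch of one generic way to do it: add a bias proportional to each position's attribution score to the pre-softmax attention logits, so high-attribution tokens receive more attention. This is an assumption for illustration only, not the paper's actual procedure; the function name `guided_attention` and the strength parameter `alpha` are hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(
    logits: List[float], attribution: List[float], alpha: float = 1.0
) -> List[float]:
    """Hypothetical sketch: bias attention toward high-attribution positions.

    logits      -- pre-softmax attention scores for one query over all keys
    attribution -- per-key relevance scores from some attribution heatmap
    alpha       -- hypothetical intervention strength (0 disables it)
    """
    if len(logits) != len(attribution):
        raise ValueError("logits and attribution must have the same length")
    # Treat negative attribution as irrelevant (an assumption of this sketch).
    clipped = [max(a, 0.0) for a in attribution]
    peak = max(clipped)
    if peak == 0.0:
        # No usable attribution signal: leave attention unchanged.
        return softmax(logits)
    # Scale so the most relevant token receives a bias of exactly alpha.
    biased = [l + alpha * (a / peak) for l, a in zip(logits, clipped)]
    return softmax(biased)


# Uniform attention over three tokens; the heatmap marks the middle one.
weights = guided_attention([0.0, 0.0, 0.0], [0.0, 1.0, 0.0], alpha=2.0)
print(weights)
```

Because the bias is added before the softmax, the output is still a valid attention distribution; setting `alpha=0` recovers the unmodified attention.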

Contribution

It proposes a novel, training-free method that enhances the epistemic faithfulness of LLM explanations using attribution-guided attention interventions.

Findings

01. Significantly improves epistemic faithfulness across multiple models.

02. Narrows the gap between explanations that subjectively appear faithful and explanations that reflect the evidence the model actually relied on.

03. Works across various benchmarks and prompts.

Abstract

Large language models (LLMs) achieve strong performance and have revolutionized NLP, but their lack of explainability means they are still treated as black boxes, which limits their use in domains that demand transparency and trust. A promising direction to address this issue is post-hoc text-based explanations, which aim to explain model decisions in natural language. Prior work has focused on generating convincing rationales that appear to be subjectively faithful, but it remains unclear whether these explanations are epistemically faithful, that is, whether they reflect the internal evidence the model actually relied on for its decision. In this paper, we first assess the epistemic faithfulness of LLM-generated explanations via counterfactuals and show that they are often unfaithful. We then introduce a training-free method that enhances faithfulness by guiding explanation generation through…
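The abstract's counterfactual assessment can be pictured with a generic deletion test: if an explanation claims certain input tokens drove the decision, removing those tokens should change the decision. The sketch below is an assumption for illustration, not the paper's evaluation protocol; the helpers `counterfactual_flip`, `faithfulness_rate`, and the keyword-based `toy_model` are hypothetical stand-ins for a real LLM and real explanations.

```python
from typing import AbstractSet, Callable, List, Sequence, Tuple

# A classifier maps a token sequence to a label (hypothetical interface).
Classifier = Callable[[Sequence[str]], str]


def counterfactual_flip(
    model: Classifier, tokens: List[str], cited: AbstractSet[str]
) -> bool:
    """Return True if deleting the tokens an explanation cites changes the label."""
    original = model(tokens)
    ablated = [t for t in tokens if t not in cited]
    return model(ablated) != original


def faithfulness_rate(
    model: Classifier, examples: List[Tuple[List[str], AbstractSet[str]]]
) -> float:
    """Fraction of (input, cited-tokens) pairs whose deletion flips the label."""
    flips = sum(counterfactual_flip(model, toks, cited) for toks, cited in examples)
    return flips / len(examples)


def toy_model(tokens: Sequence[str]) -> str:
    # Hypothetical classifier that relies only on the word "great".
    return "positive" if "great" in tokens else "negative"


review = ["the", "movie", "was", "great"]
# An explanation citing "great" matches the evidence the model used;
# one citing "movie" sounds plausible but does not.
print(counterfactual_flip(toy_model, review, {"great"}))
print(counterfactual_flip(toy_model, review, {"movie"}))
print(faithfulness_rate(toy_model, [(review, {"great"}), (review, {"movie"})]))
```

The second explanation illustrates the subjective/epistemic gap in miniature: it reads as a reasonable rationale, yet deleting what it cites leaves the decision unchanged.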
