Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?

Teague McMillan; Gabriele Dominici; Martin Gjoreski; Marc Langheinrich

arXiv:2510.24236 · cs.CL · November 4, 2025

TL;DR

This paper investigates how inference and training choices affect the faithfulness of explanations generated by large language models, with implications for improving trustworthiness in healthcare and social bias contexts.

Contribution

The paper systematically evaluates how few-shot examples, prompting strategies, and training procedures influence explanation faithfulness in LLMs.

Findings

01 Few-shot example quantity and quality impact faithfulness

02 Prompting design significantly affects explanation faithfulness

03 Instruction-tuning improves faithfulness in medical tasks

Abstract

Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference-time and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets: BBQ (social bias) and MedQA (medical licensing questions). We manipulate the number and type of few-shot examples, the prompting strategy, and the training procedure. Our results show: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; (iii) the…
