Evaluating the False Trust Engendered by LLM Explanations
Vardhan Palod, Upasana Biswas, Subbarao Kambhampati

TL;DR
This study investigates how different types of explanations from Large Language Models influence users' trust and their ability to assess answer correctness. It finds that dual explanations improve discernment, while reasoning traces and post-hoc explanations may foster false trust.
Contribution
The paper introduces an evaluation protocol and demonstrates that dual explanations enhance user judgment of AI correctness compared to reasoning traces and post-hoc explanations.
Findings
Reasoning traces and post-hoc explanations increase users' trust in the AI's answer whether or not that answer is correct, which fosters false trust.
Dual explanations improve users' ability to distinguish correct from incorrect answers.
The other explanation types (reasoning traces and post-hoc explanations) are persuasive but not informative.
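To make the distinction between "persuasive" and "informative" concrete, here is a minimal sketch of how such outcomes could be quantified. It is an illustration under stated assumptions, not the paper's actual protocol or metrics: the `Trial` record, the `trust_rates` function, and the "discernment" score (trust rate on correct answers minus trust rate on incorrect answers) are all hypothetical names and definitions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One participant judgment (hypothetical record, not from the paper)."""
    ai_correct: bool    # whether the AI's answer was actually correct
    user_trusted: bool  # whether the participant accepted the AI's answer


def trust_rates(trials):
    """Summarize trust for one explanation condition.

    Assumed definitions (illustrative only):
      appropriate_trust = share of correct answers the user accepted
      false_trust       = share of incorrect answers the user accepted
      discernment       = appropriate_trust - false_trust
    """
    correct = [t for t in trials if t.ai_correct]
    incorrect = [t for t in trials if not t.ai_correct]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect trial")

    appropriate = sum(t.user_trusted for t in correct) / len(correct)
    false = sum(t.user_trusted for t in incorrect) / len(incorrect)
    return {
        "appropriate_trust": appropriate,
        "false_trust": false,
        "discernment": appropriate - false,
    }


# Made-up data for illustration; these are not results from the study.
example = (
    [Trial(True, True)] * 3 + [Trial(True, False)]      # 3 of 4 correct answers trusted
    + [Trial(False, True)] + [Trial(False, False)] * 3  # 1 of 4 incorrect answers trusted
)
summary = trust_rates(example)
```

Under these assumed definitions, an explanation type that raises both trust rates equally leaves discernment unchanged (persuasive but not informative), while one that raises appropriate trust and lowers false trust increases discernment.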
Abstract
Large Language Models (LLMs) and Large Reasoning Models (LRMs) are increasingly used for critical tasks, yet they provide no guarantees about the correctness of their solutions. Users must decide whether to trust the model's answer, aided by reasoning traces, their summaries, or post-hoc generated explanations. These reasoning traces, despite evidence that they are neither faithful representations of the model's computations nor necessarily semantically meaningful, are often interpreted as provenance explanations. It is unclear whether explanations or reasoning traces help users identify when the AI is incorrect, or whether they simply persuade users to trust the AI regardless. In this paper, we take a user-centered approach and develop an evaluation protocol to study how different explanation types affect users' ability to judge the correctness of AI-generated answers and engender…
