Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations
Pingjun Hong, Benjamin Roth

TL;DR
This study investigates whether LLM self-explanations help users predict model behavior. It finds that explanations improve simulation accuracy, but the size and stability of the gains depend on the perturbation strategy and the judge's expertise.
Contribution
The paper evaluates how useful LLM-generated explanations (chain-of-thought and post-hoc) are for counterfactual prediction, comparing LLM-generated counterfactuals with pragmatics-based perturbations as two ways to construct the test cases.
Findings
Self-explanations improve prediction accuracy for humans and LLM judges.
Effectiveness varies with perturbation strategy and judge expertise.
Qualitative analysis shows explanations help humans form more accurate predictions.
Abstract
Large Language Models (LLMs) can produce verbalized self-explanations, yet prior studies suggest that such rationales may not reliably reflect the model's true decision process. We ask whether these explanations nevertheless help users predict model behavior, operationalized as counterfactual simulatability. Using StrategyQA, we evaluate how well humans and LLM judges can predict a model's answers to counterfactual follow-up questions, with and without access to the model's chain-of-thought or post-hoc explanations. We compare LLM-generated counterfactuals with pragmatics-based perturbations as alternative ways to construct test cases for assessing the potential usefulness of explanations. Our results show that self-explanations consistently improve simulation accuracy for both LLM judges and humans, but the degree and stability of gains depend strongly on the perturbation strategy and…
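As a rough illustration of the metric described in the abstract, the sketch below computes simulation accuracy (the share of counterfactual questions on which a judge's prediction matches the model's actual answer) with and without an explanation shown. The `Trial` structure, the function name, and the toy data are assumptions made for this example and are not taken from the paper; StrategyQA answers are yes/no, so answers are modeled as booleans.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One counterfactual question (hypothetical record layout)."""
    model_answer: bool    # the model's actual yes/no answer
    guess_without: bool   # judge's prediction with no explanation shown
    guess_with: bool      # judge's prediction with the explanation shown


def simulation_accuracy(trials: List[Trial], with_explanation: bool) -> float:
    """Fraction of trials where the judge's prediction matches the model."""
    if not trials:
        raise ValueError("need at least one trial")
    hits = 0
    for t in trials:
        guess = t.guess_with if with_explanation else t.guess_without
        hits += int(guess == t.model_answer)
    return hits / len(trials)


# Toy data, invented purely for illustration.
trials = [
    Trial(model_answer=True, guess_without=True, guess_with=True),
    Trial(model_answer=False, guess_without=True, guess_with=False),
    Trial(model_answer=True, guess_without=False, guess_with=True),
    Trial(model_answer=False, guess_without=False, guess_with=True),
]

baseline = simulation_accuracy(trials, with_explanation=False)
with_exp = simulation_accuracy(trials, with_explanation=True)
gain = with_exp - baseline
print(f"without: {baseline:.2f}, with: {with_exp:.2f}, gain: {gain:+.2f}")
```

A positive gain on such a comparison would correspond to explanations helping the judge; the paper's actual protocol, prompts, and aggregation may differ from this simplified version.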
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods
