Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations

Pingjun Hong; Benjamin Roth

arXiv:2601.03775·cs.CL·January 8, 2026

Open Access

TL;DR

This study investigates whether LLM self-explanations help users predict model behavior. It finds that explanations improve simulation accuracy, but that the size and stability of the gains depend on the perturbation strategy and the expertise of the judge.

Contribution

It evaluates how useful LLM-generated explanations (chain-of-thought and post-hoc) are for counterfactual prediction, and compares two ways of constructing the counterfactual test cases: LLM-generated counterfactuals and pragmatics-based perturbations.

Findings

01 Self-explanations improve prediction accuracy for humans and LLM judges.

02 Effectiveness varies with perturbation strategy and judge expertise.

03 Qualitative analysis shows explanations help humans form more accurate predictions.

Abstract

Large Language Models (LLMs) can produce verbalized self-explanations, yet prior studies suggest that such rationales may not reliably reflect the model's true decision process. We ask whether these explanations nevertheless help users predict model behavior, operationalized as counterfactual simulatability. Using StrategyQA, we evaluate how well humans and LLM judges can predict a model's answers to counterfactual follow-up questions, with and without access to the model's chain-of-thought or post-hoc explanations. We compare LLM-generated counterfactuals with pragmatics-based perturbations as alternative ways to construct test cases for assessing the potential usefulness of explanations. Our results show that self-explanations consistently improve simulation accuracy for both LLM judges and humans, but the degree and stability of gains depend strongly on the perturbation strategy and…
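
As a rough illustration of the with/without comparison described in the abstract, simulation accuracy can be read as the fraction of counterfactual questions on which a judge's guess matches the model's actual answer. The sketch below assumes yes/no answers in the style of StrategyQA; the `Trial` record, the function names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One counterfactual question (hypothetical record layout)."""
    model_answer: bool    # what the model actually answered
    guess_without: bool   # judge's guess without seeing an explanation
    guess_with: bool      # judge's guess after seeing the explanation


def simulation_accuracy(actual, predicted):
    """Fraction of items where the judge's guess equals the model's answer."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one trial")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def explanation_gain(trials):
    """Return (accuracy without, accuracy with, difference)."""
    actual = [t.model_answer for t in trials]
    acc_without = simulation_accuracy(actual, [t.guess_without for t in trials])
    acc_with = simulation_accuracy(actual, [t.guess_with for t in trials])
    return acc_without, acc_with, acc_with - acc_without


# Toy data for illustration only; these are not results from the paper.
toy_trials = [
    Trial(model_answer=True, guess_without=True, guess_with=True),
    Trial(model_answer=False, guess_without=True, guess_with=False),
    Trial(model_answer=True, guess_without=False, guess_with=True),
    Trial(model_answer=True, guess_without=True, guess_with=False),
]

acc_without, acc_with, gain = explanation_gain(toy_trials)
print(f"without: {acc_without:.2f}  with: {acc_with:.2f}  gain: {gain:+.2f}")
```

A positive difference means the explanations helped the judge anticipate the model. Computing the same quantity separately for each perturbation strategy would be one way to look at how the gains vary across test-case constructions.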

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods