Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
Jian-Qiao Zhu, Hanbo Xie, Dilip Arumugam, Robert C. Wilson, Thomas L. Griffiths

TL;DR
This paper investigates using reinforcement learning to fine-tune large language models so they can both predict human decision-making and generate interpretable natural language explanations of cognitive processes.
Contribution
The paper introduces a reinforcement learning approach that trains LLMs to both predict and explain human decisions, improving interpretability.
Findings
LLMs can generate high-quality explanations of human risky choices.
Reinforcement learning improves the alignment of explanations with decision predictions.
The method achieves strong predictive performance and interpretability in cognitive modeling.
Abstract
A central goal of cognitive modeling is to develop models that not only predict human behavior but also provide insight into the underlying cognitive mechanisms. While neural network models trained on large-scale behavioral data often achieve strong predictive performance, they typically fall short in offering interpretable explanations of the cognitive processes they capture. In this work, we explore the potential of pretrained large language models (LLMs) to serve as dual-purpose cognitive models, capable of both accurate prediction and interpretable explanation in natural language. Specifically, we employ reinforcement learning with outcome-based rewards to guide LLMs toward generating explicit reasoning traces for explaining human risky choices. Our findings demonstrate that this approach produces high-quality explanations alongside strong quantitative predictions of human decisions.
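To make "outcome-based rewards" concrete, the following is a minimal sketch, not the paper's actual reward (its Formula (1) is not reproduced on this page). It assumes a binary reward that checks only whether the option predicted at the end of a reasoning trace matches the recorded human choice, and it standardizes rewards within a group of sampled traces in the style of GRPO, which one of the reviews below says the paper uses. The `<answer>` tag convention, the single-label "A"/"B" encoding of the human choice, and all function names are hypothetical.

```python
import re
from statistics import mean, pstdev

# Hypothetical output convention (an assumption, not taken from the paper):
# the reasoning trace ends with <answer>A</answer> or <answer>B</answer>.
ANSWER_PATTERN = re.compile(r"<answer>\s*([AB])\s*</answer>", re.IGNORECASE)


def outcome_reward(completion: str, human_choice: str) -> float:
    """Return 1.0 if the last predicted option matches the human choice, else 0.0.

    Only the final answer is scored; the reasoning text itself is never graded.
    """
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return 0.0  # unparseable output earns no reward
    return 1.0 if matches[-1].upper() == human_choice.upper() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled traces for one gamble where the human chose option A.
completions = [
    "The sure gain feels safer than the gamble. <answer>A</answer>",
    "Expected value favors the gamble. <answer>B</answer>",
    "I am not sure which option is better.",
    "Loss aversion points to the safe option. <answer>a</answer>",
]
rewards = [outcome_reward(c, "A") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Traces whose final answer matches the human choice receive a positive advantage and the others a negative one, so the policy update favors reasoning that leads to correct predictions without ever scoring the explanation text directly.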
Peer Reviews
Decision: ICLR 2026 Poster
The paper has a clear goal, and that goal is well executed. I think this type of modeling and analysis effort will be of interest to the cognitive science community. The evaluations and ablations are quite comprehensive. The Appendix, in particular, contains several insightful analyses. The two that are very important for the paper’s message are: - The ablation experiment in C.3, where the authors swap the CoT between the RL and the base models to show the importance of these traces. - The e
I’m not convinced that RL is essential for this pipeline. The development of predictive and explanatory reasoning traces is attributed to RL by making comparisons to the base model. However, perhaps SFT (either Centaur style or full) is also sufficient to develop such traces. If this is the case, SFT may be preferred over RL fine-tuning given a) reduced computational costs during training and b) difficulties around getting RL to work, as the authors have also pointed out in Section F. **If the
Strengths: (1) The topic is interesting, and to my knowledge this is the first work that applies RL to analyze human risky decisions. (2) The experiments are extensive. (3) Figure 1 provides a clear and concrete example that effectively illustrates the task.
Weaknesses: (1) Although the paper integrates RL to explain human decision-making and defines a reward function in Formula (1), it lacks an in-depth analysis of the task. As a result, the contribution appears incremental, and the work reads more like an experimental report than a research paper. (2) The paper leans more toward psychological or cognitive science research than computer science, as the main contributions involve cognitive interpretation rather than methodological innovation. Moreover
The problem of explaining human decisions is interesting and worth exploring.
I do not see the contribution of this paper. The outcome reward that compares prediction correctness is standard in RLVR, GRPO is borrowed directly from the literature, and the "step-by-step" prompt is likewise borrowed from the literature. In that sense, I do not see anything new or unique proposed by this paper. Additionally, this paper focuses on the interpretability of human decisions; however, I do not see any special design for this purpose. Specifically, the paper proposes an outco
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
