Do Large Language Models Mentalize When They Teach?
Sevan K. Harootonian, Mark K. Ho, Thomas L. Griffiths, Yael Niv, Ilia Sucholutsky

TL;DR
This study investigates whether large language models teach by mentalizing about the learner or by applying simple heuristics. Their choices are often best explained by human-like Bayesian inference, but scaffolding prompts do not consistently improve their teaching.
Contribution
The paper fits the cognitive models previously used for human teachers to the trial-by-trial choices of LLM teachers, showing that most LLMs behave like Bayesian teachers and that scaffolding does not reliably improve their teaching performance.
Findings
Most LLMs perform well, and their strategies resemble those of human teachers.
Bayes-Optimal models best explain LLM teaching choices.
Scaffolding prompts do not reliably improve LLM teaching performance.
Abstract
How do LLMs decide what to teach next: by reasoning about a learner's knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner's trajectory through a reward-annotated directed graph and must reveal a single edge so that the learner would choose a better path if they replanned. We run a range of LLMs as simulated teachers and fit their trial-by-trial choices with the same cognitive models used for humans: a Bayes-Optimal teacher that infers which transitions the learner is missing (inverse planning), weaker Bayesian variants, heuristic baselines (e.g., reward-based), and non-mentalizing utility models. In a baseline experiment matched to the stimuli presented to human subjects, most LLMs perform well, show little change in strategy over trials, and their…
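The task structure described in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: the graph, the rewards, the names `best_path`, `posterior_over_missing`, and `choose_edge`, the cap of two missing edges, and the all-or-nothing likelihood are all assumptions made for illustration. It shows the general idea of an inverse-planning teacher: infer which edges the learner is probably missing from the path they took, then reveal the edge that maximizes the expected reward of the learner's replanned path.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper): edge -> reward for traversing it.
EDGES = {
    ("S", "A"): 0, ("A", "G"): 1,
    ("S", "B"): 0, ("B", "G"): 5,
    ("B", "C"): 0, ("C", "G"): 2,
}
START, GOAL = "S", "G"


def best_path(known):
    """Return (total_reward, path) for the best START->GOAL path over known edges."""
    best_total, best = float("-inf"), None
    stack = [(START, 0, (START,))]
    while stack:
        node, total, path = stack.pop()
        if node == GOAL:
            if total > best_total:
                best_total, best = total, path
            continue
        for (u, v) in known:
            if u == node and v not in path:
                stack.append((v, total + EDGES[(u, v)], path + (v,)))
    return best_total, best


def posterior_over_missing(trajectory, max_missing=2):
    """Inverse planning with a 0/1 likelihood (a simplifying assumption):
    keep every set of missing edges under which an optimal planner
    would have produced the observed trajectory, weighted uniformly."""
    all_edges = set(EDGES)
    consistent = []
    for k in range(max_missing + 1):
        for missing in combinations(sorted(all_edges), k):
            known = all_edges - set(missing)
            if best_path(known)[1] == tuple(trajectory):
                consistent.append(frozenset(missing))
    return {h: 1 / len(consistent) for h in consistent}


def choose_edge(trajectory):
    """Reveal the edge with the highest expected replanned reward."""
    post = posterior_over_missing(trajectory)
    all_edges = set(EDGES)

    def expected_reward(edge):
        return sum(p * best_path((all_edges - h) | {edge})[0]
                   for h, p in post.items())

    return max(sorted(EDGES), key=expected_reward)


# The learner took S -> B -> C -> G, so they evidently know S -> B
# but not the more rewarding B -> G.
print(choose_edge(["S", "B", "C", "G"]))
```

In this toy case every hypothesis consistent with the observed path has the learner missing the edge from B to G, so that is the edge the sketched teacher reveals. A reward-based heuristic would pick the same edge here, which is why distinguishing the strategies requires fitting models across many trials.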
