Simulated Annealing Enhances Theory-of-Mind Reasoning in Autoregressive Language Models

Xucong Hu; Jian-Qiao Zhu

arXiv:2601.12269 · cs.CL · January 21, 2026

TL;DR

This paper demonstrates that simulated annealing, a sampling-based optimization technique, significantly improves the ability of autoregressive language models to perform Theory of Mind reasoning without additional training.

Contribution

The study introduces a novel application of annealing in power-sampling methods to enhance latent mental state reasoning in language models without retraining.

Findings

01. Annealing improves ToM reasoning performance.

02. Sampling-based methods extract latent capabilities.

03. No additional training required for improved ToM.

Abstract

Autoregressive language models are next-token predictors and have been criticized for only optimizing surface plausibility (i.e., local coherence) rather than maintaining correct latent-state representations (i.e., global coherence). Because Theory of Mind (ToM) tasks crucially depend on reasoning about latent mental states of oneself and others, such models are therefore often thought to fail at ToM. While post-training methods can improve ToM performance, we show that strong ToM capability can be recovered directly from the base model without any additional weight updates or verifications. Our approach builds on recent power-sampling methods (Karan & Du, 2025) that use Markov chain Monte Carlo (MCMC) to sample from sharpened sequence-level (rather than token-level) probability distributions of autoregressive language models. We further find that incorporating annealing, where the…
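The difference between token-level and sequence-level sharpening can be illustrated on a toy model. The sketch below is not the authors' algorithm or the Karan & Du procedure: it is a simplified independence Metropolis-Hastings sampler that proposes whole sequences from the base model and targets a distribution proportional to p(x)^alpha. The toy model `next_token_probs`, the function names, and the linear schedule that raises alpha from 1 to `alpha_max` are all assumptions made for illustration; the abstract is truncated before it describes the actual annealing scheme.

```python
import itertools
import math
import random

VOCAB = (0, 1)
LENGTH = 3

def next_token_probs(prefix):
    """Hypothetical toy next-token model: P(token | prefix) over VOCAB."""
    if not prefix:
        return (0.6, 0.4)            # locally, token 0 looks best
    if prefix[-1] == 0:
        return (0.55, 0.45)          # uncertain continuation after a 0
    return (0.1, 0.9)                # confident continuation after a 1

def seq_logprob(seq):
    """Log-probability of a full sequence under the toy model."""
    return sum(math.log(next_token_probs(seq[:t])[seq[t]])
               for t in range(len(seq)))

def sample_sequence(rng):
    """Ancestral sampling from the base model, one token at a time."""
    seq = ()
    for _ in range(LENGTH):
        token = rng.choices(VOCAB, weights=next_token_probs(seq))[0]
        seq += (token,)
    return seq

def power_distribution(alpha):
    """Exact sequence-level sharpened distribution, p(x)^alpha / Z, by enumeration."""
    seqs = list(itertools.product(VOCAB, repeat=LENGTH))
    weights = [math.exp(alpha * seq_logprob(s)) for s in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}

def annealed_mh(steps, alpha_max, rng):
    """Independence Metropolis-Hastings with an assumed linear alpha schedule."""
    current = sample_sequence(rng)
    for step in range(1, steps + 1):
        # Assumed schedule: alpha grows linearly from 1 to alpha_max.
        alpha = 1.0 + (alpha_max - 1.0) * step / steps
        proposal = sample_sequence(rng)
        # Target p^alpha with proposal p gives ratio (p(x') / p(x))^(alpha - 1).
        log_accept = (alpha - 1.0) * (seq_logprob(proposal) - seq_logprob(current))
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current = proposal
    return current

if __name__ == "__main__":
    rng = random.Random(0)
    print("base:     ", power_distribution(1.0))
    print("sharpened:", power_distribution(4.0))
    print("one annealed MH sample:", annealed_mh(200, 4.0, rng))
```

In this toy model the most probable first token is 0, yet the most probable full sequence is (1, 1, 1). Sharpening at the sequence level concentrates mass on that globally most probable sequence, which a purely token-by-token greedy choice would miss.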

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Embodied and Extended Cognition