Probing the Lack of Stable Internal Beliefs in LLMs
Yifan Luo, Kangping Xu, Yanzhen Lu, Yang Yuan, Andrew Chi-Chih Yao

TL;DR
This paper investigates whether large language models can maintain stable internal goals during multi-turn interactions, revealing significant challenges in achieving consistent persona-driven behavior without explicit goal reinforcement.
Contribution
It introduces a novel riddle game paradigm to evaluate implicit goal consistency in LLMs and demonstrates their difficulty in maintaining stable internal representations over extended dialogues.
Findings
LLMs often fail to preserve latent goals across turns
Explicit context is necessary for LLMs to maintain goal consistency
This instability is a key limitation for realistic personality modeling in LLMs
Abstract
Persona-driven large language models (LLMs) require consistent behavioral tendencies across interactions to simulate human-like personality traits, such as persistence or reliability. However, current LLMs often lack stable internal representations that anchor their responses over extended dialogues. This work explores whether LLMs can maintain "implicit consistency", defined as persistent adherence to an unstated goal in multi-turn interactions. We designed a 20-question-style riddle game paradigm where an LLM is tasked with secretly selecting a target and responding to users' guesses with "yes/no" answers. Through evaluations, we find that LLMs struggle to preserve latent consistency: their implicit "goals" shift across turns unless their selected target is explicitly provided in context. These findings highlight critical limitations in building persona-driven LLMs and underscore…
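One way to make the idea of implicit consistency concrete is to ask whether any single target could have produced the whole sequence of yes/no answers in a dialogue. The sketch below is illustrative only and is not the authors' evaluation code: the candidate pool, the attribute names, and the functions `consistent_targets` and `first_inconsistent_turn` are all hypothetical, and a real evaluation would need a way to map free-form questions onto checkable attributes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical candidate pool: each possible target maps attribute -> truth value.
CANDIDATES: Dict[str, Dict[str, bool]] = {
    "cat": {"is_animal": True, "can_fly": False, "is_pet": True},
    "eagle": {"is_animal": True, "can_fly": True, "is_pet": False},
    "car": {"is_animal": False, "can_fly": False, "is_pet": False},
    "airplane": {"is_animal": False, "can_fly": True, "is_pet": False},
}

# A transcript is a list of (attribute asked about, model's yes/no answer).
Transcript = List[Tuple[str, bool]]


def consistent_targets(
    transcript: Transcript, candidates: Dict[str, Dict[str, bool]]
) -> Set[str]:
    """Return every candidate that agrees with all answers in the transcript."""
    remaining = set(candidates)
    for attribute, answer in transcript:
        remaining = {
            name for name in remaining if candidates[name][attribute] == answer
        }
    return remaining


def first_inconsistent_turn(
    transcript: Transcript, candidates: Dict[str, Dict[str, bool]]
) -> Optional[int]:
    """Return the 1-based turn after which no candidate fits, or None."""
    remaining = set(candidates)
    for turn, (attribute, answer) in enumerate(transcript, start=1):
        remaining = {
            name for name in remaining if candidates[name][attribute] == answer
        }
        if not remaining:
            return turn
    return None


# Answers that a model holding "cat" as a fixed hidden target would give.
stable = [("is_animal", True), ("can_fly", False), ("is_pet", True)]
# Answers that no single candidate in the pool can explain.
drifting = [("is_animal", True), ("can_fly", True), ("is_pet", True)]

print(consistent_targets(stable, CANDIDATES))
print(first_inconsistent_turn(drifting, CANDIDATES))
```

Under these assumptions, an empty candidate set signals that the answers cannot all refer to one fixed target, which is the kind of goal drift the abstract describes. A non-empty set does not prove the model held a stable target; it only shows the answers are compatible with one.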
Taxonomy
Topics: Persona Design and Applications · Social Robot Interaction and HRI · AI in Service Interactions
