Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology

Jian Xiong; Jingbo Zhou; Zihan Zhou; Yixiong Xiao; Le Zhang; Jingyong Ye; Rui Qian; Yang Zhou; Dejing Dou

arXiv:2601.22474 · cs.LG · February 2, 2026

Open Access

TL;DR

This paper demonstrates that large language models can exhibit latent learning through unrewarded exploration, leading to improved performance when rewards are later introduced, thus challenging reward-centric training paradigms.

Contribution

The study provides empirical evidence and theoretical explanations showing that LLMs can undergo latent learning during unrewarded exploration, enhancing their capabilities beyond reward-based reinforcement learning.

Findings

01. LLMs show performance improvements during unrewarded exploration phases.

02. Post-training with unrewarded exploration leads to higher competence than reward-only training.

03. Latent learning dynamics are consistent across multiple model families and tasks.

Abstract

Latent learning, classically theorized by Tolman, shows that biological agents (e.g., rats) can acquire internal representations of their environment without rewards, enabling rapid adaptation once rewards are introduced. In contrast, from a cognitive science perspective, reward learning remains overly dependent on external feedback, limiting flexibility and generalization. Although recent advances in the reasoning capabilities of large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, mark a significant breakthrough, these models still rely primarily on reward-centric reinforcement learning paradigms. Whether and how the well-established psychological phenomenon of latent learning can inform or emerge within LLM training remains largely unexplored. In this work, we present novel experimental findings showing that LLMs also exhibit latent learning dynamics. During an…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Language and cultural evolution