Reinforcement World Model Learning for LLM-based Agents

Xiao Yu; Baolin Peng; Ruize Xu; Yelong Shen; Pengcheng He; Suman Nath; Nikhil Singh; Jianfeng Gao; Zhou Yu

arXiv:2602.05842 · cs.CL · February 10, 2026

TL;DR

This paper introduces Reinforcement World Model Learning (RWML), a self-supervised approach enabling LLM-based agents to learn consistent action-conditioned world models from textual states, improving their environment understanding and task performance.

Contribution

The paper presents a self-supervised training method that teaches LLMs action-conditioned world models. It targets a limitation of next-state token prediction, which rewards exact wording rather than semantic equivalence, and aims to improve how agents adapt to environment dynamics.

Findings

01. Significant performance improvements on ALFWorld and τ² Bench.

02. Robustness against reward hacking compared to token prediction methods.

03. Outperforms direct task-success reward RL and matches expert-data training results.

Abstract

Large language models (LLMs) have achieved strong performance in language-centric tasks. However, in agentic settings, LLMs often struggle to anticipate action consequences and adapt to environment dynamics, highlighting the need for world-modeling capabilities in LLM-based agents. We propose Reinforcement World Model Learning (RWML), a self-supervised method that learns action-conditioned world models for LLM-based agents on textual states using sim-to-real gap rewards. Our method aligns simulated next states produced by the model with realized next states observed from the environment, encouraging consistency between internal world simulations and actual environment dynamics in a pre-trained embedding space. Unlike next-state token prediction, which prioritizes token-level fidelity (i.e., reproducing exact wording) over semantic equivalence and can lead to model collapse, our method…
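The contrast the abstract draws between token-level fidelity and consistency in an embedding space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the pre-trained embedding model, cosine similarity is one plausible way to score the sim-to-real gap, and none of the function names come from the paper. The excerpt does not specify the actual encoder, reward formula, or RL algorithm.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # size of the toy embedding; a real pre-trained encoder would be far larger


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a pre-trained text encoder (hashed bag of words)."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        # Map each word to a fixed bucket so the embedding is deterministic.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += count
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sim_to_real_reward(simulated_next_state: str, realized_next_state: str) -> float:
    """Assumed reward: similarity between the simulated and the observed next state."""
    return cosine(toy_embed(simulated_next_state), toy_embed(realized_next_state))


def exact_match_reward(simulated_next_state: str, realized_next_state: str) -> float:
    """Token-level baseline for contrast: credit only for reproducing exact wording."""
    return 1.0 if simulated_next_state.strip() == realized_next_state.strip() else 0.0


# The environment's observed next state, plus two candidate simulations from the agent.
realized = "You open the drawer. The drawer contains a key."
reworded = "The drawer contains a key. You open the drawer."
wrong = "Nothing happens."

reworded_score = sim_to_real_reward(reworded, realized)
wrong_score = sim_to_real_reward(wrong, realized)
```

In this toy setup the reworded simulation gets no credit from `exact_match_reward` even though it describes the same outcome, while the embedding-based score ranks it above the incorrect simulation. A real encoder would capture paraphrases far better than this word-count stand-in, which only tolerates reordering.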

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare