Beyond State Consistency: Behavior Consistency in Text-Based World Models

Youling Huang; Guanqiao Chen; Junchi Yao; Lu Wang; Fangkai Yang; Chao Du; ChenZhuo Zhao; Pu Zhao; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang

arXiv:2604.13824 · cs.LG · April 16, 2026

TL;DR

This paper introduces a behavior-aligned training paradigm for text-based world models built on the Behavior Consistency Reward (BehR), which rewards predicted states under which an agent behaves as it would in the real environment. The approach improves long-term consistency and evaluation metrics.

Contribution

It proposes a behavior-aligned training method that uses BehR to improve the functional consistency of text-based world models, targeting what traditional single-step metrics such as Exact Match fail to capture.

Findings

01. BehR-based training improves long-term alignment in WebShop and TextWorld.

02. Models trained with BehR produce fewer false positives in offline evaluation.

03. BehR training yields modest gains in inference-time lookahead planning.

Abstract

World models have been emerging as critical components for assessing the consequences of actions generated by interactive agents in online planning and offline evaluation. In text-based environments, world models are typically evaluated and trained with single-step metrics such as Exact Match, aiming to improve the similarity between predicted and real-world states, but such metrics have been shown to be insufficient for capturing actual agent behavior. To address this issue, we introduce a new behavior-aligned training paradigm aimed at improving the functional consistency between the world model and the real environment. This paradigm focuses on optimizing a tractable step-level metric named Behavior Consistency Reward (BehR), which measures how much the likelihood of a logged next action changes between the real state and the world-model-predicted state under a frozen Reference…
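The step-level metric described in the abstract can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract is cut off before BehR's exact definition, so the specific form used here (the negative absolute change in the logged action's log-likelihood), the candidate-action list, and the word-overlap `toy_reference_logprob` function (a hypothetical stand-in for a frozen reference agent) are all assumptions.

```python
import math
from typing import Callable, Sequence

# Signature of a (hypothetical) frozen reference agent:
# (state_text, action, candidate_actions) -> log-probability of `action`.
RefLogProb = Callable[[str, str, Sequence[str]], float]


def toy_reference_logprob(state: str, action: str, candidates: Sequence[str]) -> float:
    """Toy stand-in for a frozen LLM agent: softmax over word-overlap scores.

    Assumes `action` is one of `candidates`.
    """
    state_words = set(state.lower().split())

    def score(a: str) -> float:
        # Number of words the action shares with the state description.
        return float(len(state_words & set(a.lower().split())))

    logits = [score(a) for a in candidates]
    top = max(logits)
    # Numerically stable log-sum-exp for the softmax normalizer.
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return score(action) - log_z


def behavior_consistency_reward(
    real_state: str,
    predicted_state: str,
    logged_action: str,
    candidates: Sequence[str],
    ref_logprob: RefLogProb = toy_reference_logprob,
) -> float:
    """Assumed form of BehR: minus the absolute change in the logged action's
    log-likelihood when the real state is swapped for the predicted one.

    Zero means the reference agent's preference for the logged action is
    unchanged; more negative values mean the predicted state shifts it more.
    """
    lp_real = ref_logprob(real_state, logged_action, candidates)
    lp_pred = ref_logprob(predicted_state, logged_action, candidates)
    return -abs(lp_real - lp_pred)


# Usage on a WebShop-like step (states and actions are made up for illustration).
candidates = ["click buy now", "search red shoes", "go back"]
real_state = "product page red shoes price 20 buy now"
pred_close = "product page red shoes price 25 buy now"  # wrong price detail
pred_far = "search results page"  # loses the product page entirely

r_close = behavior_consistency_reward(real_state, pred_close, "click buy now", candidates)
r_far = behavior_consistency_reward(real_state, pred_far, "click buy now", candidates)
```

In this toy setup `r_close` is zero: the mistaken price fails an Exact Match check against the real state, yet it leaves the reference agent's preference for the logged action unchanged. `r_far` is clearly negative because the predicted state would push the agent toward a different action. This contrast is the kind of gap between state similarity and behavior that the abstract says single-step metrics miss.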
