WorldCompass: Reinforcement Learning for Long-Horizon World Models

Zehan Wang; Tengfei Wang; Haiyu Zhang; Xuhui Zuo; Junta Wu; Haoyuan Wang; Wenqiang Sun; Zhenwei Wang; Chenjie Cao; Hengshuang Zhao; Chunchao Guo; Zhou Zhao

arXiv:2602.09022 · cs.CV · February 10, 2026

Open Access

TL;DR

WorldCompass is a reinforcement learning post-training framework for long-horizon, interactive video world models. It improves interaction-following accuracy and visual fidelity through clip-level rollouts, complementary reward functions, and an efficient negative-aware fine-tuning algorithm.

Contribution

The paper presents RL techniques tailored to autoregressive video world models: a clip-level rollout strategy, complementary reward functions for interaction-following accuracy and visual quality, and an efficient fine-tuning algorithm.

Findings

01. Significant improvement in interaction accuracy

02. Enhanced visual fidelity in generated videos

03. Efficient RL training with reduced reward hacking

Abstract

This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various…
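The clip-level rollout and complementary rewards described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Clip` dataclass, the `sample_clip` generator, the two reward functions, the `w_quality` weight, and the group-normalized advantages (a GRPO-style choice swapped in because the abstract is truncated before it specifies the update rule). The negative-aware fine-tuning step itself is not shown.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    """Toy stand-in for a generated video clip (hypothetical, not from the paper)."""
    motion: float  # how far the camera moved during this clip
    blur: float    # 0.0 = sharp, 1.0 = fully degraded


def sample_clip(prefix: List[Clip], action: float, rng: random.Random) -> Clip:
    """Hypothetical generator: draws one candidate for the next clip."""
    return Clip(motion=action + rng.gauss(0.0, 0.5), blur=rng.random())


def interaction_reward(clip: Clip, action: float) -> float:
    """Higher when the clip's motion matches the requested action."""
    return -abs(clip.motion - action)


def quality_reward(clip: Clip) -> float:
    """Higher when the clip is visually cleaner."""
    return 1.0 - clip.blur


def clip_level_rollout(
    prefix: List[Clip],
    action: float,
    num_samples: int,
    rng: random.Random,
    w_quality: float = 0.5,
) -> Tuple[List[Clip], List[float], List[float]]:
    """Sample several candidates for ONE target clip that share the same prefix."""
    candidates = [sample_clip(prefix, action, rng) for _ in range(num_samples)]
    # Combine both rewards so that neither can be maximized while ignoring the other.
    rewards = [
        interaction_reward(c, action) + w_quality * quality_reward(c)
        for c in candidates
    ]
    # Assumption: group-normalized advantages; the source does not confirm this step.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    advantages = [(r - mean) / std for r in rewards]
    return candidates, rewards, advantages


rng = random.Random(0)
prefix: List[Clip] = []  # previously generated clips would go here
candidates, rewards, advantages = clip_level_rollout(
    prefix, action=1.0, num_samples=8, rng=rng
)
best = max(range(len(candidates)), key=rewards.__getitem__)
```

Because all candidates share one prefix and differ only in the target clip, each reward is attributable to that clip alone, which is one way to read the abstract's "fine-grained reward signals".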

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications