GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment

Haoyang He; Jay Patrikar; Dong-Ki Kim; Max Smith; Daniel McGann; Ali-akbar Agha-mohammadi; Shayegan Omidshafiei; Sebastian Scherer

arXiv:2512.01952 · cs.CV · February 10, 2026

TL;DR

This paper introduces GrndCtrl, a self-supervised post-training method that aligns pretrained video world models with physical and perceptual constraints to improve navigation stability and geometric grounding.

Contribution

The paper proposes Reinforcement Learning with World Grounding (RLWG), a post-training alignment framework driven by multiple verifiable rewards, and instantiates it as GrndCtrl to improve spatial coherence in world models for embodied navigation.

Findings

01 GrndCtrl improves trajectory stability in navigation tasks.

02 The method achieves better geometric consistency than supervised fine-tuning.

03 It effectively bridges generative pretraining with grounded, reliable behavior.

Abstract

Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and stability. We introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. We instantiate this framework with GrndCtrl, a reward-aligned adaptation method based on Group Relative Policy…
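The abstract's idea of scoring rollouts with several rewards and adapting the model with a group-relative method can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names `combine_rewards` and `group_relative_advantages`, the equal weights, the weighted-sum combination, and the numeric scores are all hypothetical. The sketch only shows the general pattern of standardizing rewards within a group of rollouts sampled from the same context.

```python
from statistics import mean, pstdev


def combine_rewards(components, weights):
    """Weighted sum of named reward components for one rollout (assumed form)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: subtract the group mean and
    divide by the group (population) standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for three rollouts generated from the same context.
# The component names mirror the reward types listed in the abstract;
# the values and the equal weights are invented for illustration.
weights = {"pose_cycle": 1.0, "depth_reproj": 1.0, "temporal": 1.0}
group = [
    {"pose_cycle": 0.9, "depth_reproj": 0.8, "temporal": 0.7},
    {"pose_cycle": 0.4, "depth_reproj": 0.5, "temporal": 0.6},
    {"pose_cycle": 0.6, "depth_reproj": 0.6, "temporal": 0.6},
]

rewards = [combine_rewards(c, weights) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

In a sketch like this, rollouts that score above the group average receive positive advantages and those below receive negative ones, so no separately learned value baseline is needed. How the actual method weights or combines its rewards is not stated in the text above.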

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics