VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Zhaochong An; Orest Kupyn; Théo Uscidda; Andrea Colaco; Karan Ahuja; Serge Belongie; Mar Gonzalez-Franco; Marta Tintore Gazulla

arXiv:2603.26599 · cs.CV · March 30, 2026

TL;DR

VGGRPO introduces a latent geometry-guided framework that enhances geometric consistency in video generation, especially for dynamic scenes, without compromising pretrained model capabilities or incurring high computational costs.

Contribution

The paper proposes VGGRPO, a novel latent-space geometry-guided approach utilizing a 4D reconstruction model and reinforcement learning rewards to improve world consistency in video generation.

Findings

1. Improves camera stability and geometric coherence in static and dynamic videos.

2. Eliminates the need for repeated VAE decoding, reducing computational overhead.

3. Enhances overall video quality and consistency across benchmarks.

Abstract

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generalization of internet-scale pretrained models, while existing alignment methods are limited to static scenes and rely on RGB-space rewards that require repeated VAE decoding, incurring substantial compute overhead and failing to generalize to highly dynamic real-world scenes. To preserve the pretrained capacity while improving geometric consistency, we propose VGGRPO (Visual Geometry GRPO), a latent geometry-guided framework for geometry-aware video post-training. VGGRPO introduces a Latent Geometry Model (LGM) that stitches video diffusion latents to geometry foundation…
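The "GRPO" in the method's name is commonly read as Group Relative Policy Optimization, in which several samples are drawn for the same prompt and each sample's reward is normalized against the rest of its group. The following is a minimal sketch of that normalization step only, not the paper's implementation: the function name `group_relative_advantages`, the `eps` parameter, and the reward values are hypothetical placeholders, and the latent-space geometry reward described in the abstract is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is the reward's deviation from the group mean, divided by
    the group's (population) standard deviation. `eps` avoids division by
    zero when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four videos sampled from a single prompt.
# In the setting the abstract describes, these would come from a geometry
# reward evaluated on latents; here they are made-up numbers.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged; because the normalization is relative to the group, no separate learned value function is needed for this step.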
