GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

Jan Ackermann; Shengqu Cai; Boyang Deng; Zhengfei Kuang; Songyou Peng; Gordon Wetzstein

arXiv:2605.18365 · cs.CV · May 19, 2026

TL;DR

This paper introduces a geometry-consistency reward for text-to-video diffusion models. The reward directly measures whether the motion in a generated video is compatible with a coherent scene, reducing geometric artifacts and improving temporal consistency.

Contribution

The paper proposes a model-agnostic reward that uses optical flow, depth-pose predictions, and feature-based correspondence to optimize geometric consistency in video generation directly.

Findings

01 Significant reduction in geometric artifacts compared to baselines.

02 Improved temporal consistency in generated videos.

03 Applicable to diverse dynamic scenes with camera and object motion.

Abstract

Generating geometrically consistent videos remains an open challenge: text-to-video diffusion models trained on web-scale data treat geometry only implicitly, leading to object deformation, texture drift, and non-rigid backgrounds under camera motion. Existing solutions either improve consistency as a byproduct, apply only to static scenes, or realign the latent space of the model completely. We introduce a geometry-consistency reward that directly measures whether motion in a generated video is compatible with a coherent scene. Our key insight is that in physically consistent videos, background motion should be explainable by rigid camera-induced flow, while independently moving objects should preserve appearance identity along motion trajectories. We operationalize this using optical flow, depth-pose predictions, and feature-based correspondence to separate rigid and dynamic regions…
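
The "rigid camera-induced flow" idea in the abstract can be illustrated with a small sketch. This is not the paper's reward, whose exact formulation is not given in the truncated abstract. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a relative pose applied as `P' = R P + t`, and per-pixel depth; the function names `rigid_flow` and `background_residual` are hypothetical. For a static pixel, the flow predicted from depth and pose should match the observed optical flow, so the gap between the two is one plausible measure of background rigidity.

```python
import math


def rigid_flow(u, v, depth, fx, fy, cx, cy, R, t):
    """Flow at pixel (u, v) caused by camera motion alone (static point assumed).

    R is a 3x3 rotation (nested lists) and t a 3-vector mapping points from
    the first camera frame to the second: P' = R P + t (assumed convention).
    """
    # Back-project the pixel to a 3D point in the first camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth

    # Express the same 3D point in the second camera frame.
    x2 = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    y2 = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    z2 = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    if z2 <= 0:
        raise ValueError("point falls behind the second camera")

    # Re-project and take the pixel displacement.
    u2 = fx * x2 / z2 + cx
    v2 = fy * y2 / z2 + cy
    return (u2 - u, v2 - v)


def background_residual(samples, fx, fy, cx, cy, R, t):
    """Mean endpoint error between observed flow and camera-induced flow.

    samples: list of (u, v, depth, (flow_u, flow_v)) for pixels assumed to
    belong to the static background. Lower values mean the observed motion
    is better explained by a rigid scene.
    """
    errors = []
    for u, v, depth, (flow_u, flow_v) in samples:
        ru, rv = rigid_flow(u, v, depth, fx, fy, cx, cy, R, t)
        errors.append(math.hypot(flow_u - ru, flow_v - rv))
    return sum(errors) / len(errors)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# One pixel whose observed flow matches the camera motion, one that does not.
samples = [
    (50, 50, 2.0, (25.0, 0.0)),
    (50, 50, 2.0, (28.0, 4.0)),
]
residual = background_residual(samples, 100.0, 100.0, 50.0, 50.0,
                               IDENTITY, (0.5, 0.0, 0.0))
print(residual)
```

A reward could be derived from the negated residual, but how the paper weights or masks this term, and how it scores dynamic objects, is not specified in the text above.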
