Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Haoyu Wu; Diankun Wu; Tianyu He; Junliang Guo; Yang Ye; Yueqi Duan; and Jiang Bian

arXiv:2507.07982 · cs.CV · May 6, 2026

TL;DR

This paper introduces Geometry Forcing, a method that guides video diffusion models to learn 3D-aware representations by aligning intermediate features with geometric cues, improving 3D consistency in video generation.

Contribution

It proposes a novel alignment-based approach to embed geometric awareness into video diffusion models, bridging the gap between 2D video data and 3D world understanding.

Findings

01 Enhanced 3D consistency in generated videos.

02 Improved visual quality over baseline models.

03 Effective alignment of features with geometric cues.

Abstract

Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometric-aware structure in their learned representations. To bridge the gap between video diffusion models and the underlying 3D nature of the physical world, we propose Geometry Forcing, a simple yet effective method that encourages video diffusion models to internalize 3D representations. Our key insight is to guide the model's intermediate representations toward geometry-aware structure by aligning them with features from a geometric foundation model. To this end, we introduce two complementary alignment objectives: Angular Alignment, which enforces directional consistency via cosine similarity, and Scale Alignment, which preserves scale-related information by regressing geometric features from…
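The two objectives named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the toy feature vectors, and the choice of mean squared error for the regression term are all assumptions made for illustration. The abstract as given here is cut off before it says what the geometric features are regressed from, so the sketch simply takes the predicted features as an input and leaves that choice open.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def angular_alignment_loss(model_feats, geo_feats):
    """Hypothetical angular term: 1 minus the mean cosine similarity
    over paired feature vectors. Ignores vector magnitude entirely."""
    sims = [cosine_similarity(u, v) for u, v in zip(model_feats, geo_feats)]
    return 1.0 - sum(sims) / len(sims)


def scale_regression_loss(predicted_feats, geo_feats):
    """Hypothetical scale term: mean squared error between predicted and
    target geometric features (MSE is an assumption, not from the source)."""
    total, count = 0.0, 0
    for p, g in zip(predicted_feats, geo_feats):
        for a, b in zip(p, g):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy features: same directions as the targets, different magnitudes.
model_feats = [[1.0, 0.0], [0.0, 2.0]]
geo_feats = [[2.0, 0.0], [0.0, 1.0]]

angular = angular_alignment_loss(model_feats, geo_feats)
scale = scale_regression_loss(model_feats, geo_feats)
```

In this toy case the angular term is zero because every pair of vectors points the same way, while the regression term is nonzero because the magnitudes differ. That is one way to read the abstract's description of the two objectives as complementary: a cosine-based loss alone is blind to scale.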

Code & Models

Repositories

https://GeometryForcing.github.io
github

Models

Haoyuwu/GeometryForcing (Hugging Face model)

Videos

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling (SlidesLive)