GeoVideo: Introducing Geometric Regularization into Video Generation Model

Yunpeng Bai; Shaoheng Fang; Chaohui Yu; Fan Wang; Qixing Huang

arXiv:2512.03453·cs.CV·December 4, 2025

TL;DR

This paper enhances video generation models by integrating geometric regularization through depth prediction, significantly improving temporal consistency and 3D structural coherence in synthesized videos.

Contribution

The paper introduces a geometric regularization framework that uses per-frame depth prediction and a multi-view geometric loss to enforce 3D structural consistency in diffusion-based video generation.

Findings

01 Improved temporal stability and geometric consistency in generated videos.

02 Enhanced shape and structure coherence across frames.

03 Significant performance gains over baseline models.

Abstract

Recent advances in video generation have enabled the synthesis of high-quality and visually realistic clips using diffusion transformer models. However, most existing approaches operate purely in the 2D pixel space and lack explicit mechanisms for modeling 3D structures, often resulting in temporally inconsistent geometries, implausible motions, and structural artifacts. In this work, we introduce geometric regularization losses into video generation by augmenting latent diffusion models with per-frame depth prediction. We adopt depth as the geometric representation because of the great progress in depth prediction and its compatibility with image-based latent encoders. Specifically, to enforce structural consistency over time, we propose a multi-view geometric loss that aligns the predicted depth maps across frames within a shared 3D coordinate system. Our method bridges the gap…
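
The abstract excerpt does not give the exact form of the multi-view geometric loss, so the following is only an illustrative sketch of the general idea of aligning depth maps across frames in a shared 3D coordinate system. It assumes known pinhole intrinsics `K` and a known relative camera pose `T_ij` (the excerpt does not say how these are obtained), uses nearest-neighbour sampling, and is written in NumPy rather than a differentiable training framework. The function names `unproject` and `reprojection_depth_loss` are hypothetical and not taken from the paper.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map of shape (H, W) to camera-space 3D points of shape (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column index, v = row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                 # back-projected rays with z = 1
    return rays * depth[..., None]

def reprojection_depth_loss(depth_i, depth_j, K, T_ij):
    """Mean absolute depth disagreement after moving frame i's points into frame j.

    T_ij is an assumed 4x4 rigid transform from frame-i to frame-j camera coordinates.
    """
    h, w = depth_i.shape
    pts_i = unproject(depth_i, K).reshape(-1, 3)
    pts_j = pts_i @ T_ij[:3, :3].T + T_ij[:3, 3]    # same points, frame-j coordinates
    z = pts_j[:, 2]
    safe_z = np.where(z > 1e-6, z, 1.0)             # avoid dividing by non-positive depth
    proj = pts_j @ K.T                              # homogeneous pixel coordinates
    u = np.rint(proj[:, 0] / safe_z).astype(int)
    v = np.rint(proj[:, 1] / safe_z).astype(int)
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not valid.any():
        return 0.0
    # Compare the depth frame j predicts with the depth implied by frame i.
    return float(np.mean(np.abs(depth_j[v[valid], u[valid]] - z[valid])))
```

For a fronto-parallel plane at depth 2.0 and a camera that moves 0.5 units toward it, the loss is zero when frame j predicts depth 1.5 and 0.5 when frame j still predicts 2.0, which is the kind of cross-frame disagreement such a regularizer is meant to penalize.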

Videos

GeoVideo: Introducing Geometric Regularization into Video Generation Model · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis