GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

Yuecheng Liu; Junda Cheng; Longliang Liu; Wenjing Liao; Hanrui Cheng; Yuzhou Wang; Xin Yang

arXiv:2605.10525 · cs.CV · May 20, 2026

TL;DR

GemDepth introduces a geometry-aware framework for 3D-consistent video depth estimation, leveraging explicit camera motion and geometric embeddings to improve detail and temporal coherence.

Contribution

The paper proposes a Geometry-Embedding Module (GEM) and an Alternating Spatio-Temporal Transformer to improve 3D consistency and spatial detail in video depth estimation.

Findings

01 Achieves state-of-the-art performance on multiple datasets.

02 Effectively maintains 3D geometric consistency under view changes.

03 Improves spatial detail and temporal coherence simultaneously.

Abstract

Video depth estimation extends monocular prediction into the temporal domain to ensure coherence. However, existing methods often suffer from spatial blurring in fine-detail regions and temporal inconsistencies. We argue that current approaches, which primarily rely on temporal smoothing via Transformers, struggle to maintain strict 3D geometric consistency, particularly under rotations or drastic view changes. To address this, we propose GemDepth, a framework built on the insight that an explicit awareness of camera motion and global 3D structure is a prerequisite for 3D consistency. Distinctively, GemDepth introduces a Geometry-Embedding Module (GEM) that predicts inter-frame camera poses to generate implicit geometric embeddings. This injection of motion priors equips the network with intrinsic 3D perception and alignment capabilities. Guided by these geometric cues, our Alternating…
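To make the notion of 3D geometric consistency concrete, the sketch below shows a standard reprojection check under a pinhole camera model: a pixel in frame A is lifted to 3D using its predicted depth, moved into frame B with the relative camera pose, and its implied depth is compared with what frame B predicts. This is not GemDepth's method or code; it only illustrates the property the abstract refers to. All function names, the intrinsics tuple `(fx, fy, cx, cy)`, and the `depth_b_lookup` callable are assumptions introduced for illustration.

```python
# Illustrative sketch only: a generic reprojection-based depth consistency
# check, not the GemDepth implementation. All names here are hypothetical.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(point, R, t):
    """Apply the rigid motion X' = R X + t (R is a 3x3 nested list, t has length 3)."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a 3D point to pixel coordinates; also return its depth z."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy, z)


def depth_consistency_error(u, v, depth_a, depth_b_lookup, R_ab, t_ab, intrinsics):
    """Absolute gap between the depth frame A implies for frame B and frame B's own prediction.

    depth_b_lookup(u, v) is an assumed callable returning frame B's predicted
    depth at (possibly fractional) pixel coordinates.
    """
    p_a = backproject(u, v, depth_a, *intrinsics)
    p_b = transform(p_a, R_ab, t_ab)
    u_b, v_b, z_b = project(p_b, *intrinsics)
    return abs(z_b - depth_b_lookup(u_b, v_b))


if __name__ == "__main__":
    intrinsics = (500.0, 500.0, 320.0, 240.0)  # assumed fx, fy, cx, cy
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    forward = (0.0, 0.0, -0.5)  # scene points end up 0.5 units closer in frame B
    error = depth_consistency_error(
        420.0, 240.0, 2.0, lambda u, v: 1.5, identity, forward, intrinsics
    )
    print(error)
```

A per-frame depth model with no notion of camera motion has no reason to drive this error toward zero under rotation or large viewpoint change, which is the gap the abstract attributes to methods that rely on temporal smoothing alone.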

Code & Models

Repositories

Yuecheng919/GemDepth (GitHub)
