DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation

Tuan Duc Ngo; Jiahui Huang; Seoung Wug Oh; Kevin Blackburn-Matzen; Evangelos Kalogerakis; Chuang Gan; Joon-Young Lee

arXiv:2603.03744 · cs.CV · March 24, 2026

TL;DR

DAGE is a dual-stream transformer that estimates high-resolution, view-consistent geometry and camera poses from uncalibrated multi-view or video inputs, keeping global coherence and fine detail in separate streams so that neither is traded away for efficiency.

Contribution

The paper presents DAGE, a dual-stream transformer that disentangles global coherence (a low-resolution, cross-frame stream) from fine detail (a high-resolution, per-frame stream), so that resolution and clip length can scale independently for geometry and camera pose estimation.

Findings

01. Achieves state-of-the-art results in video geometry estimation.

02. Supports inputs up to 2K resolution with practical inference costs.

03. Produces sharp depth maps and maintains cross-view consistency.
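
A rough count of attention pairs suggests why restricting clip-wide attention to a downsampled stream can keep inference cost practical at high resolution. The sketch below is illustrative arithmetic only: the patch size of 16, the 2048x1152 reading of "2K", the 256x144 low-resolution size, the 16-frame clip, and the function name attention_pairs are all assumptions, not values taken from the paper.

```python
def attention_pairs(frames, height, width, patch=16, global_attention=True):
    """Count query-key pairs for one attention layer over a clip.

    All parameters are hypothetical; the paper's actual patch size and
    working resolutions are not stated in this summary.
    """
    tokens_per_frame = (height // patch) * (width // patch)
    if global_attention:
        # Every token attends to every token in the whole clip.
        total = frames * tokens_per_frame
        return total * total
    # Each frame attends only to its own tokens.
    return frames * tokens_per_frame ** 2


frames = 16  # assumed clip length

# Naive design: clip-wide attention directly on high-resolution tokens.
naive = attention_pairs(frames, 1152, 2048, global_attention=True)

# Dual-stream idea: per-frame attention at high resolution, plus
# clip-wide attention only on heavily downsampled frames.
dual = (attention_pairs(frames, 1152, 2048, global_attention=False)
        + attention_pairs(frames, 144, 256, global_attention=True))

print(f"naive / dual-stream pair count: {naive / dual:.1f}x")
```

Under these assumptions the high-resolution term grows only linearly with clip length, while the quadratic clip-wide term is confined to the small low-resolution token set.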

Abstract

Estimating accurate, view-consistent geometry and camera poses from uncalibrated multi-view/video inputs remains challenging, especially at high spatial resolutions and over long sequences. We present DAGE, a dual-stream transformer whose main novelty is to disentangle global coherence from fine detail. A low-resolution stream operates on aggressively downsampled frames with alternating frame/global attention to build a view-consistent representation and estimate cameras efficiently, while a high-resolution stream processes the original images per frame to preserve sharp boundaries and small structures. A lightweight adapter fuses these streams via cross-attention, injecting global context without disturbing the pretrained single-frame pathway. This design scales resolution and clip length independently, supports inputs up to 2K, and maintains practical inference cost. DAGE delivers…
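
The data flow described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: attention is single-head with no learned query/key/value projections, the adapter attends per frame to that frame's low-resolution tokens, and a zero-initialized output projection (w_out) is used as one common way to leave a pretrained pathway unchanged at the start of training. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention, batched over axis 0."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def frame_attention(x):
    # x: (T, N, C). Each frame attends only to its own N tokens.
    return x + attention(x, x, x)


def global_attention(x):
    # Flatten the clip so every token attends to all T*N tokens.
    t, n, c = x.shape
    flat = x.reshape(1, t * n, c)
    return (flat + attention(flat, flat, flat)).reshape(t, n, c)


def low_res_stream(x, depth=2):
    # Alternate frame-wise and clip-wide attention (depth is an assumption).
    for _ in range(depth):
        x = global_attention(frame_attention(x))
    return x


def adapter(hi, lo, w_out):
    # hi: (T, N_hi, C) high-resolution tokens act as queries.
    # lo: (T, N_lo, C) low-resolution tokens act as keys and values.
    # With w_out == 0 the high-resolution tokens pass through unchanged.
    return hi + attention(hi, lo, lo) @ w_out


rng = np.random.default_rng(0)
lo_tokens = rng.standard_normal((3, 4, 8))   # 3 frames, 4 coarse tokens
hi_tokens = rng.standard_normal((3, 16, 8))  # 3 frames, 16 fine tokens

context = low_res_stream(lo_tokens)
fused = adapter(hi_tokens, context, w_out=np.zeros((8, 8)))
print(np.allclose(fused, hi_tokens))
```

Because the residual branch is multiplied by w_out, a zero matrix reproduces the per-frame features exactly, and training can then learn how much global context to inject.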

Code & Models

Models

🤗 TuanNgo/DAGE (model)

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Optical measurement and interference techniques