DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation
Tuan Duc Ngo, Jiahui Huang, Seoung Wug Oh, Kevin Blackburn-Matzen, Evangelos Kalogerakis, Chuang Gan, Joon-Young Lee

TL;DR
DAGE introduces a dual-stream transformer architecture that efficiently estimates high-resolution, view-consistent geometry and camera poses from uncalibrated multi-view or video inputs, balancing global coherence and fine detail.
Contribution
The paper presents DAGE, a novel dual-stream transformer that disentangles global and local features for scalable, accurate geometry and pose estimation from high-resolution videos.
Findings
Achieves state-of-the-art results in video geometry estimation.
Supports inputs up to 2K resolution with practical inference costs.
Produces sharp depth maps and maintains cross-view consistency.
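A rough way to see why splitting the work across two streams can keep high-resolution inference practical is to count attention pairs. The sketch below is a back-of-the-envelope illustration only: the patch size, the frame sizes, the clip length, and the assumption that the high-resolution stream uses plain per-frame self-attention are all hypothetical choices, not values reported for DAGE, and the cost of the fusion adapter is left out.

```python
def tokens_per_frame(height, width, patch=16):
    """Number of patch tokens for one frame (patch size is an assumed value)."""
    return (height // patch) * (width // patch)


def attention_pairs(num_frames, tokens, scope):
    """Count query-key pairs for one attention layer.

    scope="global": every token attends to every token in the clip.
    scope="frame":  tokens attend only within their own frame.
    """
    if scope == "global":
        n = num_frames * tokens
        return n * n
    if scope == "frame":
        return num_frames * tokens * tokens
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical setting: 8 frames, 2048x1024 originals, 256x128 downsampled copies.
frames = 8
hi = tokens_per_frame(1024, 2048)   # high-resolution tokens per frame
lo = tokens_per_frame(128, 256)     # low-resolution tokens per frame

# Baseline: global attention directly on the high-resolution tokens.
joint_cost = attention_pairs(frames, hi, "global")

# Dual-stream style: global attention only at low resolution,
# per-frame attention at high resolution.
dual_cost = attention_pairs(frames, lo, "global") + attention_pairs(frames, hi, "frame")

print(joint_cost, dual_cost, joint_cost / dual_cost)
```

Under these assumed numbers the per-frame term grows linearly with clip length, while the quadratic-in-clip-length term is paid only on the small low-resolution token set, which is the sense in which resolution and clip length can be scaled separately.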
Abstract
Estimating accurate, view-consistent geometry and camera poses from uncalibrated multi-view/video inputs remains challenging, especially at high spatial resolutions and over long sequences. We present DAGE, a dual-stream transformer whose main novelty is to disentangle global coherence from fine detail. A low-resolution stream operates on aggressively downsampled frames with alternating frame/global attention to build a view-consistent representation and estimate cameras efficiently, while a high-resolution stream processes the original images per-frame to preserve sharp boundaries and small structures. A lightweight adapter fuses these streams via cross-attention, injecting global context without disturbing the pretrained single-frame pathway. This design scales resolution and clip length independently, supports inputs up to 2K, and maintains practical inference cost. DAGE delivers…
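The fusion step described in the abstract can be illustrated with a minimal NumPy sketch of a cross-attention adapter: high-resolution tokens act as queries, low-resolution tokens supply keys and values, and the result is added back as a residual. This is not the authors' implementation. The function name, the weight shapes, the single attention head, and in particular the zero-initialized output projection (one common way to leave a pretrained pathway unchanged at the start of training) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_adapter(hi_tokens, lo_tokens, w_q, w_k, w_v, w_out):
    """Hypothetical single-head adapter.

    hi_tokens: (N_hi, D) tokens from the per-frame high-resolution stream (queries).
    lo_tokens: (N_lo, D) tokens from the low-resolution global stream (keys/values).
    Returns hi_tokens plus a residual carrying global context.
    """
    q = hi_tokens @ w_q
    k = lo_tokens @ w_k
    v = lo_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_hi, N_lo)
    return hi_tokens + (attn @ v) @ w_out


rng = np.random.default_rng(0)
D = 16
hi = rng.standard_normal((64, D))   # many fine-detail tokens
lo = rng.standard_normal((4, D))    # few global-context tokens
w_q, w_k, w_v = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

# Assumed zero initialization: the adapter starts as an identity mapping,
# so the high-resolution pathway's output is untouched before training.
w_out = np.zeros((D, D))
fused = cross_attention_adapter(hi, lo, w_q, w_k, w_v, w_out)
```

With `w_out` at zero the adapter returns the high-resolution tokens unchanged; as the projection is trained away from zero, global context from the low-resolution stream is gradually mixed in.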
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Optical measurement and interference techniques
