VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

Wenyan Cong; Hanqing Zhu; Kevin Wang; Jiahui Lei; Colton Stearns; Yuanhao Cai; Leonidas Guibas; Zhangyang Wang; Zhiwen Fan

arXiv:2501.01949 · cs.CV · July 8, 2025

TL;DR

VideoLifter introduces a fast, hierarchical video-to-3D reconstruction method that combines local fragment registration with global merging, achieving state-of-the-art quality with significantly reduced training time.

Contribution

VideoLifter presents a novel local-to-global pipeline for 3D scene reconstruction from videos, improving efficiency and accuracy over existing methods.

Findings

01 Reduces training time by over 82%.

02 Achieves better visual quality than current SOTA methods.

03 Efficiently maintains global consistency in 3D reconstruction.

Abstract

Efficiently reconstructing 3D scenes from monocular video remains a core challenge in computer vision, vital for applications in virtual reality, robotics, and scene understanding. Recently, frame-by-frame progressive reconstruction without camera poses is commonly adopted, incurring high computational overhead and compounding errors when scaling to longer videos. To overcome these issues, we introduce VideoLifter, a novel video-to-3D pipeline that leverages a local-to-global strategy on a fragment basis, achieving both extreme efficiency and SOTA quality. Locally, VideoLifter leverages learnable 3D priors to register fragments, extracting essential information for subsequent 3D Gaussian initialization with enforced inter-fragment consistency and optimized efficiency. Globally, it employs a tree-based hierarchical merging method with key frame guidance for inter-fragment alignment,…
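The fragment-based, local-to-global idea in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the function names (`split_into_fragments`, `merge_pair`, `hierarchical_merge`) and the fragment size are hypothetical, and `merge_pair` only concatenates frame indices where a real system would align and fuse 3D Gaussians. It only illustrates why pairwise tree merging needs roughly log2(k) merge levels for k fragments, rather than one step per frame.

```python
from typing import List, Tuple


def split_into_fragments(num_frames: int, fragment_size: int) -> List[List[int]]:
    """Partition frame indices 0..num_frames-1 into consecutive fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        list(range(start, min(start + fragment_size, num_frames)))
        for start in range(0, num_frames, fragment_size)
    ]


def merge_pair(left: List[int], right: List[int]) -> List[int]:
    """Hypothetical stand-in for inter-fragment alignment.

    A real pipeline would estimate a transform between the two local
    reconstructions and fuse them; here we only concatenate frame indices.
    """
    return left + right


def hierarchical_merge(fragments: List[List[int]]) -> Tuple[List[int], int]:
    """Merge neighbouring fragments pairwise, level by level.

    Returns the merged frame list and the number of tree levels used.
    """
    level = list(fragments)
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(merge_pair(level[i], level[i + 1]))
            else:
                # An unpaired fragment is carried up to the next level unchanged.
                next_level.append(level[i])
        level = next_level
        depth += 1
    return (level[0] if level else []), depth


if __name__ == "__main__":
    frags = split_into_fragments(num_frames=10, fragment_size=4)
    merged, depth = hierarchical_merge(frags)
    print(frags, merged, depth)
```

With 10 frames and a fragment size of 4, the sketch produces three fragments and merges them in two levels.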

Code & Models

Repositories

VITA-Group/VideoLifter (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Video Coding and Compression Technologies