TL;DR
LASER is a training-free framework that enables streaming 4D reconstruction by aligning predictions across temporal windows using layer-wise scale adjustments, improving efficiency and accuracy.
Contribution
It introduces layer-wise scale alignment to convert offline models into streaming systems without retraining, addressing layer misalignment issues caused by monocular scale ambiguity.
Findings
Achieves state-of-the-art pose estimation and point cloud reconstruction quality.
Operates at 14 FPS with 6 GB memory on an RTX A6000 GPU.
Enables practical kilometer-scale streaming video reconstruction.
Abstract
Recent feed-forward reconstruction models such as VGGT achieve impressive reconstruction quality but cannot process streaming videos due to quadratic memory complexity, limiting their practical deployment. While existing streaming methods address this through learned memory mechanisms or causal attention, they require extensive retraining and may not fully leverage the strong geometric priors of state-of-the-art offline models. We propose LASER, a training-free framework that converts an offline reconstruction model into a streaming system by aligning predictions across consecutive temporal windows. We observe that simple similarity transformation (Sim(3)) alignment fails due to layer depth misalignment: monocular scale ambiguity causes relative depth scales of different scene layers to vary inconsistently between windows. To address this, we introduce layer-wise…
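The failure mode described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the abstract is cut off before the method details, so the layering rule (equal-count bins by depth rank), the median-of-ratios estimator, and all function names are assumptions chosen for illustration. It compares a single global scale with per-layer scales on one frame shared by two windows.

```python
from statistics import median


def global_scale(prev_depth, curr_depth):
    """One scale factor mapping curr_depth onto prev_depth (median of ratios)."""
    return median(p / c for p, c in zip(prev_depth, curr_depth))


def layer_ids_by_depth(curr_depth, num_layers):
    """Hypothetical layering: equal-count bins by depth rank (an assumption)."""
    n = len(curr_depth)
    order = sorted(range(n), key=lambda i: curr_depth[i])
    ids = [0] * n
    for rank, i in enumerate(order):
        ids[i] = min(rank * num_layers // n, num_layers - 1)
    return ids


def layerwise_scales(prev_depth, curr_depth, layer_ids):
    """One median-ratio scale factor per layer."""
    ratios = {}
    for p, c, k in zip(prev_depth, curr_depth, layer_ids):
        ratios.setdefault(k, []).append(p / c)
    return {k: median(r) for k, r in ratios.items()}


def apply_layerwise(curr_depth, layer_ids, scales):
    """Rescale each pixel by the scale of the layer it belongs to."""
    return [c * scales[k] for c, k in zip(curr_depth, layer_ids)]


# Toy overlap frame seen by two windows: the near layer differs by 2x,
# the far layer by 3x, so no single scale can align both.
curr = [1.0, 1.1, 1.2, 10.0, 11.0, 12.0]
prev = [2.0, 2.2, 2.4, 30.0, 33.0, 36.0]

g = global_scale(prev, curr)
global_err = max(abs(c * g - p) for c, p in zip(curr, prev))

ids = layer_ids_by_depth(curr, num_layers=2)
scales = layerwise_scales(prev, curr, ids)
aligned = apply_layerwise(curr, ids, scales)
layer_err = max(abs(a - p) for a, p in zip(aligned, prev))

print("global scale:", g, "max error:", global_err)
print("layer scales:", scales, "max error:", layer_err)
```

In this toy setup the global scale lands between the two true factors and leaves a residual on both layers, while the per-layer scales recover each factor. A real system would also need rotation and translation alignment and a principled way to define layers, neither of which is attempted here.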
