TL;DR
Scal3R introduces a scalable test-time training method with a neural global context for large-scale 3D scene reconstruction from long videos, improving reconstruction accuracy and global consistency.
Contribution
The paper proposes a neural global context representation and test-time adaptation technique to enhance large-scale 3D reconstruction from long videos.
Findings
Achieves state-of-the-art accuracy on KITTI and Oxford Spires datasets.
Effectively handles ultra-large scenes with improved global consistency.
Maintains efficiency with minimal computational overhead.
Abstract
This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain reconstruction accuracy and consistency over long sequences due to limited memory capacity and the inability to effectively capture global contextual cues. In contrast, humans can naturally exploit the global understanding of the scene to inform local perception. Motivated by this, we propose a novel neural global context representation that efficiently compresses and retains long-range scene information, enabling the model to leverage extensive contextual cues for enhanced reconstruction accuracy and consistency. The context representation is realized through a set of…
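The abstract is cut off before it says how the context representation is realized, so the sketch below is not the paper's method. It is only a toy illustration of the general test-time training idea named in the TL;DR: a small, fixed-size set of "fast" weights is updated by gradient steps as chunks of a long sequence stream in, so that earlier chunks are compressed into those weights. Every name here (`ttt_step`, `run_demo`, the linear map `A`, the key/value setup, the learning rate) is a hypothetical choice made for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def ttt_step(W, keys, values, lr=0.1):
    """One gradient step on the chunk's mean loss 0.5 * ||W k - v||^2.

    W plays the role of a fixed-size context memory (an assumption of
    this sketch, not a detail taken from the paper).
    """
    rows, cols = len(W), len(W[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for k, v in zip(keys, values):
        err = [p - t for p, t in zip(matvec(W, k), v)]
        for i in range(rows):
            for j in range(cols):
                grad[i][j] += err[i] * k[j]
    n = len(keys)
    return [[W[i][j] - lr * grad[i][j] / n for j in range(cols)]
            for i in range(rows)]

def frobenius_gap(W, A):
    """Frobenius norm of W - A."""
    return sum((w - a) ** 2
               for rw, ra in zip(W, A)
               for w, a in zip(rw, ra)) ** 0.5

def run_demo(dim=4, chunks=200, chunk_size=8, seed=0):
    """Stream chunks through the memory; return the gap before and after."""
    rng = random.Random(seed)
    # Hidden linear map standing in for "global scene information" (toy).
    A = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    W = [[0.0] * dim for _ in range(dim)]  # memory starts empty
    gap_before = frobenius_gap(W, A)
    for _ in range(chunks):
        keys = [[rng.uniform(-1, 1) for _ in range(dim)]
                for _ in range(chunk_size)]
        values = [matvec(A, k) for k in keys]
        W = ttt_step(W, keys, values)  # adapt on this chunk only
    return gap_before, frobenius_gap(W, A)

if __name__ == "__main__":
    before, after = run_demo()
    print(f"gap to hidden map: {before:.3f} -> {after:.3f}")
```

The memory `W` stays the same size no matter how many chunks are processed, which is the property that makes this family of approaches attractive for long sequences. How Scal3R itself structures and updates its context is not recoverable from the truncated abstract.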
