LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
Junyi Zhang, Charles Herrmann, Junhwa Hur, Chen Sun, Ming-Hsuan Yang, Forrester Cole, Trevor Darrell, Deqing Sun

TL;DR
LoGeR is a novel architecture for dense 3D reconstruction of extremely long video sequences. It processes video in chunks and links them with a hybrid memory module, improving scalability and accuracy over prior feedforward methods.
Contribution
LoGeR introduces a hybrid memory system that pairs a parametric Test-Time Training (TTT) memory with non-parametric Sliding Window Attention (SWA), scaling dense 3D reconstruction to thousands of frames without post-optimization.
Findings
Reduces absolute trajectory error (ATE) on KITTI by over 74%.
Successfully reconstructs sequences up to 19,000 frames.
Outperforms prior state-of-the-art feedforward methods.
Abstract
Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. To manage the critical challenge of coherence across chunk boundaries, we propose a learning-based hybrid memory module. This dual-component system combines a parametric Test-Time Training (TTT) memory to anchor the global coordinate frame and prevent scale drift, alongside a non-parametric Sliding Window Attention (SWA) mechanism to preserve uncompressed context for high-precision…
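The chunked processing and two-part memory described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`chunk_schedule`, `update_memory`, `run`), the scalar memory, the mean-frame-index "summary", and the learning rate are all assumptions chosen for illustration. The sketch only shows the control flow: each chunk sees the last few frames before it uncompressed (the sliding-window part), while a small parametric state is updated by one gradient step per chunk at inference time (a stand-in for test-time training).

```python
from typing import Iterator, List, Tuple


def chunk_schedule(
    num_frames: int, chunk_size: int, window: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (context_frames, chunk_frames) pairs in stream order.

    context_frames are the last `window` frames before the chunk, kept
    uncompressed; chunk_frames are the frames processed jointly.
    """
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        context = list(range(max(0, start - window), start))
        yield context, list(range(start, end))


def update_memory(memory: float, chunk_summary: float, lr: float) -> float:
    """Toy parametric memory: one gradient step on 0.5 * (memory - summary)**2."""
    grad = memory - chunk_summary
    return memory - lr * grad


def run(num_frames: int, chunk_size: int, window: int, lr: float = 0.5):
    """Stream over chunks, returning (context, chunk, memory) per step."""
    memory = 0.0
    log = []
    for context, chunk in chunk_schedule(num_frames, chunk_size, window):
        # Hypothetical stand-in for a per-chunk feature: the mean frame index.
        summary = sum(chunk) / len(chunk)
        memory = update_memory(memory, summary, lr)
        log.append((context, chunk, memory))
    return log


if __name__ == "__main__":
    for context, chunk, memory in run(num_frames=10, chunk_size=4, window=2):
        print(context, chunk, memory)
```

In this sketch the cost per chunk depends only on `chunk_size` and `window`, not on the total sequence length, which is the property that lets a chunked design avoid quadratic attention over the whole video. In a real model the scalar memory would be replaced by learned weights and the summary by network features.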
