TL;DR
Mem3R is a streaming 3D reconstruction model with a hybrid memory: a fast-weight memory updated via test-time training plus a fixed-size token state. The combination improves accuracy and efficiency on long sequences.
Contribution
It proposes a hybrid memory architecture that pairs an implicit fast-weight memory (for camera tracking) with an explicit token-based state (for geometric mapping). This improves long-sequence performance while reducing model size.
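The explicit token-based state is what keeps memory cost constant over an arbitrarily long stream. The sketch below assumes a CUT3R-style design: a fixed set of state tokens cross-attends to each incoming frame's tokens and is updated in place. The gated blend, the function name `update_state`, and all sizes are hypothetical illustrations, not Mem3R's actual update rule.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def update_state(state, frame_tokens, gate=0.5):
    """Hypothetical update: read the current frame into a fixed number of state tokens.

    state:        (n_state, d) persistent memory tokens
    frame_tokens: (n_frame, d) tokens from the incoming frame
    Returns a new state with exactly the same shape as `state`.
    """
    d = state.shape[-1]
    # Each state token (query) attends over the frame tokens (keys/values).
    attn = softmax(state @ frame_tokens.T / np.sqrt(d))  # (n_state, n_frame)
    read = attn @ frame_tokens                           # (n_state, d)
    # Gated blend: old memory is partially overwritten; the size never grows.
    return (1.0 - gate) * state + gate * read


rng = np.random.default_rng(0)
n_state, d = 16, 32
state = rng.standard_normal((n_state, d))
for _ in range(1000):  # a long stream of frames with varying token counts
    frame = rng.standard_normal((int(rng.integers(50, 200)), d))
    state = update_state(state, frame)
print(state.shape)  # (16, 32)
```

However many frames arrive, the state stays at `n_state` tokens. That fixed capacity is why recurrent models of this kind run in linear time, and it is also why the abstract's concern about forgetting over long sequences arises.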
Findings
Significantly improves long-sequence performance over CUT3R.
Reduces model size from 793M to 644M parameters.
Reduces Absolute Trajectory Error (ATE) by up to 39% on long sequences.
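ATE measures how far an estimated camera trajectory deviates from ground truth. Read as a relative reduction, "up to 39%" means Mem3R's ATE is about 0.61 times CUT3R's on the best-case sequence. This summary does not state the exact evaluation protocol (alignment type, whether scale is estimated). The sketch below therefore assumes a common choice for monocular reconstruction: Sim(3) alignment via the Umeyama method, followed by the RMSE of camera positions. `umeyama_sim3` and `ate_rmse` are illustrative names.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (n, 3) arrays of corresponding camera positions (Umeyama, 1991).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # (3, 3) cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # force a proper rotation
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_s = np.mean(np.sum(xs ** 2, axis=1))      # variance of source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def ate_rmse(est, gt):
    """ATE as the RMSE of positions after aligning `est` to `gt` with Sim(3)."""
    s, R, t = umeyama_sim3(est, gt)
    aligned = (s * (R @ est.T)).T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


rng = np.random.default_rng(0)
gt = rng.standard_normal((100, 3)).cumsum(axis=0)  # synthetic random-walk path
th = 0.3
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th), np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
est = 0.5 * gt @ Rz.T + np.array([1.0, 2.0, 3.0])  # same path, other frame/scale
noisy = est + 0.05 * rng.standard_normal(est.shape)
print(ate_rmse(est, gt))    # ≈ 0 up to floating-point error
print(ate_rmse(noisy, gt))  # > 0: residual error remains after alignment
```

Because of the alignment step, ATE ignores a global change of frame and scale and penalizes only the trajectory's shape error. Drift accumulated over a long sequence shows up directly in this number.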
Abstract
Streaming 3D perception is well suited to robotics and augmented reality, where long visual streams must be processed efficiently and consistently. Recent recurrent models offer a promising solution by maintaining fixed-size states and enabling linear-time inference, but they often suffer from drift accumulation and temporal forgetting over long sequences due to the limited capacity of compressed latent memories. We propose Mem3R, a streaming 3D reconstruction model with a hybrid memory design that decouples camera tracking from geometric mapping to improve temporal consistency over long sequences. For camera tracking, Mem3R employs an implicit fast-weight memory implemented as a lightweight Multi-Layer Perceptron updated via Test-Time Training. For geometric mapping, Mem3R maintains an explicit token-based fixed-size state. Compared with CUT3R, this design not only significantly…
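The implicit fast-weight memory stores information in the weights of a small MLP rather than in tokens. Each "write" is a gradient step on a self-supervised loss taken at inference time, which is the core idea of test-time training. The sketch below is a generic TTT-style memory and not Mem3R's implementation. The two-layer ReLU MLP, the key/value reconstruction loss, the learning rate, and the class name `FastWeightMemory` are all assumptions. In Mem3R this memory serves camera tracking, whereas here random vectors stand in for the per-frame features the model actually uses.

```python
import numpy as np


class FastWeightMemory:
    """Hypothetical implicit memory: a two-layer MLP whose weights (the
    "fast weights") are updated by gradient descent during inference."""

    def __init__(self, d, hidden, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
        self.W2 = rng.standard_normal((hidden, d)) / np.sqrt(hidden)
        self.lr = lr

    def read(self, q):
        """Query the memory: f(q) = relu(q @ W1) @ W2."""
        return np.maximum(q @ self.W1, 0.0) @ self.W2

    def loss(self, k, v):
        """Self-supervised objective 0.5 * mean_i ||f(k_i) - v_i||^2."""
        return 0.5 * float(np.mean(np.sum((self.read(k) - v) ** 2, axis=-1)))

    def write(self, k, v):
        """One test-time training step on `loss` (manual backprop)."""
        n = k.shape[0]
        h = np.maximum(k @ self.W1, 0.0)            # (n, hidden)
        grad_out = (h @ self.W2 - v) / n            # dL/df, shape (n, d)
        grad_W2 = h.T @ grad_out                    # (hidden, d)
        grad_h = (grad_out @ self.W2.T) * (h > 0)   # ReLU derivative
        grad_W1 = k.T @ grad_h                      # (d, hidden)
        # Both gradients are computed before either weight is changed.
        self.W1 -= self.lr * grad_W1
        self.W2 -= self.lr * grad_W2


rng = np.random.default_rng(1)
d, n = 8, 16
A = rng.standard_normal((d, d)) / np.sqrt(d)  # unknown key -> value mapping
keys = rng.standard_normal((n, d))
values = keys @ A

mem = FastWeightMemory(d=d, hidden=32)
before = mem.loss(keys, values)
for _ in range(100):  # repeated writes of one batch, so the effect is easy to check
    mem.write(keys, values)
after = mem.loss(keys, values)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

As with the token state, the memory's size is fixed (the shapes of `W1` and `W2` never change). What differs is the storage: information is compressed into weights by optimization, not held as tokens updated by attention.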
