ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, Aleksander Holynski

TL;DR
ZipMap is a 3D reconstruction model that uses test-time training layers to process large image collections in time linear in the number of input images, while matching or surpassing the accuracy of quadratic-time methods.
Contribution
The paper introduces ZipMap, a stateful feed-forward model that reduces the cost of 3D reconstruction from quadratic to linear in the number of input images while matching or surpassing the accuracy of quadratic-time methods.
Findings
ZipMap reconstructs over 700 frames in under 10 seconds on a single H100 GPU.
It runs more than 20 times faster than quadratic-time methods such as VGGT.
ZipMap enables real-time scene-state querying and streaming reconstruction.
Abstract
Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT have a computational cost that scales quadratically with the number of input images, making them inefficient when applied to large image collections. Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. We introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while matching or surpassing the accuracy of quadratic-time methods. ZipMap employs test-time training layers to zip an entire image collection into a compact hidden scene state in a single forward pass, enabling reconstruction of over 700 frames in under 10 seconds on a single H100 GPU, more than 20× faster than state-of-the-art methods such as VGGT. Moreover, we demonstrate the benefits of having a stateful…
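To make the idea of zipping a collection into a fixed-size state more concrete, here is a minimal sketch of a generic test-time training (fast-weight) layer. It is not ZipMap's architecture: the summary above does not specify the layer design, loss, or update rule, so the linear fast-weight matrix `W`, the squared-error objective, the learning rate `lr`, the `chunk` size, and the function names `zip_into_state` and `query_state` are all assumptions chosen for illustration. The point is only that one pass over the tokens costs time linear in their number and that the resulting state has a size independent of how many tokens were seen.

```python
import numpy as np

def zip_into_state(keys, values, lr=0.5, chunk=64):
    """Compress (key, value) tokens into fast weights W in a single pass.

    Hypothetical illustration: one gradient step per chunk on the
    objective 0.5 * mean ||k @ W - v||^2. Cost is linear in len(keys).
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_k, d_v))  # the fixed-size "scene state" in this sketch
    for start in range(0, len(keys), chunk):
        k = keys[start:start + chunk]
        v = values[start:start + chunk]
        grad = k.T @ (k @ W - v) / len(k)  # gradient of the chunk objective
        W -= lr * grad
    return W

def query_state(W, queries):
    """Read from the state; cost does not depend on how many tokens were zipped."""
    return queries @ W

# Toy data: values are a fixed linear function of unit-norm keys.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(8, 4))
keys = rng.normal(size=(2800, 8))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = keys @ W_true

W = zip_into_state(keys, values)
predictions = query_state(W, keys[:5])
```

In this toy setting the state is an 8×4 matrix regardless of whether 28 or 2800 tokens are processed, which is the property that lets a stateful model avoid attention's quadratic cost in the number of inputs.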
