ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training

Haian Jin; Rundi Wu; Tianyuan Zhang; Ruiqi Gao; Jonathan T. Barron; Noah Snavely; Aleksander Holynski

arXiv:2603.04385·cs.CV·April 10, 2026

TL;DR

ZipMap is a feed-forward 3D reconstruction model whose cost scales linearly with the number of input images. It uses test-time training layers to compress a large image collection into a compact scene state while matching or surpassing the accuracy of quadratic-time methods.

Contribution

The paper introduces ZipMap, a stateful feed-forward model that reduces the cost of 3D reconstruction from quadratic to linear in the number of input images while matching or surpassing the accuracy of quadratic-time methods.

Findings

01

ZipMap reconstructs over 700 frames in under 10 seconds on a single H100 GPU.

02

It runs more than 20 times faster than state-of-the-art quadratic-time methods such as VGGT.

03

ZipMap enables real-time scene-state querying and streaming reconstruction.
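
The throughput and scaling claims above reduce to simple arithmetic. The sketch below computes the minimum frame rate implied by 700 frames in 10 seconds and compares how cost grows under linear and quadratic scaling. The 100-frame baseline and the function names are illustrative assumptions, not figures from the paper, and the pure power-law cost model ignores constant factors.

```python
def min_throughput(frames: int, seconds: float) -> float:
    """Lower bound on frames per second when `frames` finish within `seconds`."""
    return frames / seconds


def relative_cost(n_frames: int, baseline_frames: int, exponent: int) -> float:
    """Cost growth relative to a baseline under a pure power-law model.

    exponent=1 models linear-time scaling, exponent=2 models quadratic-time
    scaling. Constant factors are ignored (an assumption of this sketch).
    """
    return (n_frames / baseline_frames) ** exponent


# Reported figure: over 700 frames in under 10 seconds.
fps_lower_bound = min_throughput(700, 10.0)

# Hypothetical baseline of 100 frames, chosen only for illustration.
linear_growth = relative_cost(700, 100, exponent=1)
quadratic_growth = relative_cost(700, 100, exponent=2)

print(f"at least {fps_lower_bound:.0f} frames/s")
print(f"100 -> 700 frames: linear x{linear_growth:.0f}, quadratic x{quadratic_growth:.0f}")
```

Under this simplified model, growing the input sevenfold multiplies a linear-time cost by 7 and a quadratic-time cost by 49, which is why the gap between the two families widens as collections get larger.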

Abstract

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^{3}$ have a computational cost that scales quadratically with the number of input images, making them inefficient when applied to large image collections. Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. We introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while matching or surpassing the accuracy of quadratic-time methods. ZipMap employs test-time training layers to zip an entire image collection into a compact hidden scene state in a single forward pass, enabling reconstruction of over 700 frames in under 10 seconds on a single H100 GPU, more than $20 \times$ faster than state-of-the-art methods such as VGGT. Moreover, we demonstrate the benefits of having a stateful…
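
To make the idea of a test-time training layer concrete, here is a minimal sketch of a generic fast-weight layer: a fixed-size matrix is updated by one gradient step per token, so the work grows linearly with the number of tokens while the state size stays constant. This is not ZipMap's architecture. The linear fast weight, the squared-error objective, the learning rate, and the names `zip_tokens` and `query_state` are all assumptions made for illustration.

```python
def zip_tokens(keys, values, lr=1.0):
    """Compress (key, value) pairs into a fixed-size fast-weight matrix W.

    Each token contributes one gradient step on 0.5 * ||W k - v||^2, so the
    total work is linear in the number of tokens. Hypothetical helper, not
    the paper's layer.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        # Prediction of the current state for this key.
        pred = [sum(W[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
        err = [pred[i] - v[i] for i in range(d_v)]
        # Gradient of the squared error with respect to W is outer(err, k).
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] -= lr * err[i] * k[j]
    return W


def query_state(W, q):
    """Read from the compressed state with a query vector: returns W q."""
    return [sum(w * x for w, x in zip(row, q)) for row in W]


# Toy data: orthonormal keys, so one pass with lr=1.0 stores each value exactly.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 3.0], [5.0, 7.0]]
state = zip_tokens(keys, values)
print(query_state(state, keys[0]))
print(query_state(state, keys[1]))
```

The shape of `state` depends only on the key and value dimensions, not on how many tokens were processed, which is the property that lets a stateful model be queried after the input has been consumed. With non-orthogonal keys or a smaller learning rate, the stored values would only be approximated.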
