STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction
Runze Wang, Yuxuan Song, Youcheng Cai, Ligang Liu

TL;DR
STAC is a plug-and-play cache compression framework for streaming 3D reconstruction. It reduces memory use by nearly 10x and speeds up inference by 4x while maintaining high reconstruction quality.
Contribution
The paper presents a spatio-temporally aware cache compression method that exploits the intrinsic spatio-temporal sparsity of attention in causal transformers to make streaming 3D reconstruction memory-efficient.
Findings
Achieves nearly 10x memory reduction.
Speeds up inference by 4x.
Maintains state-of-the-art reconstruction quality.
Abstract
Online 3D reconstruction from streaming inputs requires both long-term temporal consistency and efficient memory usage. Although causal variants of VGGT address this challenge through a key-value (KV) cache mechanism, the cache grows linearly with the stream length, creating a major memory bottleneck. Under limited memory budgets, early cache eviction significantly degrades reconstruction quality and temporal consistency. In this work, we observe that attention in causal transformers for 3D reconstruction exhibits intrinsic spatio-temporal sparsity. Based on this insight, we propose STAC, a Spatio-Temporally Aware Cache Compression framework for streaming 3D reconstruction with large causal transformers. STAC consists of three key components: (1) a Working Temporal Token Caching mechanism that preserves long-term informative tokens using decayed cumulative attention scores; (2) a…
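The first component, Working Temporal Token Caching, keeps long-term informative tokens by ranking them with decayed cumulative attention scores. The Python sketch below illustrates that general idea under explicit assumptions: the class name `DecayedScoreCache`, the update rule s_i ← γ·s_i + a_i with a fixed decay `decay` (γ), attention mass passed in as a per-token dict, and a policy that always keeps the current frame and evicts the lowest-scoring older tokens once a fixed `budget` is exceeded are all hypothetical choices, not the STAC authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DecayedScoreCache:
    """Hypothetical fixed-budget token cache ranked by decayed cumulative attention.

    Sketch only: the update rule, decay factor and eviction policy are
    illustrative assumptions, not the method described in the paper.
    """
    budget: int                  # max number of cached tokens (assumed hyperparameter)
    decay: float = 0.9           # per-step decay factor gamma (assumed)
    scores: dict = field(default_factory=dict)  # token id -> decayed cumulative score

    def step(self, attention, new_tokens):
        """Process one incoming frame.

        attention:  dict mapping cached token id -> attention mass it received
                    from the current frame's queries (assumed pre-aggregated
                    over queries and heads).
        new_tokens: ids of the current frame's tokens, appended to the cache.
        Returns the sorted list of token ids evicted in this step.
        """
        if len(new_tokens) > self.budget:
            raise ValueError("budget must hold at least one full frame of tokens")
        # s_i <- gamma * s_i + a_i for every token already in the cache
        # (only values change here, so iterating over the dict is safe).
        for tok in self.scores:
            self.scores[tok] = self.decay * self.scores[tok] + attention.get(tok, 0.0)
        # Current-frame tokens enter with an empty attention history.
        for tok in new_tokens:
            self.scores[tok] = 0.0
        overflow = len(self.scores) - self.budget
        if overflow <= 0:
            return []
        # Evict the lowest-scoring older tokens; the current frame is always kept.
        current = set(new_tokens)
        older = sorted((t for t in self.scores if t not in current),
                       key=lambda t: self.scores[t])
        evicted = older[:overflow]
        for tok in evicted:
            del self.scores[tok]
        return sorted(evicted)
```

Decaying the running score lets a token that was heavily attended long ago gradually give up its slot to tokens the stream relies on now, while a token that keeps receiving attention stays cached regardless of its age. A real system would apply such a policy to per-layer key/value tensors rather than to token ids.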
