Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction
Zecheng Tang, Jiaye Fu, Qiankun Gao, Haijie Li, Yanmin Wu, Jiaqi Zhang, Siwei Ma, Jian Zhang

TL;DR
Spark3R is a training-free acceleration method for feed-forward 3D reconstruction models. It reduces query tokens and key-value tokens asymmetrically, according to their different roles in global attention, so pretrained models run faster without retraining.
Contribution
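It proposes a training-free, asymmetric token reduction framework that compresses query tokens and key-value tokens separately and at different rates: query tokens, which are sensitive to compression, are reduced conservatively, while key-value tokens, which tolerate it, are reduced aggressively. This increases speed while preserving reconstruction quality.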
Findings
Achieves up to a 28x speedup on 1000-frame inputs.
Maintains competitive reconstruction quality across multiple models.
Integrates seamlessly with pretrained models without retraining.
Abstract
Feed-forward 3D reconstruction models based on Vision Transformers can directly estimate scene geometry and camera poses from a small set of input images, but scaling them to video inputs with hundreds or thousands of frames remains challenging due to the quadratic cost of global attention layers. Recent token-merging methods accelerate these models by compressing the token sequence within the global attention layers, but they apply a uniform reduction to query tokens and key-value tokens, ignoring their functionally distinct roles in 3D reconstruction. In this work, we identify a key property of feed-forward 3D reconstruction models: query tokens encode view-specific geometric requests and are sensitive to compression, while key-value tokens represent shared scene context and tolerate aggressive compression. Guided by this insight, we propose Spark3R, a training-free acceleration…
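To make the asymmetry concrete, here is a minimal sketch in plain Python. It is not the Spark3R algorithm: the merging rule (averaging consecutive groups of tokens), the function names, and the group sizes are all assumptions chosen for illustration, and the learned query/key/value projections and any step that restores the full query resolution are omitted. It only shows why the cost of a global attention layer scales with the number of query tokens times the number of key-value tokens, and how compressing the two sides at different rates changes that product.

```python
import math


def merge_tokens(tokens, group):
    """Average each run of `group` consecutive tokens.

    Hypothetical stand-in for a real token-merging rule.
    """
    merged = []
    for start in range(0, len(tokens), group):
        chunk = tokens[start:start + group]
        dim = len(chunk[0])
        merged.append([sum(t[i] for t in chunk) / len(chunk) for i in range(dim)])
    return merged


def attention(queries, keys, values):
    """Plain softmax attention; work scales with len(queries) * len(keys)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim_v)]
        )
    return outputs


def asymmetric_attention(tokens, q_group, kv_group):
    """Merge the query side and the key-value side with different group sizes.

    Assumption: the raw tokens are used directly as queries, keys, and values
    (no learned projections), which keeps the sketch short.
    Returns the attention output and the number of query/key pairs scored.
    """
    queries = merge_tokens(tokens, q_group)       # light (or no) compression
    keys_values = merge_tokens(tokens, kv_group)  # aggressive compression
    outputs = attention(queries, keys_values, keys_values)
    return outputs, len(queries) * len(keys_values)


# Toy input: 16 tokens of dimension 4.
tokens = [[float((i * 7 + j) % 5) for j in range(4)] for i in range(16)]

_, full_pairs = asymmetric_attention(tokens, q_group=1, kv_group=1)
_, asym_pairs = asymmetric_attention(tokens, q_group=1, kv_group=4)
print(full_pairs, asym_pairs)  # 256 64
```

In this toy setting, keeping every query token while merging key-value tokens four to one cuts the number of scored query/key pairs from 256 to 64. A uniform scheme would need to merge both sides two to one to reach the same count, at the price of compressing the query tokens that the abstract describes as sensitive to compression.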
