Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction

Zecheng Tang; Jiaye Fu; Qiankun Gao; Haijie Li; Yanmin Wu; Jiaqi Zhang; Siwei Ma; Jian Zhang

arXiv:2605.06270 · cs.CV · May 20, 2026

TL;DR

Spark3R is a training-free, asymmetric token reduction method that speeds up feed-forward 3D reconstruction models by compressing query tokens and key-value tokens differently, according to their distinct roles in global attention.

Contribution

The paper proposes a training-free, asymmetric token reduction framework that adaptively applies different compression to query tokens and key-value tokens, improving speed while preserving reconstruction quality.
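
As a rough illustration of why separating the two compression levels matters, the cost of one global attention layer can be written in terms of how many query and key-value tokens survive reduction. The symbols N (token count), d (channel width), and the retention fractions ρ_q and ρ_kv are notation assumed here for illustration, not taken from the paper, and the expression counts only the attention matrix products.

```latex
% Full global attention over N tokens of width d:
C_{\text{full}} = O\!\left(N^{2} d\right)

% Keeping a fraction \rho_q of queries and \rho_{kv} of key-value tokens
% gives a score matrix of size (\rho_q N) \times (\rho_{kv} N):
C_{\text{reduced}} = O\!\left(\rho_q \, \rho_{kv} \, N^{2} d\right)

% Uniform reduction is the special case \rho_q = \rho_{kv} = \rho,
% where the cost scales by \rho^{2}.
```

Under this simplified model, keeping every query (ρ_q = 1) and one in sixteen key-value tokens (ρ_kv = 1/16) shrinks the attention products by a factor of 16. The estimate ignores the overhead of the reduction itself and all non-attention layers, so it should not be read as a prediction of the end-to-end speedup reported for the method.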

Findings

01 Achieves up to 28x speedup on 1000-frame inputs.

02 Maintains competitive reconstruction quality across multiple models.

03 Integrates seamlessly with pretrained models without retraining.

Abstract

Feed-forward 3D reconstruction models based on Vision Transformers can directly estimate scene geometry and camera poses from a small set of input images, but scaling them to video inputs with hundreds or thousands of frames remains challenging due to the quadratic cost of global attention layers. Recent token-merging methods accelerate these models by compressing the token sequence within the global attention layers, but they apply a uniform reduction to query tokens and key-value tokens, ignoring their functionally distinct roles in 3D reconstruction. In this work, we identify a key property of feed-forward 3D reconstruction models: query tokens encode view-specific geometric requests and are sensitive to compression, while key-value tokens represent shared scene context and tolerate aggressive compression. Guided by this insight, we propose Spark3R, a training-free acceleration…
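
The asymmetry described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names (`pool_tokens`, `asymmetric_attention`), the retention parameters `q_keep` and `kv_keep`, and the use of plain average pooling over contiguous token groups as a stand-in for token merging are all assumptions made here. The sketch reduces queries lightly and key-value tokens aggressively, runs single-head attention on the reduced sets, then copies each reduced query's output back to the tokens it replaced.

```python
import numpy as np

def pool_tokens(x, keep_ratio):
    """Average contiguous groups of tokens (hypothetical stand-in for merging)."""
    n = x.shape[0]
    m = min(n, max(1, int(round(n * keep_ratio))))
    # Split the n token indices into m nearly equal contiguous groups.
    groups = np.array_split(np.arange(n), m)
    pooled = np.stack([x[g].mean(axis=0) for g in groups])
    return pooled, groups

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def asymmetric_attention(q, k, v, q_keep=1.0, kv_keep=0.25):
    """Single-head attention with separate reduction levels for Q and K/V."""
    n, d = q.shape
    q_red, q_groups = pool_tokens(q, q_keep)      # light (or no) reduction
    k_red, kv_groups = pool_tokens(k, kv_keep)    # aggressive reduction
    # V must use the same grouping as K so keys and values stay aligned.
    v_red = np.stack([v[g].mean(axis=0) for g in kv_groups])
    scores = q_red @ k_red.T / np.sqrt(d)
    out_red = softmax(scores) @ v_red
    # "Unmerge": every original token receives its group's output.
    out = np.empty((n, v.shape[1]))
    for row, g in zip(out_red, q_groups):
        out[g] = row
    return out, scores.shape

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 8)) for _ in range(3))
out, score_shape = asymmetric_attention(q, k, v, q_keep=1.0, kv_keep=0.25)
print(out.shape, score_shape)  # (64, 8) (64, 16)
```

With `q_keep=1.0` and `kv_keep=1.0` the function reduces to ordinary full attention, which gives a convenient baseline for measuring how much error a given pair of retention settings introduces.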
