AVGGT: Rethinking Global Attention for Accelerating VGGT
Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu Zhang

TL;DR
This paper analyzes the role of global attention in VGGT and $\pi^3$, showing that global layers at different depths serve distinct functions, and proposes a training-free acceleration scheme that significantly speeds up inference while maintaining accuracy.
Contribution
The paper provides a systematic analysis of global attention modules in multi-view 3D models and introduces a novel, training-free acceleration method based on insights from this analysis.
Findings
Achieves 2x to 10x inference speedup across different frame lengths.
Maintains or slightly improves model accuracy compared to original models.
Performs robustly in dense multi-view settings where sparse-attention baselines fail.
Abstract
Models such as VGGT and $\pi^3$ have shown strong multi-view 3D performance, but their heavy reliance on global self-attention results in high computational cost. Existing sparse-attention variants offer partial speedups, yet lack a systematic analysis of how global attention contributes to multi-view reasoning. In this paper, we first conduct an in-depth investigation of the global attention modules in VGGT and $\pi^3$ to better understand their roles. Our analysis reveals a clear division of roles in the alternating global-frame architecture: early global layers do not form meaningful correspondences, middle layers perform cross-view alignment, and last layers provide only minor refinements. Guided by these findings, we propose a training-free two-step acceleration scheme: (1) converting early global layers into frame attention, and (2) subsampling global attention by subsampling K/V…
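The two steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single-head layout, and in particular the uniform-stride choice of which K/V tokens to keep (`kv_stride`) are assumptions made for illustration, since the abstract is truncated before it describes the actual subsampling rule.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; q: (..., Nq, d), k and v: (..., Nk, d).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def frame_attention(q, k, v):
    # Step (1): tokens attend only within their own frame.
    # q, k, v: (frames, tokens_per_frame, dim); the frame axis acts as a batch axis.
    return attention(q, k, v)

def global_attention(q, k, v, kv_stride=1):
    # Step (2): all queries attend across all frames, but only to a subset of K/V.
    # ASSUMPTION: keys/values are kept at a uniform stride; the paper's rule may differ.
    f, t, d = q.shape
    q_all = q.reshape(f * t, d)
    k_sub = k.reshape(f * t, d)[::kv_stride]
    v_sub = v.reshape(f * t, d)[::kv_stride]
    return attention(q_all, k_sub, v_sub).reshape(f, t, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((4, 8, 16)) for _ in range(3))
    print(frame_attention(q, k, v).shape)                # early layers
    print(global_attention(q, k, v, kv_stride=4).shape)  # remaining global layers
```

In this sketch, full global attention over F frames of T tokens builds an (F·T) × (F·T) score matrix, frame attention builds F matrices of size T × T, and a stride of s shrinks the key axis of the global score matrix by roughly a factor of s while leaving the number of output tokens unchanged.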
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
