Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer
Yipu Zhang, Jintao Cheng, Weilun Feng, Jiehao Luo, Chuanguang Yang, Zhulin An, Yongjun Xu, Wei Zhang

TL;DR
This paper introduces Fisher-Guided Quantization (FGQ), a post-training quantization method that uses Fisher information to calibrate feed-forward 3D vision transformers such as VGGT, reducing quantization error across the multiple geometric tasks they predict.
Contribution
FGQ uses Fisher information to quantify how sensitive each task is to quantization, enabling more effective calibration of feed-forward 3D models at low bit widths.
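To make the idea of task-specific sensitivity concrete, here is a minimal NumPy sketch. It is not the paper's algorithm, since the summary does not say how FGQ computes or applies Fisher information. It assumes a toy shared linear layer with two hypothetical scalar heads (named "depth" and "pose" for illustration only), a squared-error loss, an empirical diagonal Fisher estimate, and plain symmetric 4-bit rounding. All names, shapes, and data are made up.

```python
import numpy as np

def diag_fisher(x, w, head, target):
    """Empirical diagonal Fisher of a squared-error task loss w.r.t. shared weights w."""
    hidden = x @ w                      # (n, d_hidden)
    residual = hidden @ head - target   # (n,)
    # Per-sample gradient of 0.5 * residual**2 w.r.t. w is residual * outer(x, head).
    grads = residual[:, None, None] * x[:, :, None] * head[None, None, :]
    return np.mean(grads ** 2, axis=0)  # (d_in, d_hidden)

def quantize_symmetric(w, bits=4):
    """Uniform symmetric per-tensor quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale, scale

rng = np.random.default_rng(0)
n, d_in, d_hidden = 256, 8, 6
x = rng.normal(size=(n, d_in))          # synthetic calibration inputs
w = rng.normal(size=(d_in, d_hidden))   # shared "backbone" weights

# Hypothetical heads: "depth" reads hidden channels 0-2, "pose" reads channels 3-5.
head_depth = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
head_pose = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
target_depth = rng.normal(size=n)
target_pose = rng.normal(size=n)

fisher_depth = diag_fisher(x, w, head_depth, target_depth)
fisher_pose = diag_fisher(x, w, head_pose, target_pose)

# Per-channel sensitivity: sum the Fisher diagonal over the input dimension.
channel_sens_depth = fisher_depth.sum(axis=0)
channel_sens_pose = fisher_pose.sum(axis=0)

# Fisher-weighted quantization error: a second-order proxy for each task's loss increase.
w_q, scale = quantize_symmetric(w, bits=4)
sq_err = (w - w_q) ** 2
weighted_err_depth = float(np.sum(fisher_depth * sq_err))
weighted_err_pose = float(np.sum(fisher_pose * sq_err))

print("depth channel sensitivity:", np.round(channel_sens_depth, 3))
print("pose channel sensitivity: ", np.round(channel_sens_pose, 3))
print("Fisher-weighted 4-bit error (depth, pose):", weighted_err_depth, weighted_err_pose)
```

In this toy setup each task's sensitivity is concentrated on the channels its head reads, so the same rounding error on a given channel matters to one task and not the other. A calibration procedure could use such weights to decide where quantization error is least harmful; how FGQ actually does this is not stated in the text above.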
Findings
FGQ outperforms state-of-the-art baselines on VGGT models.
Achieves up to 39% relative improvement under 4-bit quantization.
Effectively preserves accuracy across multiple geometric tasks.
Abstract
Feed-forward 3D reconstruction models, represented by Visual Geometry Grounded Transformer (VGGT), jointly predict multiple visual geometry tasks such as depth estimation, camera pose prediction, and point cloud reconstruction in a single forward pass. They have been widely adopted in 3D vision applications, but their billion-scale parameters bring substantial memory and computation overhead, posing challenges for on-device deployment. Post-Training Quantization (PTQ) is an effective technique to reduce this overhead. Existing PTQ methods for feed-forward 3D models mainly focus on handling heavy-tailed activation distributions and constructing diverse calibration datasets. However, we observe that feed-forward 3D models predict multiple geometric attributes through a shared backbone, where different transformer blocks and hidden channels contribute distinctly to each task, resulting in…
