VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models
Guangshuo Qin, Zhiteng Li, Zheng Chen, Weihang Zhang, Linghe Kong, Yulun Zhang

TL;DR
VEQ introduces a modality-adaptive quantization framework for MoE vision-language models, effectively reducing memory and computation costs while maintaining high accuracy by addressing cross-modal and expert heterogeneity.
Contribution
The paper proposes VEQ, a novel dual-aware quantization method that considers modality differences and expert heterogeneity, improving compression performance of MoE VLMs.
Findings
VEQ outperforms state-of-the-art quantization methods on multiple benchmarks.
Achieves over 2% accuracy improvement on Kimi-VL and Qwen3-VL.
Demonstrates robustness across various multimodal tasks.
Abstract
Mixture-of-Experts (MoE) Vision-Language Models (VLMs) offer remarkable performance but incur prohibitive memory and computational costs, making compression essential. Post-Training Quantization (PTQ) is an effective training-free technique to address the massive memory and computation overhead. Existing quantization paradigms fall short as they are oblivious to two critical forms of heterogeneity: the inherent discrepancy between vision and language tokens, and the non-uniform contribution of different experts. To bridge this gap, we propose Visual Expert Quantization (VEQ), a dual-aware quantization framework designed to simultaneously accommodate cross-modal differences and heterogeneity between experts. Specifically, VEQ incorporates 1) Modality-expert-aware Quantization, which utilizes expert activation frequency to prioritize error minimization for pivotal experts, and…
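
The abstract is truncated here, so the exact VEQ objective is not available on this page. As a rough illustration of the one idea it does state (using expert activation frequency to decide which experts' quantization error matters most), the sketch below weights each expert's round-to-nearest quantization error by how often a router selected that expert on calibration tokens. The function names, the symmetric round-to-nearest quantizer, and the mean-squared-error metric are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantize-then-dequantize (illustrative, not VEQ's quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    # Clamp to the signed integer range, then map back to floats.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def activation_frequency(routed_expert_ids, num_experts):
    """Fraction of calibration tokens routed to each expert."""
    total = len(routed_expert_ids)
    if total == 0:
        return [0.0] * num_experts
    counts = Counter(routed_expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def frequency_weighted_error(expert_weights, freqs, bits=4):
    """Sum of per-expert quantization MSE, weighted by activation frequency.

    Hypothetical objective: frequently used experts contribute more to the
    total, so a calibration procedure minimizing it would prioritize them.
    """
    total = 0.0
    for weights, freq in zip(expert_weights, freqs):
        dequantized = quantize_rtn(weights, bits)
        mse = sum((a - b) ** 2 for a, b in zip(weights, dequantized)) / len(weights)
        total += freq * mse
    return total


# Toy calibration trace: expert 0 handles most tokens.
routed = [0, 0, 0, 1, 0, 2, 0, 1]
freqs = activation_frequency(routed, num_experts=3)
experts = [[0.9, -0.31, 0.05], [0.2, 0.07, -0.6], [1.3, -0.8, 0.44]]
print(freqs, frequency_weighted_error(experts, freqs, bits=4))
```

In this toy setup the per-expert weights could equally be computed separately for vision and language tokens, which is one way a modality-aware variant might be built; whether VEQ does so in this form is not stated in the visible part of the abstract.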
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
