VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models

Guangshuo Qin; Zhiteng Li; Zheng Chen; Weihang Zhang; Linghe Kong; Yulun Zhang

arXiv:2602.01037 · cs.CV · February 3, 2026

TL;DR

VEQ introduces a modality-adaptive quantization framework for MoE vision-language models, effectively reducing memory and computation costs while maintaining high accuracy by addressing cross-modal and expert heterogeneity.

Contribution

The paper proposes VEQ, a novel dual-aware quantization method that accounts for both modality differences and expert heterogeneity, improving the compression performance of MoE VLMs.

Findings

01

VEQ outperforms state-of-the-art quantization methods on multiple benchmarks.

02

Achieves accuracy gains of over 2% relative to prior quantization methods on Kimi-VL and Qwen3-VL.

03

Demonstrates robustness across various multimodal tasks.

Abstract

Mixture-of-Experts (MoE) Vision-Language Models (VLMs) offer remarkable performance but incur prohibitive memory and computational costs, making compression essential. Post-Training Quantization (PTQ) is an effective training-free technique to address the massive memory and computation overhead. Existing quantization paradigms fall short as they are oblivious to two critical forms of heterogeneity: the inherent discrepancy between vision and language tokens, and the non-uniform contribution of different experts. To bridge this gap, we propose Visual Expert Quantization (VEQ), a dual-aware quantization framework designed to simultaneously accommodate cross-modal differences and heterogeneity between experts. Specifically, VEQ incorporates 1) Modality-expert-aware Quantization, which utilizes expert activation frequency to prioritize error minimization for pivotal experts, and…
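The first component described above weights quantization error by how often each expert is activated. The sketch below illustrates that general idea only and is not VEQ's actual algorithm, because the abstract is truncated and gives no formulas. It makes several assumptions: top-1 routing, per-row asymmetric 3-bit round-to-nearest (RTN) weight quantization, and hypothetical `vision_weight`/`text_weight` knobs that let vision and text tokens count differently toward an expert's activation frequency. All function and parameter names are illustrative and do not come from the paper.

```python
import numpy as np


def fake_quantize(w: np.ndarray, n_bits: int = 3) -> np.ndarray:
    """Asymmetric round-to-nearest fake quantization, one scale per output row."""
    qmax = 2 ** n_bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)  # integer grid [0, qmax]
    return (q - zero) * scale  # dequantize back to float


def expert_weights(router_choices, is_vision, n_experts,
                   vision_weight=1.0, text_weight=1.0):
    """Hypothetical per-expert importance: modality-weighted activation frequency."""
    token_w = np.where(is_vision, vision_weight, text_weight)
    freq = np.bincount(router_choices, weights=token_w, minlength=n_experts)
    return freq / freq.sum()  # normalize so importances sum to 1


def weighted_quant_error(experts, tokens, router_choices, is_vision,
                         n_bits=3, **kw):
    """Sum of per-expert output MSE on routed tokens, scaled by expert importance."""
    imp = expert_weights(router_choices, is_vision, len(experts), **kw)
    total = 0.0
    for e, w in enumerate(experts):
        x = tokens[router_choices == e]  # calibration tokens routed to expert e
        if len(x) == 0:
            continue  # expert never activated: contributes nothing
        err = x @ w.T - x @ fake_quantize(w, n_bits).T
        total += imp[e] * float(np.mean(err ** 2))
    return total, imp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = [rng.standard_normal((8, 16)) for _ in range(4)]
    tokens = rng.standard_normal((100, 16))
    router = rng.integers(0, 4, size=100)
    is_vision = np.arange(100) < 60  # first 60 tokens treated as vision tokens
    total, imp = weighted_quant_error(experts, tokens, router, is_vision,
                                      vision_weight=2.0)
    print("expert importance:", np.round(imp, 3))
    print("weighted 3-bit error:", round(total, 4))
```

In this framing, a quantizer that minimizes `weighted_quant_error` spends its error budget on the experts that calibration tokens actually hit most often. Rarely activated experts are tolerated with larger reconstruction error.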

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications