MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Lulu Hu; Wenhu Xiao; Xin Chen; Xinhua Xu; Bowen Xu; Kun Li; Yongliang Tao

arXiv:2603.04800 · cs.CV · March 6, 2026

TL;DR

MASQuant introduces a modality-aware quantization framework for multimodal large language models, effectively addressing cross-modal invariance issues and improving post-training quantization stability and performance.

Contribution

It proposes a novel framework with modality-specific smoothing and cross-modal compensation, advancing quantization techniques for multimodal large language models.

Findings

01 MASQuant achieves stable quantization across dual- and tri-modal MLLMs.

02 It outperforms or matches state-of-the-art PTQ algorithms.

03 The method effectively addresses smoothing misalignment and cross-modal invariance issues.

Abstract

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates…
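
The two issues named in the abstract can be illustrated with a small NumPy sketch of SmoothQuant-style smoothing. This is not the MASQuant implementation: the function names, tensor shapes, synthetic "text" and "vision" activations, and the injected outlier channel are all assumptions made for illustration. It computes the per-channel factors s_j = max|X_j|^alpha / max|W_j|^(1-alpha) separately for each modality, checks that dividing activations by s and multiplying weights by s leaves the layer output unchanged, and shows that pairing one modality's smoothed activations with weights smoothed for the other modality breaks that invariance.

```python
import numpy as np

def smoothing_factors(x, w, alpha=0.5, eps=1e-8):
    """SmoothQuant-style per-input-channel factors.

    x: activations, shape (tokens, c_in); w: weights, shape (c_in, c_out).
    """
    act_max = np.maximum(np.abs(x).max(axis=0), eps)  # per input channel
    w_max = np.maximum(np.abs(w).max(axis=1), eps)    # per input channel
    return act_max ** alpha / w_max ** (1.0 - alpha)

def smooth(x, w, s):
    """Move activation scale into the weights: (x / s) @ (s * w) == x @ w."""
    return x / s, w * s[:, None]

rng = np.random.default_rng(0)
c_in, c_out = 8, 4
w = rng.normal(size=(c_in, c_out))

# Hypothetical activations for two modalities (assumption: vision tokens
# carry a large outlier in channel 2, text tokens do not).
text = rng.normal(size=(16, c_in))
vision = rng.normal(size=(16, c_in))
vision[:, 2] *= 50.0

s_text = smoothing_factors(text, w)
s_vision = smoothing_factors(vision, w)

# Computational invariance holds when activations and weights share one s.
x_s, w_s = smooth(vision, w, s_vision)
invariance_ok = np.allclose(x_s @ w_s, vision @ w)

# A single weight matrix can absorb only one s: weights smoothed with the
# text factors no longer cancel the vision factors.
_, w_text = smooth(text, w, s_text)
mismatch_breaks = not np.allclose((vision / s_vision) @ w_text, vision @ w)
```

Under these assumptions, `s_vision[2]` comes out much larger than `s_text[2]`, which is the kind of gap that makes a single shared smoothing factor a poor fit for both modalities.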

Taxonomy

Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications