Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation
Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji

TL;DR
This paper introduces QSLAW, a quantization-aware scale learning method that enables efficient adaptation of multimodal large language models, reducing resource consumption while maintaining or improving performance on vision-language tasks.
Contribution
The paper proposes a novel quantization-aware scale learning approach with multimodal warmup, enhancing resource efficiency and stability in vision-language instruction tuning of large models.
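The multimodal warmup named above, which the abstract describes as progressively integrating linguistic and multimodal training samples, could be sketched as a batch sampler whose mixing ratio changes over the first training steps. This is only an illustration under stated assumptions: the summary does not give the schedule shape, its direction, or the ratios, so the linear ramp and the names `multimodal_fraction`, `sample_batch`, `start`, and `end` are hypothetical placeholders.

```python
import random
from typing import List


def multimodal_fraction(step: int, warmup_steps: int,
                        start: float = 0.0, end: float = 1.0) -> float:
    """Fraction of multimodal samples at `step`, ramped linearly.

    `start` and `end` are placeholder values (assumption); the source
    does not specify the actual schedule or its direction.
    """
    if warmup_steps <= 0:
        return end
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return start + (end - start) * progress


def sample_batch(step: int, warmup_steps: int,
                 linguistic: List[str], multimodal: List[str],
                 batch_size: int, rng: random.Random,
                 start: float = 0.0, end: float = 1.0) -> List[str]:
    """Draw a batch that mixes the two pools according to the schedule."""
    p = multimodal_fraction(step, warmup_steps, start, end)
    return [
        rng.choice(multimodal) if rng.random() < p else rng.choice(linguistic)
        for _ in range(batch_size)
    ]


# Hypothetical sample pools, for illustration only.
text_pool = ["text-0", "text-1", "text-2"]
image_text_pool = ["img+text-0", "img+text-1"]
rng = random.Random(0)
mid_warmup_batch = sample_batch(50, 100, text_pool, image_text_pool, 8, rng)
```

Swapping `start` and `end` reverses the direction of the ramp, so the same sketch covers either ordering of the two data sources.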
Findings
Models quantized by QSLAW match or outperform full-precision models.
QSLAW achieves up to a 1.4x reduction in tuning time and GPU usage.
QSLAW effectively mitigates quantization errors in multimodal models.
Abstract
This paper presents the first study to explore the potential of parameter quantization for multimodal large language models to alleviate the significant resource constraint encountered during vision-language instruction tuning. We introduce a Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW. This method is grounded in two key innovations: (1) The learning of group-wise scale factors for quantized LLM weights to mitigate the quantization error arising from activation outliers and achieve more effective vision-language instruction tuning; (2) The implementation of a multimodal warmup that progressively integrates linguistic and multimodal training samples, thereby preventing overfitting of the quantized model to multimodal data while ensuring stable adaptation of multimodal large language models to downstream vision-language tasks. Extensive experiments…
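To make the idea of group-wise scale factors concrete, here is a minimal sketch in plain Python. It assumes symmetric round-to-nearest quantization with one step size per group, frozen integer codes, and one learnable multiplier per group. The function names, the 4-bit default, and the use of a weight-reconstruction loss for the update are all assumptions for illustration; the abstract does not specify the quantizer or the training objective, and this is not the authors' implementation.

```python
from typing import List, Tuple


def quantize_groupwise(weights: List[float], group_size: int,
                       bits: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest quantization, one step size per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    codes: List[int] = []
    steps: List[float] = []
    for begin in range(0, len(weights), group_size):
        group = weights[begin:begin + group_size]
        max_abs = max(abs(w) for w in group)
        step = max_abs / qmax if max_abs > 0 else 1.0
        steps.append(step)
        codes.extend(max(-qmax - 1, min(qmax, round(w / step))) for w in group)
    return codes, steps


def dequantize(codes: List[int], steps: List[float],
               scales: List[float], group_size: int) -> List[float]:
    """Rebuild weights; `scales` holds one learnable multiplier per group."""
    return [c * steps[i // group_size] * scales[i // group_size]
            for i, c in enumerate(codes)]


def reconstruction_loss(weights, codes, steps, scales, group_size) -> float:
    """Squared error between original and dequantized weights (stand-in loss)."""
    approx = dequantize(codes, steps, scales, group_size)
    return sum((w - a) ** 2 for w, a in zip(weights, approx))


def sgd_step_on_scales(weights, codes, steps, scales, group_size,
                       lr: float = 0.05) -> List[float]:
    """One gradient step on the scales only; the integer codes stay frozen."""
    grads = [0.0] * len(scales)
    for i, (w, c) in enumerate(zip(weights, codes)):
        g = i // group_size
        base = c * steps[g]  # dequantized value before the learned scale
        grads[g] += -2.0 * (w - base * scales[g]) * base
    return [s - lr * d for s, d in zip(scales, grads)]


# Hypothetical weights: the second group contains an outlier (2.4).
w = [0.12, -0.53, 0.91, 0.07, 2.4, -0.3, 0.05, 1.1]
codes, steps = quantize_groupwise(w, group_size=4)
scales = [1.0] * len(steps)  # learnable parameters, initialised to 1
new_scales = sgd_step_on_scales(w, codes, steps, scales, group_size=4)
```

Because each group has its own scale, a large value in one group only coarsens the step size of that group, and the trainable parameter count is one number per group rather than one per weight.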
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Multimodal Machine Learning Applications
