Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation

Jingjing Xie; Yuxin Zhang; Mingbao Lin; Liujuan Cao; Rongrong Ji

arXiv:2408.03735 · cs.CV · August 8, 2024

TL;DR

This paper introduces QSLAW, a quantization-aware scale learning method that enables efficient adaptation of multimodal large language models, reducing resource consumption while maintaining or improving performance on vision-language tasks.

Contribution

The paper proposes a novel quantization-aware scale learning approach with multimodal warmup, enhancing resource efficiency and stability in vision-language instruction tuning of large models.

Findings

01. Models quantized by QSLAW match or outperform their full-precision counterparts.

02. Achieves up to a 1.4x reduction in tuning time and GPU usage.

03. Demonstrates effective mitigation of quantization errors in multimodal models.

Abstract

This paper presents the first study to explore the potential of parameter quantization for multimodal large language models to alleviate the significant resource constraint encountered during vision-language instruction tuning. We introduce a Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW. This method is grounded in two key innovations: (1) The learning of group-wise scale factors for quantized LLM weights to mitigate the quantization error arising from activation outliers and achieve more effective vision-language instruction tuning; (2) The implementation of a multimodal warmup that progressively integrates linguistic and multimodal training samples, thereby preventing overfitting of the quantized model to multimodal data while ensuring stable adaptation of multimodal large language models to downstream vision-language tasks. Extensive experiments…
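
The first innovation, learning group-wise scale factors for quantized weights, can be illustrated with a toy sketch. This is not the authors' implementation: the symmetric round-to-nearest quantizer, the squared output-error loss, the plain gradient-descent loop, and every name below (`quantize_groups`, `dequantize`, `learn_scales`, the sample weights and inputs) are assumptions chosen for illustration. The integer codes stay frozen and only one scale per group is trained, with inputs that each contain one large "outlier" activation.

```python
def quantize_groups(weights, group_size, bits=4):
    """Symmetric round-to-nearest quantization with one step size per group.

    Returns (codes, steps): one integer code per weight, one step per group.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    codes, steps = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        step = (max(abs(w) for w in group) or 1.0) / qmax
        steps.append(step)
        codes.extend(max(-qmax - 1, min(qmax, round(w / step))) for w in group)
    return codes, steps


def dequantize(codes, steps, scales, group_size):
    """Rebuild weights as scales[g] * steps[g] * code; the codes stay frozen."""
    return [scales[i // group_size] * steps[i // group_size] * c
            for i, c in enumerate(codes)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def learn_scales(weights, codes, steps, inputs, group_size, lr=0.01, epochs=200):
    """Fit one scale per group by gradient descent on squared output error."""
    scales = [1.0] * len(steps)
    for _ in range(epochs):
        for x in inputs:
            w_hat = dequantize(codes, steps, scales, group_size)
            err = dot(w_hat, x) - dot(weights, x)
            for g in range(len(steps)):
                lo, hi = g * group_size, (g + 1) * group_size
                # d(output)/d(scales[g]) = steps[g] * sum(code_i * x_i) over group g
                grad = 2 * err * steps[g] * dot(codes[lo:hi], x[lo:hi])
                scales[g] -= lr * grad
    return scales


# Hypothetical toy data: 8 weights in 2 groups of 4; each input has one outlier.
weights = [0.31, -0.12, 0.05, 0.27, -0.44, 0.18, 0.09, -0.26]
inputs = [
    [1.0, 0.5, -0.3, 8.0, 0.2, -0.4, 0.6, 0.1],
    [0.2, -0.7, 0.4, 0.1, 1.0, 0.3, -0.5, 6.0],
]
codes, steps = quantize_groups(weights, group_size=4)
scales = learn_scales(weights, codes, steps, inputs, group_size=4)
```

Comparing the output error of `dequantize(codes, steps, scales, 4)` against the same call with all scales fixed at 1.0 shows how much of the rounding error the learned scales absorb on these inputs. In this sketch only two numbers are trained, which reflects the general appeal of scale learning: the trainable state is one value per weight group rather than one per weight.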

Code & Models

Repositories

xjjxmu/qslaw (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Multimodal Machine Learning Applications