AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning

Changhai Zhou; Shiyang Zhang; Yuhua Zhou; Zekai Liu; Shichao Weng

arXiv:2411.13814·cs.LG·November 22, 2024

TL;DR

AutoMixQ is a framework that selects a quantization configuration for each layer of a large language model. It uses lightweight performance models and Pareto optimality to balance memory usage against performance, particularly under resource constraints.

Contribution

The paper introduces an end-to-end method for selecting a quantization configuration per layer, improving both resource efficiency and performance when fine-tuning large models.

Findings

1. AutoMixQ reduces memory usage significantly compared to baseline methods.
2. AutoMixQ achieves higher accuracy on benchmarks at given resource levels.
3. The framework effectively balances performance and memory through Pareto optimization.

Abstract

Fine-tuning large language models (LLMs) under resource constraints is a significant challenge in deep learning. Low-Rank Adaptation (LoRA), pruning, and quantization are all effective methods for improving resource efficiency. However, combining them directly often results in suboptimal performance, especially with uniform quantization across all model layers. This is due to the complex, uneven interlayer relationships introduced by pruning, necessitating more refined quantization strategies. To address this, we propose AutoMixQ, an end-to-end optimization framework that selects optimal quantization configurations for each LLM layer. AutoMixQ leverages lightweight performance models to guide the selection process, significantly reducing time and computational resources compared to exhaustive search methods. By incorporating Pareto optimality, AutoMixQ balances memory usage and…
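The Pareto-optimality idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Candidate` type, the `pareto_front` helper, the configuration names, and every number below are hypothetical placeholders chosen only to show how non-dominated memory/performance trade-offs might be filtered from a set of candidate per-layer quantization configurations.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical per-layer quantization configuration and its estimates."""
    name: str         # illustrative label, not taken from the paper
    memory_gb: float  # estimated memory footprint (lower is better)
    score: float      # estimated task performance (higher is better)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one uses no more memory and scores
    at least as well, and is strictly better on at least one of the two.
    """
    front = []
    for c in candidates:
        dominated = any(
            o.memory_gb <= c.memory_gb
            and o.score >= c.score
            and (o.memory_gb < c.memory_gb or o.score > c.score)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    # Sort by memory so the trade-off curve reads left to right.
    return sorted(front, key=lambda c: c.memory_gb)


# Made-up placeholder values, not results from the paper.
candidates = [
    Candidate("all-4bit", 4.0, 0.60),
    Candidate("mixed-a", 5.0, 0.66),
    Candidate("mixed-b", 5.5, 0.64),  # dominated by mixed-a
    Candidate("all-8bit", 8.0, 0.70),
]

front = pareto_front(candidates)
for c in front:
    print(c.name, c.memory_gb, c.score)
```

In a setting like the one the abstract describes, the memory and score estimates would presumably come from the lightweight performance models rather than from full fine-tuning runs; here they are simply hard-coded.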

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Photonic and Optical Devices

Methods: Pruning