AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning
Changhai Zhou, Shiyang Zhang, Yuhua Zhou, Zekai Liu, Shichao Weng

TL;DR
AutoMixQ is a framework that chooses a quantization configuration for each layer of a large language model. It uses lightweight performance models and Pareto optimality to balance memory efficiency against performance, particularly under tight resource constraints.
Contribution
AutoMixQ introduces an end-to-end method for selecting the optimal quantization configuration for each layer. This improves both resource efficiency and performance when fine-tuning large models.
Findings
AutoMixQ reduces memory usage significantly compared to baseline methods.
AutoMixQ achieves higher benchmark accuracy at a given resource level.
The framework effectively balances performance and memory through Pareto optimization.
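Pareto optimization here means keeping only the configurations that no other configuration beats on both memory and performance at once. The following is a minimal sketch of that filtering step. It assumes each candidate layer-wise configuration already has an estimated memory cost and a predicted score. The candidate names, the two-objective setup, and the helper `best_under_budget` are hypothetical illustrations, not AutoMixQ's code.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate: (name, memory_gb, score). Lower memory and higher score are better.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A dominates B if A uses no more memory and scores at least as high,
    and A is strictly better in at least one of the two objectives.
    """
    front = []
    for name, mem, score in candidates:
        dominated = any(
            m2 <= mem and s2 >= score and (m2 < mem or s2 > score)
            for _, m2, s2 in candidates
        )
        if not dominated:
            front.append((name, mem, score))
    # Sort by memory so the trade-off curve reads from cheapest to most expensive.
    return sorted(front, key=lambda c: c[1])


def best_under_budget(front: List[Candidate], budget_gb: float) -> Optional[Candidate]:
    """Pick the highest-scoring Pareto-optimal candidate that fits the memory budget."""
    feasible = [c for c in front if c[1] <= budget_gb]
    return max(feasible, key=lambda c: c[2]) if feasible else None
```

Consider the made-up candidates `("all-4bit", 3.5, 0.60)`, `("mixed-A", 5.0, 0.65)`, `("mixed-B", 5.5, 0.64)`, and `("all-8bit", 7.0, 0.66)`. Here `mixed-B` is dropped because `mixed-A` uses less memory and scores higher. With a 6 GB budget, `best_under_budget` returns `mixed-A`.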
Abstract
Fine-tuning large language models (LLMs) under resource constraints is a significant challenge in deep learning. Low-Rank Adaptation (LoRA), pruning, and quantization are all effective methods for improving resource efficiency. However, combining them directly often results in suboptimal performance, especially with uniform quantization across all model layers. This is due to the complex, uneven interlayer relationships introduced by pruning, necessitating more refined quantization strategies. To address this, we propose AutoMixQ, an end-to-end optimization framework that selects optimal quantization configurations for each LLM layer. AutoMixQ leverages lightweight performance models to guide the selection process, significantly reducing time and computational resources compared to exhaustive search methods. By incorporating Pareto optimality, AutoMixQ balances memory usage and…
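The abstract does not specify the form of AutoMixQ's lightweight performance models. The sketch below therefore shows only the general idea of a cheap surrogate: fit it on a few evaluated configurations, then use it to rank others without fine-tuning each one. The additive "main-effects" model is a hypothetical stand-in chosen for simplicity. Each configuration is assumed to be a tuple of per-layer bit widths.

```python
import itertools
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Config = Tuple[int, ...]  # hypothetical encoding: bit width per layer, e.g. (4, 8, 4)


def fit_additive_surrogate(configs: Sequence[Config], scores: Sequence[float]):
    """Fit score ~ global_mean + sum of per-(layer, bits) effects.

    Each effect is the mean score of the evaluated configs that use that
    bit width at that layer, minus the global mean score.
    """
    global_mean = mean(scores)
    buckets: Dict[Tuple[int, int], List[float]] = {}
    for cfg, s in zip(configs, scores):
        for layer, bits in enumerate(cfg):
            buckets.setdefault((layer, bits), []).append(s)
    effects = {key: mean(vals) - global_mean for key, vals in buckets.items()}
    return global_mean, effects


def predict(model, cfg: Config) -> float:
    """Predict a score; (layer, bits) pairs never seen during fitting add no effect."""
    global_mean, effects = model
    return global_mean + sum(effects.get((layer, bits), 0.0) for layer, bits in enumerate(cfg))


def rank_configs(model, bit_choices: Sequence[int], num_layers: int) -> List[Config]:
    """Rank every configuration by predicted score (feasible only for a few layers)."""
    candidates = itertools.product(bit_choices, repeat=num_layers)
    return sorted(candidates, key=lambda c: predict(model, c), reverse=True)
```

Exhaustive ranking grows as `len(bit_choices) ** num_layers`, so it is only practical for toy sizes. For a real LLM, one would instead score sampled or searched candidates with `predict`. The predicted scores, paired with memory estimates, could then feed a Pareto filter.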
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Photonic and Optical Devices
Methods: Pruning
