Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
Yoonjun Cho, Soeun Kim, Dongjae Jeon, Kyelim Lee, Beomsoo Lee, Albert No

TL;DR
This paper proposes an initialization method for decomposing weight matrices into quantized and low-rank components. It improves the balance between quantization and low-rank approximation, which leads to better compression and accuracy for large language models.
Contribution
It introduces Outlier-Driven Low-Rank Initialization (ODLRI), a structured initialization for joint optimization that assigns the low-rank component the specific role of capturing activation-sensitive weights, which reduces the negative impact of outliers on quantization.
Findings
Reduces activation-aware error in LLM weight decompositions.
Minimizes quantization scale while maintaining model accuracy.
Improves perplexity and zero-shot accuracy in low-bit models.
Abstract
Decomposing weight matrices into quantization and low-rank components ($\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$) is a widely used technique for compressing large language models (LLMs). Existing joint optimization methods iteratively alternate between quantization and low-rank approximation. However, these methods tend to prioritize one component at the expense of the other, resulting in suboptimal decompositions that fail to leverage each component's unique strengths. In this work, we introduce Outlier-Driven Low-Rank Initialization (ODLRI), which assigns low-rank components the specific role of capturing activation-sensitive weights. This structured decomposition mitigates outliers' negative impact on quantization, enabling a more effective balance between quantization and low-rank approximation. Experiments on Llama2 (7B, 13B, 70B), Llama3-8B, and Mistral-7B demonstrate…
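The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the decomposition has the form W ≈ Q + LR, swaps in a plain per-row round-to-nearest quantizer and an unweighted truncated SVD for whatever quantizer and low-rank solver the paper actually uses, and treats "activation-sensitive weights" as the weight columns tied to the input channels with the largest activation energy. All function names (`quantize_sym`, `low_rank`, `outlier_driven_init`, `decompose`, `activation_aware_error`) are hypothetical.

```python
import numpy as np

def quantize_sym(w, bits=4):
    """Per-row symmetric round-to-nearest quantizer (a stand-in, assumed here)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def low_rank(m, rank):
    """Best rank-`rank` factors of m via truncated SVD: m ~= l @ r."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

def activation_aware_error(w, q, l, r, x):
    """Squared Frobenius norm of the residual applied to activations x."""
    return float(np.linalg.norm((w - q - l @ r) @ x) ** 2)

def outlier_driven_init(w, x, rank):
    """Initialize L, R from the weight columns of the most active input channels.

    w: (out_features, in_features), x: (in_features, n_samples).
    Assumption: channel saliency is the summed squared activation per channel.
    """
    score = (x ** 2).sum(axis=1)
    salient = np.argsort(score)[-rank:]      # indices of the `rank` hottest channels
    w_salient = np.zeros_like(w)
    w_salient[:, salient] = w[:, salient]    # keep only activation-sensitive columns
    l, r = low_rank(w_salient, rank)
    return l, r, salient

def decompose(w, x, rank=4, bits=4, iters=5):
    """Alternate quantization and low-rank fitting, starting from the init above."""
    l, r, _ = outlier_driven_init(w, x, rank)
    q = quantize_sym(w - l @ r, bits)
    for _ in range(iters):
        l, r = low_rank(w - q, rank)         # refit low-rank part to the residual
        q = quantize_sym(w - l @ r, bits)    # requantize what the low-rank part misses
    return q, l, r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(32, 64))
    X = rng.normal(size=(64, 128))
    X[[3, 17, 42, 50]] *= 20.0               # synthetic outlier activation channels
    Q, L, R = decompose(W, X)
    print(activation_aware_error(W, Q, L, R, X))
```

In this sketch the low-rank factors start out reproducing the salient columns exactly, so the quantizer initially only sees the remaining weights. Whether the alternating loop preserves that role split depends on the solver; the unweighted SVD used here does not weight the fit by activations.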
