Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization
Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang

TL;DR
This paper introduces a Bayesian optimization-based low-rank compression method for large language models, effectively balancing efficiency and performance by accurately estimating feature distributions and allocating low-rank dimensions.
Contribution
It proposes a novel low-rank compression technique tailored for LLMs, utilizing Bayesian optimization for optimal dimension allocation based on empirical feature distribution analysis.
Findings
Outperforms existing compression methods in maintaining model performance
Effectively estimates feature distributions using pooled covariance matrices
Achieves better compression-performance trade-off on LLaMA-2 models
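
As a rough illustration of the pooled covariance idea mentioned in the findings, the sketch below uses the textbook pooled estimator: per-batch scatter matrices are summed and divided by the total degrees of freedom. This is an assumption about the general technique, not the paper's exact estimator; the function name `pooled_covariance` and the batch layout (rows are samples, columns are feature dimensions) are hypothetical.

```python
import numpy as np

def pooled_covariance(batches):
    """Textbook pooled covariance over several batches of features.

    Each batch is an array of shape (n_i, dim) with n_i >= 2.
    Each batch is centered on its own mean, the scatter matrices are
    summed, and the sum is divided by the total degrees of freedom.
    """
    dim = batches[0].shape[1]
    scatter = np.zeros((dim, dim))
    dof = 0
    for X in batches:
        centered = X - X.mean(axis=0, keepdims=True)
        scatter += centered.T @ centered
        dof += X.shape[0] - 1
    return scatter / dof

# Synthetic stand-in for feature activations (illustrative data only).
rng = np.random.default_rng(0)
batches = [rng.standard_normal((n, 4)) for n in (10, 20, 15)]
S = pooled_covariance(batches)
```

With a single batch, this reduces to the ordinary sample covariance, which is a convenient sanity check.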
Abstract
In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in low-rank factorization and low-rank dimensions allocation. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for LLMs. This approach involves precise estimation of feature distributions through pooled covariance matrices and a Bayesian optimization strategy for allocating low-rank…
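
The abstract describes low-rank compression as decomposing a weight matrix into a product of two low-rank matrices. A minimal sketch of that idea follows, using plain truncated SVD as a stand-in; the paper's feature-based factorization and Bayesian rank allocation are not reproduced here. The names `low_rank_factorize` and `param_count`, and the matrix sizes, are assumptions chosen for illustration.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factor W (d_out x d_in) into A (d_out x rank) and B (rank x d_in)
    with truncated SVD, so that A @ B approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def param_count(d_out, d_in, rank):
    """Number of parameters stored by the two factors."""
    return rank * (d_out + d_in)

# Illustrative weight matrix that is exactly rank 8 by construction.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))
A, B = low_rank_factorize(W, rank=8)
```

Here the factors hold 8 * (64 + 48) = 896 parameters against 64 * 48 = 3072 in the dense matrix. Choosing `rank` per layer is the allocation problem that the abstract assigns to Bayesian optimization.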
Taxonomy
Topics: Topic Modeling
Methods: Pruning
