EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-Yang Liu, Maksim Khadkevich, Nai Chit Fung, Charbel Sakr, Chao-Han Huck Yang, Chien-Yi Wang, Saurav Muralidharan, Hongxu Yin, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen

TL;DR
EoRA is a fine-tuning-free method that compensates compressed large language models by adding low-rank matrices. It recovers accuracy lost to compression and lets users trade accuracy against computational overhead independently of the compression format.
Contribution
EoRA introduces a low-rank augmentation technique for compressed LLMs that outperforms prior fine-tuning-free low-rank methods, and it ships with an optimized CUDA kernel for faster inference.
Findings
Achieves up to 11.45% accuracy improvement on GSM8K for compressed LLaMA3-8B.
Accelerates inference by up to 1.4x with an optimized CUDA kernel.
Reduces memory overhead by quantizing the EoRA low-rank matrices.
Abstract
While post-training compression techniques effectively reduce the memory footprint, latency, and power consumption of Large Language Models (LLMs), they often result in noticeable accuracy degradation and remain limited by hardware and kernel constraints that restrict supported compression formats, ultimately reducing flexibility across a wide range of deployment scenarios. In this work, we propose EoRA, a novel method that augments compressed LLMs with low-rank matrices, allowing users to rapidly enhance task-specific performance and freely balance the trade-off between accuracy and computational overhead beyond the constraints of compression formats. EoRA consistently outperforms prior fine-tuning-free low-rank methods in recovering the accuracy of compressed LLMs, achieving notable accuracy improvements (e.g., on ARC-Challenge,…
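The idea of augmenting a compressed weight matrix with low-rank matrices can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses a plain truncated SVD of the compression error and does not reproduce the eigenspace projection that gives EoRA its name. The toy quantizer `fake_quantize`, the helper `low_rank_compensation`, and the matrix sizes are all hypothetical, chosen only to show how the rank controls the accuracy/overhead trade-off.

```python
import numpy as np

def fake_quantize(w, bits=3):
    # Hypothetical stand-in for any compressor: uniform per-row
    # round-to-nearest quantization to 2**bits levels.
    levels = 2 ** bits - 1
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo

def low_rank_compensation(w, w_hat, rank):
    # Best rank-`rank` approximation (in Frobenius norm) of the
    # compression error w - w_hat, returned as factors b @ a.
    u, s, vt = np.linalg.svd(w - w_hat, full_matrices=False)
    b = u[:, :rank] * s[:rank]   # shape (out_features, rank)
    a = vt[:rank]                # shape (rank, in_features)
    return b, a

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))   # original weight (assumed shape)
w_hat = fake_quantize(w, bits=3)     # compressed weight

def rel_error(rank):
    # Relative Frobenius error of the compensated weight.
    b, a = low_rank_compensation(w, w_hat, rank)
    return np.linalg.norm(w - (w_hat + b @ a)) / np.linalg.norm(w)

ranks = (0, 8, 32, 64)
errors = [rel_error(r) for r in ranks]

# At inference the low-rank path is applied next to the compressed
# layer, so the extra cost grows with the chosen rank.
x = rng.standard_normal((4, 128))
b, a = low_rank_compensation(w, w_hat, 8)
y_merged = x @ (w_hat + b @ a).T
y_split = x @ w_hat.T + (x @ a.T) @ b.T

for r, e in zip(ranks, errors):
    print(f"rank={r:3d}  relative error={e:.4f}")
```

In this toy setting a larger rank lowers the reconstruction error at the price of two extra small matrix multiplications per layer, which is the kind of accuracy/overhead trade-off the abstract describes.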
Taxonomy
Topics: Blind Source Separation Techniques · Sparse and Compressive Sensing Techniques · Advanced Image Processing Techniques
