CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
Muchammad Daniyal Kautsar, Afra Majida Hariono, Widyawan, Syukron Abu Ishaq Alfarozi, Kuntpong Woraratpanya

TL;DR
CALR pairs SVD-based low-rank decomposition with a learnable correction module to compress large language models while preserving most of their performance, making deployment in resource-limited settings more practical.
Contribution
The paper proposes CALR, a two-part compression technique that combines SVD with a learnable correction module to better retain model functionality after compression.
Findings
Reduces model size by up to 51.77%.
Retains up to 90.42% of original performance.
Outperforms existing compression methods.
Abstract
Large Language Models (LLMs) present significant deployment challenges due to their immense size and computational requirements. Model compression techniques are essential for making these models practical for resource-constrained environments. A prominent compression strategy is low-rank factorization via Singular Value Decomposition (SVD) to reduce model parameters by approximating weight matrices. However, standard SVD focuses on minimizing matrix reconstruction error, often leading to a substantial loss of the model's functional performance. This performance degradation occurs because existing methods do not adequately correct for the functional information lost during compression. To address this gap, we introduce Corrective Adaptive Low-Rank Decomposition (CALR), a two-component compression approach. CALR combines a primary path of SVD-compressed layers with a parallel, learnable,…
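The two-path idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract is truncated before it specifies the correction module, so the low-rank pair `C`, `D` used here, the class name `CorrectedLowRankLinear`, and the ranks are all assumptions chosen for illustration. The sketch only shows how a truncated SVD replaces a weight matrix with two thin factors, and how a parallel path could be added on top without changing the initial output.

```python
import numpy as np


def svd_compress(W, rank):
    """Return factors (A, B) such that A @ B is the best rank-`rank`
    approximation of W in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (out_dim, rank), singular values folded in
    B = Vt[:rank, :]             # shape (rank, in_dim)
    return A, B


class CorrectedLowRankLinear:
    """Hypothetical layer: SVD-compressed primary path plus a parallel
    low-rank correction path. The correction form is an assumption."""

    def __init__(self, W, rank, corr_rank, rng):
        out_dim, in_dim = W.shape
        self.A, self.B = svd_compress(W, rank)
        # Correction factors would be trained in practice; C starts at zero
        # so the correction path is initially a no-op.
        self.C = np.zeros((out_dim, corr_rank))
        self.D = 0.01 * rng.standard_normal((corr_rank, in_dim))

    def forward(self, x):
        primary = (x @ self.B.T) @ self.A.T      # compressed main path
        correction = (x @ self.D.T) @ self.C.T   # parallel correction path
        return primary + correction

    def num_params(self):
        return sum(m.size for m in (self.A, self.B, self.C, self.D))


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))    # stand-in for a dense layer weight
x = rng.standard_normal((4, 64))     # a small batch of inputs

layer = CorrectedLowRankLinear(W, rank=8, corr_rank=2, rng=rng)
print("dense parameters:     ", W.size)
print("compressed parameters:", layer.num_params())
print("reconstruction error: ", np.linalg.norm(W - layer.A @ layer.B))
```

The reconstruction error printed at the end is the matrix-level quantity that plain SVD minimizes. According to the abstract, minimizing it does not by itself preserve functional performance, which is the gap the learnable correction path is meant to address.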
