CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression

Muchammad Daniyal Kautsar; Afra Majida Hariono; Widyawan; Syukron Abu Ishaq Alfarozi; Kuntpong Woraratpanya

arXiv:2508.16680·cs.LG·August 27, 2025

TL;DR

CALR introduces a novel low-rank decomposition method with a learnable correction module to effectively compress large language models while preserving their performance, enabling more practical deployment in resource-limited settings.

Contribution

It proposes CALR, a two-part compression technique combining SVD with a learnable correction module to better retain model functionality after compression.
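The parameter budget of such a two-part layer can be checked with simple arithmetic. The sketch below assumes that the correction module is itself a low-rank pair of matrices, which this summary does not state. The layer size and ranks are hypothetical values chosen for illustration, not figures from the paper.

```python
def low_rank_params(m: int, n: int, rank: int) -> int:
    """Parameters in a rank-`rank` factorization of an m x n matrix.

    The factorization stores one m x rank matrix and one rank x n matrix.
    """
    return rank * (m + n)


def layer_reduction(m: int, n: int, svd_rank: int, corr_rank: int = 0) -> float:
    """Fraction of parameters removed from one dense m x n layer.

    Assumption: the layer is replaced by an SVD path of rank `svd_rank`
    plus a parallel low-rank correction path of rank `corr_rank`.
    """
    dense = m * n
    compressed = low_rank_params(m, n, svd_rank) + low_rank_params(m, n, corr_rank)
    return 1.0 - compressed / dense


# Hypothetical 2048 x 2048 layer: SVD rank 448 plus correction rank 64.
example = layer_reduction(2048, 2048, svd_rank=448, corr_rank=64)
```

A factorization only saves parameters when the combined rank stays below m*n / (m + n), which is 1024 for the hypothetical square layer above.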

Findings

01. Reduces model size by up to 51.77%.

02. Retains up to 90.42% of original performance.

03. Outperforms existing compression methods.

Abstract

Large Language Models (LLMs) present significant deployment challenges due to their immense size and computational requirements. Model compression techniques are essential for making these models practical in resource-constrained environments. A prominent compression strategy is low-rank factorization via Singular Value Decomposition (SVD), which reduces the parameter count by approximating weight matrices with low-rank factors. However, standard SVD minimizes matrix reconstruction error, which often leads to a substantial loss of the model's functional performance. This degradation occurs because existing methods do not adequately correct for the functional information lost during compression. To address this gap, we introduce Corrective Adaptive Low-Rank Decomposition (CALR), a two-component compression approach. CALR combines a primary path of SVD-compressed layers with a parallel, learnable,…
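The gap between weight reconstruction error and functional (output) error, and the idea of a parallel correction path, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract above is truncated, so the form of the correction module (a low-rank pair `P`, `Q`), the closed-form least-squares fit used in place of gradient training, the random stand-in weights and activations, and all sizes are assumptions made for illustration.

```python
import numpy as np


def truncated_svd(M, rank):
    """Return factors (L, R) of the best rank-`rank` approximation M ~ L @ R."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]


rng = np.random.default_rng(0)
m, n, r, c, samples = 64, 48, 8, 4, 256  # hypothetical sizes and ranks

W = rng.standard_normal((m, n))  # stand-in for a dense layer weight
# Stand-in calibration activations with uneven feature scales, so that
# weight-space error and output-space error are not equivalent.
feature_scale = np.linspace(0.1, 3.0, n)[:, None]
X = feature_scale * rng.standard_normal((n, samples))

# Primary path: truncated SVD of the weight matrix itself.
A, B = truncated_svd(W, r)
Y = W @ X
Y_svd = A @ (B @ X)

# Functional residual: what the SVD path fails to reproduce on the outputs.
residual = Y - Y_svd

# Correction path (assumed form): rank-c factors P @ Q fitted so that
# P @ Q @ X approximates the residual. The fit here is a closed-form
# reduced-rank least-squares solution, standing in for a learned module.
P, T = truncated_svd(residual, c)  # residual ~ P @ T
Q = T @ np.linalg.pinv(X)          # choose Q so that Q @ X ~ T

Y_corrected = Y_svd + P @ (Q @ X)

err_svd = np.linalg.norm(Y - Y_svd) / np.linalg.norm(Y)
err_corrected = np.linalg.norm(Y - Y_corrected) / np.linalg.norm(Y)
```

In this toy setting the corrected output error is lower than the SVD-only error at the cost of `c * (m + n)` extra parameters. Whether that trade-off holds on a real model depends on the calibration data and on how the correction module is trained.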
