Low-Rank Compression of Language Models via Differentiable Rank Selection
Sidhant Sundrani, Francesco Tudisco, and Pasquale Minervini

TL;DR
This paper introduces LLRC, a gradient-based, fine-tuning-free method for selecting the ranks of low-rank decompositions in large language models, improving the trade-off between compression rate and task performance without post-compression fine-tuning.
Contribution
We propose LLRC, a novel differentiable rank selection method that optimizes singular value masks directly, outperforming existing heuristics and gradient-based approaches in model compression tasks.
Findings
LLRC outperforms heuristic and gradient-based methods across multiple tasks.
Our approach achieves higher accuracy at a 20% compression rate on Llama-2-13B.
LLRC performs competitively with fine-tuning methods without post-compression fine-tuning.
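To make the compression rate concrete, the following is a minimal sketch of the standard parameter arithmetic for a single factorised weight matrix. It assumes an m x n matrix is stored as two factors of rank r, costing r(m + n) parameters instead of mn. The function name and the 4096 x 4096 shape are illustrative assumptions, not values from the paper, and applying one rate uniformly to every matrix is exactly the simplification that per-layer rank selection is meant to avoid.

```python
def max_rank_for_compression(m: int, n: int, compression_rate: float) -> int:
    """Largest rank r such that r * (m + n) <= (1 - compression_rate) * m * n.

    Hypothetical helper: assumes the matrix is stored as two rank-r factors.
    """
    budget = (1.0 - compression_rate) * m * n  # parameters we may keep
    return int(budget // (m + n))              # each unit of rank costs m + n


# Illustrative shape only (not taken from the paper).
m, n = 4096, 4096
r = max_rank_for_compression(m, n, 0.20)
dense_params = m * n
factored_params = r * (m + n)
break_even_rank = (m * n) // (m + n)  # above this rank, factorising saves nothing

print(r, factored_params / dense_params, break_even_rank)
```

For a square matrix the break-even rank is half the dimension, so a 20% parameter reduction already requires discarding well over half of the singular values.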
Abstract
Approaches for compressing large language models using low-rank decomposition have made strides, particularly with the introduction of activation and loss-aware SVD, which improves the trade-off between decomposition rank and downstream task performance. Despite these advancements, a persistent challenge remains: selecting the optimal ranks for each layer to jointly optimise compression rate and downstream task accuracy. Current methods either rely on heuristics that can yield sub-optimal results due to their limited discrete search space or are gradient-based but are not as performant as heuristic approaches without post-compression fine-tuning. To address these issues, we propose Learning to Low-Rank Compress (LLRC), a gradient-based approach which directly learns the weights of masks that select singular values in a fine-tuning-free setting. Using a calibration dataset, we train only…
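The idea of a mask that selects singular values can be illustrated with a short NumPy sketch. This is not the authors' implementation: the sigmoid parameterisation, the 0.5 threshold, the function names, and the use of plain SVD (rather than activation or loss-aware SVD) are all assumptions made for illustration, and the gradient-based training of the mask weights on calibration data is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_masked_weight(U, S, Vt, mask_logits):
    """Rebuild the weight with each singular value scaled by a mask in (0, 1).

    The mask is a smooth function of mask_logits, so a loss computed on the
    result is differentiable with respect to them (assumed parameterisation).
    """
    return (U * (S * sigmoid(mask_logits))) @ Vt


def hard_truncate(U, S, Vt, mask_logits, threshold=0.5):
    """Keep only the singular directions whose mask exceeds the threshold."""
    keep = sigmoid(mask_logits) > threshold
    return (U[:, keep] * S[keep]) @ Vt[keep, :], int(keep.sum())


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))              # frozen weight (toy size)
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Hypothetical mask weights, as if already learned: keep the first three values.
mask_logits = np.array([4.0, 4.0, 4.0, -4.0, -4.0, -4.0])

W_soft = soft_masked_weight(U, S, Vt, mask_logits)
W_hard, rank = hard_truncate(U, S, Vt, mask_logits)
print(rank, np.linalg.matrix_rank(W_hard))
```

In this sketch the weight matrix and its factors stay fixed and only `mask_logits` would be trainable; the selected rank falls out of the mask rather than being chosen from a discrete grid.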
Peer Reviews
Decision · Submitted to ICLR 2025
- The learnable mask mechanism has a higher potential compared to heuristic algorithms.
- The experimental results are positive, outperforming previous SVD methods.

- Sloppy formatting. The reference to the table is missing in line 452. AUTHOR CONTRIBUTIONS and ACKNOWLEDGMENTS sections have not been removed. The authors should check their manuscript more carefully.
- Cannot outperform LLM Pruner. LLM Pruner requires only a small amount of data (e.g., 128 articles) for fine-tuning to achieve better results.
- No ablation study on multiple loss functions. The authors used three loss functions but did not verify their effects in the experiments.
1. Clear idea. The motivation is good and the solution is reasonable.
2. The writing is clear and easy to follow.
My primary concern is the trade-off between compression and model quality. While the model achieves some compression, it sacrifices quality significantly. A 20% reduction in parameters leads to a notable degradation in performance. Although the paper attempts to show improvements over certain baselines, a more meaningful comparison would be with a non-compressed model of similar size. For instance, if the method can compress an 8B model to 3B while still outperforming a standard 3B model, it wou
1. The use of multi-objective loss functions, including distillation and total variation loss, helps LLRC retain high performance even at high compression rates, making it effective for deployment in resource-constrained environments.
2. By freezing the main model weights and only learning the mask layer, LLRC reduces the computational burden during training, making it more efficient than traditional compression methods.

1. The method is similar to structured pruning approaches, such as Sheared LLaMA (https://arxiv.org/abs/2310.06694), yet these works are neither cited nor compared against, which limits the paper’s contextual grounding.
2. When comparing with pruning and distillation methods, the paper does not choose the most competitive or state-of-the-art approaches, making it unclear how LLRC performs against the best available compression techniques.
3. Based on my own empirical experience, datasets like B
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
