Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
Shiva Sreeram, Alaa Maalouf, Pratyusha Sharma, Daniela Rus

TL;DR
This paper introduces a rapid, fine-tuning-free method for adapting large language models to new tasks. It uses a single gradient step on just 100 samples, which avoids an exhaustive per-matrix search over the full dataset.
Contribution
The authors propose a novel, efficient adaptation technique that identifies key matrices for pruning using gradient signals, enabling fast LLM adaptation without full fine-tuning.
Findings
Reducing the search to a small subset of matrices improves efficiency.
The gradient of each matrix's singular values indicates which matrices to prune.
Evaluation on 100 samples suffices for effective adaptation.
Abstract
Recently, Sharma et al. suggested a method called Layer-SElective-Rank reduction (LASER) which demonstrated that pruning high-order components of carefully chosen LLM weight matrices can boost downstream accuracy -- without any gradient-based fine-tuning. Yet LASER's exhaustive, per-matrix search (each requiring full-dataset forward passes) makes it impractical for rapid deployment. We demonstrate that this overhead can be removed and find that: (i) only a small, carefully chosen subset of matrices needs to be inspected -- eliminating the layer-by-layer sweep, (ii) the gradient of each matrix's singular values pinpoints which matrices merit reduction, (iii) increasing the factorization search space by allowing matrix rows to cluster around multiple subspaces and then decomposing each cluster separately further reduces overfitting on the original training data and further lifts…
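The three ingredients named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that "high-order components" means the directions with the smallest singular values, so that pruning them is a truncated SVD. It also uses the standard identity that, for W = U diag(s) V^T with distinct singular values, the derivative of the loss with respect to s_i is u_i^T G v_i, where G is the gradient of the loss with respect to W. The function names, the toy least-squares loss, and the `labels` argument (an assumed, externally supplied row-to-cluster assignment) are hypothetical. The rule the paper uses to turn these gradients into a choice of matrices is not stated in this summary, so it is left out.

```python
import numpy as np

def low_rank_approx(W, k):
    """Best rank-k approximation of W: keep the k largest singular components."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def singular_value_grads(W, G):
    """dL/d(s_i) = u_i^T G v_i for each singular value s_i, given G = dL/dW."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return np.einsum("ji,jk,ik->i", U, G, Vt)

def clustered_low_rank(W, labels, k):
    """Apply the rank-k reduction separately to each group of rows.

    `labels` is a hypothetical input: one cluster id per row of W.
    """
    W_out = np.empty_like(W)
    for c in np.unique(labels):
        rows = labels == c
        W_out[rows] = low_rank_approx(W[rows], k)
    return W_out

# Toy stand-in for one gradient computation on 100 samples (not an LLM).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
X = rng.standard_normal((4, 100))
Y = rng.standard_normal((6, 100))
G = (W @ X - Y) @ X.T / X.shape[1]  # dL/dW for L = ||WX - Y||_F^2 / (2 * 100)

sigma_grads = singular_value_grads(W, G)    # one value per singular value of W
W_global = low_rank_approx(W, k=2)          # single-subspace reduction
labels = np.array([0, 0, 0, 1, 1, 1])       # assumed row clusters
W_clustered = clustered_low_rank(W, labels, k=2)
```

Because the global rank-k approximation restricted to any group of rows is itself a rank-k candidate for that group, the per-cluster reduction never has a larger Frobenius reconstruction error than the global one at the same k. This is one way to read the abstract's remark about enlarging the factorization search space.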
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
