CoSA: Compressed Sensing-Based Adaptation of Large Language Models
Songtao Wei, Yi Li, Bohan Zhang, Zhichun Guo, Ying Huang, Yuede Ji, Miao Yin, Guanpeng Li, Bingzhe Li

TL;DR
CoSA is a novel PEFT method based on compressed sensing theory. It allows more expressive and efficient adaptation of large language models without low-rank constraints, matching or outperforming existing methods across diverse tasks.
Contribution
The paper proposes CoSA, a PEFT approach using fixed random projections and a learnable core, extending compressed sensing theory to improve model adaptation expressivity.
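The construction described above (fixed random projections around a small trainable core) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `make_adapter` and `adapted_forward`, the symbols `L`, `Y`, `R`, the dimensions `a` and `b`, the Gaussian scaling, and the zero initialization of the core are all hypothetical choices made for the example.

```python
import numpy as np

def make_adapter(m, n, a, b, seed=0):
    """Build frozen projections L (m x a), R (b x n) and a trainable core Y (a x b).

    All shapes and initializations are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    # Gaussian entries with 1/sqrt(dim) scaling: a common random-projection choice (assumed).
    L = rng.standard_normal((m, a)) / np.sqrt(a)
    R = rng.standard_normal((b, n)) / np.sqrt(b)
    # Zero-initialized core so the weight update starts at zero (assumed).
    Y = np.zeros((a, b))
    return L, Y, R

def adapted_forward(x, W0, L, Y, R):
    """Compute x @ (W0 + L @ Y @ R).T without forming the dense m x n update.

    x has shape (batch, n); W0 has shape (m, n); the result has shape (batch, m).
    """
    base = x @ W0.T
    # (L Y R) x is evaluated right to left: project down with R, mix with Y, project up with L.
    update = ((x @ R.T) @ Y.T) @ L.T
    return base + update

if __name__ == "__main__":
    m, n, a, b = 16, 12, 4, 3
    rng = np.random.default_rng(1)
    W0 = rng.standard_normal((m, n))   # frozen pretrained weight (random stand-in)
    x = rng.standard_normal((5, n))    # a batch of 5 inputs
    L, Y, R = make_adapter(m, n, a, b)
    out = adapted_forward(x, W0, L, Y, R)
    print(out.shape)                   # (batch, m)
    print("trainable parameters:", Y.size, "of", W0.size, "in the frozen layer")
```

In this sketch only `Y` would receive gradients, so the trainable parameter count is `a * b` per adapted layer regardless of the layer's own dimensions, while `L` and `R` could be regenerated from the seed rather than stored.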
Findings
CoSA matches or outperforms state-of-the-art PEFT methods.
Effective across 10 diverse NLP tasks and multiple model scales.
Provides a theoretical foundation for efficient model adaptation.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has emerged as a practical paradigm for adapting large language models (LLMs) without updating all parameters. Most existing approaches, such as LoRA and PiSSA, rely on low-rank decompositions of weight updates. However, the low-rank assumption may restrict expressivity, particularly in task-specific adaptation scenarios where singular values are distributed relatively uniformly. To address this limitation, we propose CoSA (Compressed Sensing-Based Adaptation), a new PEFT method extended from compressed sensing theory. Instead of constraining weight updates to a low-rank subspace, CoSA expresses them through fixed random projection matrices and a compact learnable core. We provide a formal theoretical analysis of CoSA as a synthesis process, proving that weight updates can be compactly encoded into a low-dimensional space and mapped back through…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The observation that low rank is not always a good hypothesis is valuable (although it appears in some recent works). 2. The paper is well written and generally a good read, with discussion around compressed sensing, etc.
1. Lack of baselines (and hence related work); this is my main concern. The experiments are adequate benchmark-wise but lacking baseline-wise. For instance, very similar and more recent PEFT baselines are excluded. SketchTune, for instance, is also based on sketching matrices (a special case of projection matrices that also have the RIP property). Also, some other baselines, such as S2FT, are missing. It is important to compare against these methods to ensure that we are indeed making progres
1. Viewing the PEFT problem through the lens of compressed sensing is an interesting and novel perspective. 2. The writing and presentation are clear. 3. The experiments are comprehensive and include tasks from different domains.
1. The proposed approach substantially overlaps with existing methods such as Tied-LoRA and VeRA [1,2], yet the paper makes no mention of them. Both Tied-LoRA and VeRA also employ frozen random matrices as down- and up-projection matrices, making it unclear how CoSA differs conceptually or empirically from these prior works. 2. The claim of O(1) complexity for CoSA in Table 1 appears inaccurate. Given the formulation, the complexity should be O(ab). 3. The method assumes that the target weight u
+ This paper is overall well written and clearly presented, making it easy for readers to follow. + The proposed method shows clear parameter and memory benefits over LoRA, AdaLoRA, and PiSSA. + The ablation study is extensive.
- The technical soundness is open to doubt, at least in its current form. For example, the framing is not tied to an actual sparsity prior or to constraints. Besides, there is no theory-level proof to justify the stability guarantees. - The core idea of fixing random $L$, $R$ and learning a compact core is not sufficiently distinguished from VeRA and/or other related random-projection PEFT methods, making the contribution to the community difficult to justify. - This paper does not provide a theory-
- The idea of applying compressed sensing to parameter-efficient fine-tuning is interesting. - The paper is clearly written and easy to follow. - There is an effort to ground the method in compressed sensing theory, including the use of the Restricted Isometry Property (RIP).
- The proposed method lacks novelty. Several works have already explored tri-matrix adapter structures. In particular, TLoRA [1] (on arXiv) presents a structurally identical approach, using frozen random matrices $A$ and $C$ and a small learnable matrix $B$. Additionally, PMSS [2], which likewise keeps $A$ and $B$ frozen and trains a learnable core, but with a different initialization method, was proposed at COLING'25. The authors did not provide a sufficient comparison with these tri-matrix adapters.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Generative Adversarial Networks and Image Synthesis
