Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Tianyi Zhang, Junda Su, Aditya Desai, Oscar Wu, Zhaozhuo Xu, Anshumali Shrivastava

TL;DR
SketchTune is a compression-based adaptation method for large language models. It fine-tunes compact sketches of the model weights rather than relying on the low-rank assumptions of conventional adapters, improving both efficiency and model quality.
Contribution
The paper proposes SketchTune, a unified framework that compresses LLM weights into fine-tunable sketches, enabling faster and more memory-efficient adaptation without the complex two-path computation that comes with adding adapters to frozen weights.
Findings
Outperforms existing PEFT methods on diverse tasks.
Uses significantly smaller models with comparable or better accuracy.
Achieves 14.48% higher accuracy on GSM8K with fewer trainable parameters.
Abstract
Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. Parameter-efficient fine-tuning (PEFT) techniques typically employ additive adapters applied to frozen model weights. To further reduce memory usage, model weights are often compressed through quantization. However, existing PEFT methods often yield suboptimal model quality because they rely on restrictive assumptions, such as low-rank constraints on adapters to limit the number of trainable parameters. We find that sketching, a popular data compression technique, can serve as an efficient LLM adaptation strategy while avoiding the low-rank assumption. We introduce SketchTune, a compressive adaptation strategy that compresses LLM weights into compact fine-tunable sketches, integrating compression and adaptation into a unified framework. This integration eliminates the need for…
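The idea of a "fine-tunable sketch" can be illustrated with a small NumPy example. This is only a sketch under stated assumptions, not the paper's algorithm: the bucketing rule (per-row quantile binning), the helper names `build_sketch`, `reconstruct`, and `sketch_grad`, and the toy regression task are all hypothetical choices made for illustration. What it shows is the general shape of the approach described in the abstract: each weight row is replaced by a short table of shared values plus a fixed index map, and adaptation updates only the table, so there is a single compute path and no low-rank adapter.

```python
import numpy as np

K = 16  # sketch entries per row (hypothetical setting)

def build_sketch(W, k):
    """Compress each row of W into k shared values plus an integer index map.

    Assumption: buckets come from per-row quantile binning and each bucket
    stores the mean of its members. This stands in for the paper's mapping.
    """
    rows, cols = W.shape
    sketch = np.zeros((rows, k))
    index = np.zeros((rows, cols), dtype=np.int64)
    for r in range(rows):
        # Interior quantile edges split the row into k roughly equal buckets.
        edges = np.quantile(W[r], np.linspace(0, 1, k + 1)[1:-1])
        index[r] = np.searchsorted(edges, W[r])
        for b in range(k):
            members = W[r][index[r] == b]
            if members.size:
                sketch[r, b] = members.mean()
    return sketch, index

def reconstruct(sketch, index):
    """Expand to a dense matrix: W_hat[r, c] = sketch[r, index[r, c]]."""
    return np.take_along_axis(sketch, index, axis=1)

def sketch_grad(grad_W, index, k):
    """Gradient w.r.t. sketch entries: sum the dense gradient over each bucket."""
    grad_sketch = np.zeros((grad_W.shape[0], k))
    rows = np.arange(grad_W.shape[0])[:, None]
    np.add.at(grad_sketch, (rows, index), grad_W)
    return grad_sketch

# Toy setup: a "pre-trained" matrix and a slightly shifted downstream target.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))
X = rng.normal(size=(32, 64))
Y = X @ (W + 0.1 * rng.normal(size=W.shape)).T

sketch, index = build_sketch(W, K)

def loss_fn(s):
    """Mean squared error of the layer output using the reconstructed weights."""
    return np.mean((X @ reconstruct(s, index).T - Y) ** 2)

loss_before = loss_fn(sketch)
for _ in range(200):
    W_hat = reconstruct(sketch, index)
    # Dense gradient of the mean squared error with respect to W_hat.
    grad_W = 2.0 * (X @ W_hat.T - Y).T @ X / Y.size
    # Only the sketch values are updated; the index map stays frozen.
    sketch -= 0.05 * sketch_grad(grad_W, index, K)
loss_after = loss_fn(sketch)

print("trainable values:", sketch.size, "of", W.size)
print("loss before / after:", loss_before, loss_after)
```

In this toy version the index map is stored as `int64` for simplicity; a practical implementation would presumably pack it into a few bits per weight, which is where the memory saving would come from.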
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Machine Learning in Healthcare
