ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

Prateek Yadav; Leshem Choshen; Colin Raffel; Mohit Bansal

arXiv:2311.13171 · cs.LG · August 12, 2025 · 2 cites

TL;DR

ComPEFT compresses PEFT modules by sparsifying and quantizing their fine-tuning residuals, with no retraining. It sharply reduces module size while maintaining or improving performance, which makes expert modules cheaper to communicate and to serve alongside large language models.

Contribution

The paper presents ComPEFT, a novel compression technique for PEFT models that does not require retraining and achieves high compression ratios with preserved or enhanced performance.

Findings

01. Achieves 8x-50x compression across various models.

02. Outperforms QLoRA with 26x smaller size on LLaMA.

03. Maintains few-shot generalization and improves with model scale.

Abstract

Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize to new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU. To address these issues, we present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models. ComPEFT employs sparsification and ternary quantization to reduce the size of the PEFT module without performing any additional retraining while preserving or enhancing model performance. In extensive evaluation…
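To make the two steps named in the abstract concrete, here is a minimal Python sketch of sparsifying and ternary-quantizing a task vector. It is an illustration under stated assumptions, not the authors' implementation: the magnitude-based top-k pruning, the single scale set to `alpha` times the standard deviation of the task vector, and the names `compress_task_vector`, `decompress`, `density`, and `alpha` are all hypothetical choices that the abstract excerpt above does not specify.

```python
import statistics


def compress_task_vector(finetuned, base, density=0.1, alpha=1.0):
    """Sparsify and ternarize a fine-tuning residual (task vector).

    Returns (signs, scale): `signs` holds values in {-1, 0, +1} and
    `scale` is a single float shared by all non-zero entries.
    Assumptions (hypothetical, for illustration only): entries are kept
    by largest magnitude, and scale = alpha * std(task vector).
    """
    # Task vector: fine-tuned weights minus the pretrained (base) weights.
    task_vector = [f - b for f, b in zip(finetuned, base)]
    n = len(task_vector)
    keep = max(1, int(round(density * n)))

    # Sparsification: indices of the `keep` largest-magnitude entries.
    order = sorted(range(n), key=lambda i: abs(task_vector[i]), reverse=True)
    kept = order[:keep]

    # Ternary quantization: retain only the sign of each kept entry.
    signs = [0] * n
    for i in kept:
        value = task_vector[i]
        signs[i] = (value > 0) - (value < 0)

    scale = alpha * statistics.pstdev(task_vector)
    return signs, scale


def decompress(base, signs, scale):
    """Rebuild approximate fine-tuned weights from the compressed form."""
    return [b + scale * s for b, s in zip(base, signs)]


# Toy example with made-up numbers (not data from the paper).
base = [0.0] * 10
finetuned = [0.5, -0.01, 0.02, -0.7, 0.0, 0.03, 0.01, -0.02, 0.9, 0.0]
signs, scale = compress_task_vector(finetuned, base, density=0.3)
restored = decompress(base, signs, scale)
```

In this sketch only the sign pattern and one scalar need to be stored or transmitted, which is where the size reduction would come from. No retraining step appears anywhere in it, in line with the abstract's description.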

Code & Models

Repositories

prateeky2806/compeft (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Inverse Square Root Schedule · Byte Pair Encoding · Layer Normalization