PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
Injoon Hwang, Haewon Park, Youngwan Lee, Jooyoung Yang, SunJae Maeng

TL;DR
PC-LoRA introduces a progressive method that combines model compression and fine-tuning by gradually replacing pre-trained weights with low-rank adapters, achieving high compression rates in vision and language models.
Contribution
It presents a novel progressive approach that removes pre-trained weights during training, enabling simultaneous model compression and fine-tuning with low-rank adapters.
Findings
Achieves over 94% parameter compression in vision models.
Attains over 93% parameter compression in language models.
Reduces FLOPs significantly while maintaining performance.
Abstract
Low-rank adaption (LoRA) is a prominent method that adds a small number of learnable parameters to the frozen pre-trained weights for parameter-efficient fine-tuning. Prompted by the question, ``Can we make its representation enough with LoRA weights solely at the final phase of finetuning without the pre-trained weights?'' In this work, we introduce Progressive Compression LoRA~(PC-LoRA), which utilizes low-rank adaptation (LoRA) to simultaneously perform model compression and fine-tuning. The PC-LoRA method gradually removes the pre-trained weights during the training process, eventually leaving only the low-rank adapters in the end. Thus, these low-rank adapters replace the whole pre-trained weights, achieving the goals of compression and fine-tuning at the same time. Empirical analysis across various models demonstrates that PC-LoRA achieves parameter and FLOPs compression rates of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Data Compression Techniques
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Dropout · Weight Decay · Linear Layer · Multi-Head Attention · Dropout
