PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

Injoon Hwang; Haewon Park; Youngwan Lee; Jooyoung Yang; SunJae Maeng

arXiv:2406.09117 · cs.CV · June 14, 2024 · 1 citation

TL;DR

PC-LoRA introduces a progressive method that combines model compression and fine-tuning by gradually replacing pre-trained weights with low-rank adapters, achieving high compression rates in vision and language models.

Contribution

The paper presents a novel progressive approach that removes pre-trained weights during training, enabling simultaneous model compression and fine-tuning with low-rank adapters.

Findings

01. Achieves over 94% parameter compression in vision models.

02. Attains over 93% parameter compression in language models.

03. Reduces FLOPs significantly while maintaining performance.
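
To give a feel for where compression rates of this size can come from, the sketch below counts the parameters of a single dense layer against a rank-r factorization of the same shape. The 768 x 768 layer and rank 16 are hypothetical values chosen for illustration, biases are ignored, and the result is not a figure from the paper.

```python
def dense_params(out_dim, in_dim):
    """Parameters in a full weight matrix W of shape (out_dim, in_dim)."""
    return out_dim * in_dim


def low_rank_params(out_dim, in_dim, rank):
    """Parameters in factors B (out_dim x rank) and A (rank x in_dim)."""
    return rank * (out_dim + in_dim)


def compression_rate(out_dim, in_dim, rank):
    """Fraction of parameters removed when W is replaced by B @ A."""
    return 1.0 - low_rank_params(out_dim, in_dim, rank) / dense_params(out_dim, in_dim)


# Hypothetical layer size and rank, for illustration only.
rate = compression_rate(out_dim=768, in_dim=768, rank=16)
print(f"{rate:.1%}")  # 95.8%
```

Under these assumptions the low-rank pair holds 1/24 of the original parameters. The rate reported for a whole model also depends on which layers are replaced and on the rank chosen for each.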

Abstract

Low-rank adaptation (LoRA) is a prominent method that adds a small number of learnable parameters to the frozen pre-trained weights for parameter-efficient fine-tuning. Prompted by the question, "Can we make its representation enough with LoRA weights solely at the final phase of finetuning without the pre-trained weights?", we introduce Progressive Compression LoRA (PC-LoRA), which uses low-rank adaptation to perform model compression and fine-tuning simultaneously. The PC-LoRA method gradually removes the pre-trained weights during training, eventually leaving only the low-rank adapters. These low-rank adapters thus replace the pre-trained weights entirely, achieving compression and fine-tuning at the same time. Empirical analysis across various models demonstrates that PC-LoRA achieves parameter and FLOPs compression rates of…
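
The gradual removal described in the abstract can be pictured as a scaling factor on the frozen weight's output that shrinks from 1 to 0 during training. The following is a minimal sketch under stated assumptions, not the authors' implementation: the linear schedule, the names `decay_factor` and `pc_lora_forward`, and the toy matrices are all hypothetical, and the knowledge-distillation loss named in the title is omitted.

```python
def decay_factor(step, end_step):
    """Hypothetical linear schedule: 1.0 at step 0, 0.0 from end_step onward."""
    return max(0.0, 1.0 - step / end_step)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def pc_lora_forward(x, weight, lora_a, lora_b, lam):
    """Compute y = lam * (W x) + B (A x) for one input vector x.

    weight is the frozen pre-trained matrix W; lora_a and lora_b are the
    trainable low-rank factors A and B. Once lam reaches 0, W is no longer
    needed and only the adapter path is evaluated.
    """
    adapter_out = matvec(lora_b, matvec(lora_a, x))
    if lam == 0.0:
        return adapter_out
    frozen_out = matvec(weight, x)
    return [lam * f + a for f, a in zip(frozen_out, adapter_out)]


# Toy values, for illustration only: a 2x2 frozen weight and rank-1 adapters.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0]]      # shape (rank=1, in_dim=2)
B = [[0.5], [0.5]]    # shape (out_dim=2, rank=1)
x = [1.0, 1.0]

for step in (0, 50, 100):
    lam = decay_factor(step, end_step=100)
    print(step, lam, pc_lora_forward(x, W, A, B, lam))
```

At step 0 the layer behaves like ordinary LoRA (frozen output plus adapter output). By `end_step` the frozen weight contributes nothing, so it can be dropped and only `A` and `B` need to be stored.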

Taxonomy

Topics: Advanced Data Compression Techniques

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Dropout · Weight Decay · Linear Layer · Multi-Head Attention · Dropout