KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation

Rambod Azimi; Rishav Rishav; Marek Teichmann; Samira Ebrahimi Kahou

arXiv:2410.20777·cs.CL·October 29, 2024

TL;DR

KD-LoRA is a novel fine-tuning method combining LoRA and knowledge distillation, achieving near full fine-tuning performance with significantly reduced resource consumption across multiple models and benchmarks.

Contribution

This work introduces KD-LoRA, a hybrid approach that effectively combines LoRA and knowledge distillation for efficient LLM fine-tuning, maintaining high performance with lower resource requirements.

Findings

01. KD-LoRA retains 98% of LoRA's performance on GLUE.
02. KD-LoRA reduces GPU memory usage by 30%.
03. KD-LoRA decreases inference time by 30%.

Abstract

Large language models (LLMs) have demonstrated remarkable performance across various downstream tasks. However, the high computational and memory requirements of LLMs are a major bottleneck. To address this, parameter-efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) have been proposed to reduce computational costs while ensuring minimal loss in performance. Additionally, knowledge distillation (KD) has been a popular choice for obtaining compact student models from teacher models. In this work, we present KD-LoRA, a novel fine-tuning method that combines LoRA with KD. Our results demonstrate that KD-LoRA achieves performance comparable to full fine-tuning (FFT) and LoRA while significantly reducing resource requirements. Specifically, KD-LoRA retains 98% of LoRA's performance on the GLUE benchmark, while being 40% more compact. Additionally, KD-LoRA reduces GPU…
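
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository: it assumes the standard LoRA formulation (a frozen weight matrix plus a scaled low-rank update) and the common temperature-scaled distillation loss, and the names `lora_forward`, `distillation_loss`, `mix`, and `temperature` are hypothetical. The exact loss and hyperparameters used by KD-LoRA are not given in the text above.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, W, A, B, scale):
    """Standard LoRA layer: y = W x + scale * B (A x).

    W (out x in) stays frozen; only the low-rank factors
    A (r x in) and B (out x r) would be trained.
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]

def distillation_loss(student_logits, teacher_logits, label,
                      mix=0.5, temperature=2.0):
    """Assumed generic KD objective, not necessarily the paper's:
    mix * cross-entropy(student, label)
    + (1 - mix) * T^2 * KL(teacher_T || student_T).
    """
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    return mix * cross_entropy + (1 - mix) * temperature ** 2 * kl

# Toy usage: a rank-1 adapter on a 2x3 layer, then a distillation step.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen student weight
A = [[0.5, 0.5, 0.5]]                     # r x in, with r = 1
B = [[1.0], [0.0]]                        # out x r
student_logits = lora_forward([1.0, 2.0, 3.0], W, A, B, scale=2.0)
loss = distillation_loss(student_logits, teacher_logits=[3.0, 1.0], label=0)
```

In a setup like this, the student is a smaller model whose base weights stay frozen, so the gradient of the loss would flow only into `A` and `B` while the teacher supplies soft targets.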

Code & Models

Repositories

rambodazimi/kd-lora
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · Robotics and Automated Systems

Methods: Adam · Attention Dropout · Dropout · Knowledge Distillation · Weight Decay · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay