KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Rambod Azimi, Rishav Rishav, Marek Teichmann, Samira Ebrahimi Kahou

TL;DR
KD-LoRA is a fine-tuning method that combines LoRA with knowledge distillation, achieving performance close to full fine-tuning with significantly lower resource consumption across multiple models and benchmarks.
Contribution
This work introduces KD-LoRA, a hybrid approach that effectively combines LoRA and knowledge distillation for efficient LLM fine-tuning, maintaining high performance with lower resource requirements.
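To make the LoRA half of the method concrete, here is a minimal pure-Python sketch of a low-rank weight update, W + (alpha / r) * B A, where only A and B are trained. The function names, the scaling hyperparameter `alpha`, and the example dimensions are illustrative assumptions; the summary above does not state the ranks or target layers the authors used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A).

    w: frozen weight, shape (d_out, d_in)
    a: trainable factor, shape (r, d_in)
    b: trainable factor, shape (d_out, r)
    alpha: scaling hyperparameter (assumed name, not taken from the summary)
    """
    r = len(a)  # the rank is the number of rows in A
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of trainable values in A and B for a single adapted matrix."""
    return r * (d_in + d_out)
```

For a hypothetical 768 x 768 projection with rank 8, `trainable_params(768, 768, 8)` gives 12,288 trainable values against 589,824 in the full matrix, which is where the parameter savings come from.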
Findings
KD-LoRA retains 98% of LoRA's performance on GLUE.
KD-LoRA reduces GPU memory usage by 30%.
KD-LoRA decreases inference time by 30%.
Abstract
Large language models (LLMs) have demonstrated remarkable performance across various downstream tasks. However, the high computational and memory requirements of LLMs are a major bottleneck. To address this, parameter-efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) have been proposed to reduce computational costs while ensuring minimal loss in performance. Additionally, knowledge distillation (KD) has been a popular choice for obtaining compact student models from teacher models. In this work, we present KD-LoRA, a novel fine-tuning method that combines LoRA with KD. Our results demonstrate that KD-LoRA achieves performance comparable to full fine-tuning (FFT) and LoRA while significantly reducing resource requirements. Specifically, KD-LoRA retains 98% of LoRA's performance on the GLUE benchmark, while being 40% more compact. Additionally, KD-LoRA reduces GPU…
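The distillation half can be sketched with the commonly used soft-target objective: a weighted sum of cross-entropy on the true label and a temperature-scaled KL divergence between teacher and student distributions. This is a generic formulation, not necessarily the exact loss used in the paper; the weighting `alpha`, the `temperature`, and the function names are assumptions for illustration. In a KD-LoRA-style setup, the student logits would come from a smaller model whose only trainable weights are LoRA factors.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    alpha and temperature are assumed hyperparameters, not values from the paper.
    """
    # Hard-label term: cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # T^2 keeps the soft-term gradients on a comparable scale across temperatures.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains; any disagreement with the teacher adds a positive penalty.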
Taxonomy
Topics: Neural Networks and Applications · Robotics and Automated Systems
Methods: Adam · Attention Dropout · Dropout · Knowledge Distillation · Weight Decay · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay
