LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Runming Yang, Taiqiang Wu, Jiahao Wang, Pengfei Hu, Yik-Chung Wu, Ngai Wong, Yujiu Yang

TL;DR
This paper introduces LLM-NEO, a parameter-efficient knowledge distillation method that combines LoRA with KD to effectively compress large language models like Llama 2 and Llama 3.2, outperforming baselines.
Contribution
The paper reveals the connection between KD and LoRA and proposes a novel integrated approach, LLM-NEO, for more efficient knowledge transfer in LLM compression.
Findings
LLM-NEO outperforms various baselines on Llama 2 and Llama 3.2.
The method demonstrates robustness across LoRA variants.
Guidelines for hyperparameter tuning are summarized.
Abstract
Knowledge distillation (KD) has been a predominant method for compressing Large Language Models (LLMs). In this paper, we first revisit KD and Low-Rank Adaptation (LoRA) and demonstrate that they follow the same paradigm. Inspired by this observation, we propose a parameter-efficient knowledge distillation method, LLM-NEO, which integrates LoRA into KD to improve the efficiency of knowledge transfer. We then summarize guidelines for choosing the hyperparameters in LLM-NEO. Experimental results on compressing Llama 2 and Llama 3.2 show that LLM-NEO outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-NEO on variants of LoRA. The code and trained models are available on [GitHub](https://github.com/yang3121099/LLM-Neo).
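To make the idea of integrating LoRA into KD concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a frozen student weight `W` with a trainable low-rank update `B @ A` scaled by `alpha / rank` (the standard LoRA form), and a training loss that mixes cross-entropy on the ground-truth label with a temperature-softened KL term toward the teacher. The function names (`lora_forward`, `neo_style_loss`), the `kd_weight` mixing scheme, and the KL direction are illustrative assumptions; the abstract does not specify them.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x).

    W is frozen; only A (rank x d_in) and B (d_out x rank) would be trained.
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]


def neo_style_loss(student_logits, teacher_logits, label,
                   kd_weight=0.5, temperature=2.0):
    """Assumed mixing: (1 - kd_weight) * CE + kd_weight * KL(teacher || student)."""
    ce = -math.log(softmax(student_logits)[label])
    kd = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    return (1.0 - kd_weight) * ce + kd_weight * kd


def trainable_params(d_in, d_out, rank):
    """Number of LoRA parameters for one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Toy usage: a 3-class student head on a 2-dimensional input, rank-1 update.
W = [[0.2, -0.1], [0.0, 0.3], [0.5, 0.1]]   # frozen student weight
A = [[0.1, 0.2]]                             # trainable, rank x d_in
B = [[0.0], [0.0], [0.0]]                    # trainable, zero-initialised
x = [1.0, 2.0]

student_logits = lora_forward(W, A, B, x, alpha=2.0, rank=1)
teacher_logits = [0.1, 0.9, 0.4]             # hypothetical teacher output
print(neo_style_loss(student_logits, teacher_logits, label=1))
print(trainable_params(d_in=2, d_out=3, rank=1), "trainable vs", 2 * 3, "full")
```

Because `B` starts at zero, the student's first forward pass matches the frozen model exactly, and only the `rank * (d_in + d_out)` low-rank parameters would receive gradients from the combined loss. At realistic layer sizes that count is far smaller than the full `d_in * d_out` matrix.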
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: LLaMA · Knowledge Distillation
