LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models

Runming Yang; Taiqiang Wu; Jiahao Wang; Pengfei Hu; Yik-Chung Wu; Ngai Wong; Yujiu Yang

arXiv:2411.06839·cs.CL·February 26, 2025

TL;DR

This paper introduces LLM-NEO, a parameter-efficient knowledge distillation method that combines LoRA with KD to effectively compress large language models like Llama 2 and Llama 3.2, outperforming baselines.

Contribution

The paper shows that KD and LoRA follow the same paradigm and, building on that connection, proposes LLM-NEO, an approach that integrates LoRA into KD for more efficient knowledge transfer in LLM compression.

Findings

1. LLM-NEO outperforms various baselines when compressing Llama 2 and Llama 3.2.
2. The method is robust across variants of LoRA.
3. Guidelines for tuning the hyperparameters of LLM-NEO are summarized.

Abstract

Knowledge distillation (KD) has been a predominant method for compressing Large Language Models (LLMs). In this paper, we first revisit KD and Low-Rank Adaptation (LoRA) and demonstrate that they follow the same paradigm. Inspired by this observation, we propose a parameter-efficient knowledge distillation method, LLM-NEO, which integrates LoRA into KD to improve the efficiency of knowledge transfer. We then summarize some valuable guidelines for the hyperparameters in LLM-NEO. Experimental results on compressing Llama 2 and Llama 3.2 show that LLM-NEO outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-NEO on variants of LoRA. The code and trained models are available on [GitHub](https://github.com/yang3121099/LLM-Neo).
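
As a rough illustration of what integrating LoRA into KD can look like, the sketch below computes a distillation loss for a student whose frozen weight matrix is augmented with a trainable low-rank update. It is a minimal toy in plain Python, not the authors' implementation: the function names (`lora_logits`, `kd_loss`), the weighting `(1 - alpha) * CE + alpha * KL`, and the `temperature` and `scale` parameters are assumptions made for this example, and the abstract does not specify the exact loss used in LLM-NEO.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with an optional temperature.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_logits(W, A, B, x, scale=1.0):
    # Student output: frozen base weight W plus low-rank update B @ A.
    # Only A and B would be trained; W stays fixed (hypothetical setup).
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=1.0):
    # Cross-entropy against the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    # Assumed weighting convention; the paper may define it differently.
    return (1 - alpha) * ce + alpha * kl
```

In this toy setup, a zero-initialized `B` leaves the student identical to its base model, so training starts from the base model's outputs and only the low-rank factors `A` and `B` would receive gradient updates from the combined loss.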

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: LLaMA · Knowledge Distillation