Enhancing Knowledge Distillation for LLMs with Response-Priming   Prompting

Vijay Goyal; Mustafa Khan; Aprameya Tirupati; Harveer Saini; Michael; Lam; Kevin Zhu

arXiv:2412.17846·cs.CL·December 25, 2024

Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting

Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael, Lam, Kevin Zhu

PDF

Open Access 1 Repo

TL;DR

This paper introduces response-priming prompting strategies in knowledge distillation to improve the performance of smaller LLMs, demonstrating significant gains on the GSM8K benchmark through reasoning-eliciting prompts.

Contribution

It proposes novel response-priming prompting techniques integrated into KD, enhancing student LLM performance and analyzing attention behaviors for better understanding.

Findings

01

55% performance increase on GSM8K with prompting

02

Prompted models show positive attention head behaviors

03

Efficient distillation method for resource-constrained deployment

Abstract

Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks. However, these models are often difficult to deploy due to significant computational requirements and resource constraints. Knowledge distillation (KD) is an effective technique for transferring the performance of larger LLMs to smaller models. Traditional KD methods primarily focus on the direct output of the teacher model, with little emphasis on the role of prompting during knowledge transfer. In this paper, we propose a set of novel response-priming prompting strategies applied in the knowledge distillation pipeline to enhance the performance of student models. Our approach fine-tunes a smaller Llama 3.1 8B Instruct model by distilling knowledge from a quantized Llama 3.1 405B Instruct teacher model. We apply LoRA optimization and evaluate on the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

alonso130r/knowledge-distillation
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsData Stream Mining Techniques · Neural Networks and Applications · Fuzzy Logic and Control Systems

MethodsSoftmax · Attention Is All You Need · Sparse Evolutionary Training · Knowledge Distillation · Focus · LLaMA