Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting
Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu

TL;DR
This paper introduces response-priming prompting strategies in knowledge distillation to improve the performance of smaller LLMs, demonstrating significant gains on the GSM8K benchmark through reasoning-eliciting prompts.
Contribution
The paper proposes novel response-priming prompting techniques integrated into KD, which improve student LLM performance, and analyzes the attention behavior of the resulting models.
Findings
55% performance increase on GSM8K with prompting
Prompted models show positive attention head behaviors
Efficient distillation method for resource-constrained deployment
Abstract
Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks. However, these models are often difficult to deploy due to significant computational requirements and resource constraints. Knowledge distillation (KD) is an effective technique for transferring the performance of larger LLMs to smaller models. Traditional KD methods primarily focus on the direct output of the teacher model, with little emphasis on the role of prompting during knowledge transfer. In this paper, we propose a set of novel response-priming prompting strategies applied in the knowledge distillation pipeline to enhance the performance of student models. Our approach fine-tunes a smaller Llama 3.1 8B Instruct model by distilling knowledge from a quantized Llama 3.1 405B Instruct teacher model. We apply LoRA optimization and evaluate on the…
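The pipeline described in the abstract can be illustrated with a minimal sketch of how a response-priming prompt might be placed in front of the teacher's input when building a distillation dataset. Everything named here is an assumption made for illustration: the primer text, the helper names `build_primed_prompt` and `build_distillation_set`, the record layout, and the choice to train the student on the plain question are hypothetical, and the teacher is a stub rather than a Llama 3.1 405B call. The paper's actual prompts and training setup are not given in this summary.

```python
from typing import Callable, Dict, List

# Hypothetical primer text (assumption): stands in for a reasoning-eliciting
# prompt; the paper's actual wording is not shown in this summary.
PRIMER = "Work through the problem step by step before giving the final answer."


def build_primed_prompt(question: str, primer: str = PRIMER) -> str:
    """Prepend a response-priming instruction to the teacher's input."""
    return f"{primer}\n\nQuestion: {question}\nAnswer:"


def build_distillation_set(
    questions: List[str],
    teacher: Callable[[str], str],
    primer: str = PRIMER,
) -> List[Dict[str, str]]:
    """Collect (prompt, completion) pairs for fine-tuning a student model.

    Assumption: the student is trained on the plain question, with the
    teacher's primed response as the target.
    """
    records = []
    for question in questions:
        response = teacher(build_primed_prompt(question, primer))
        records.append({"prompt": question, "completion": response})
    return records


def stub_teacher(prompt: str) -> str:
    """Stand-in for a real teacher model call (hypothetical)."""
    return "2 + 3 = 5. Final answer: 5"


records = build_distillation_set(["What is 2 + 3?"], stub_teacher)
```

In a real setup the resulting records would be passed to a fine-tuning routine for the student (for example one using LoRA adapters, as the abstract mentions), and `stub_teacher` would be replaced by an actual call to the teacher model.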
Taxonomy
Topics: Data Stream Mining Techniques · Neural Networks and Applications · Fuzzy Logic and Control Systems
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Knowledge Distillation · Focus · LLaMA
