Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights

Mohamad Ballout; Ulf Krumnack; Gunther Heidemann; Kai-Uwe Kühnberger

arXiv:2409.12586 · cs.CL · September 20, 2024 · 2 cites

Open Access

TL;DR

This paper presents a simple knowledge distillation method that enhances small language models by using influential tokens identified by a large teacher model, improving performance across diverse datasets.

Contribution

Introduces a token-based knowledge distillation approach leveraging teacher model attributions to improve small language model performance.

Findings

01. Outperforms standard fine-tuning and state-of-the-art distillation methods.
02. Important tokens often align with ground truth in multiple-choice datasets.
03. Method is effective across four diverse datasets.

Abstract

Enhancing small language models for real-life application deployment is a significant challenge facing the research community. Due to the difficulties and costs of using large language models, researchers are seeking ways to effectively deploy task-specific small models. In this work, we introduce a simple yet effective knowledge distillation method to improve the performance of small language models. Our approach utilizes a teacher model with approximately 3 billion parameters to identify the most influential tokens in its decision-making process. These tokens are extracted from the input based on their attribution scores relative to the output, using methods like saliency maps. These important tokens are then provided as rationales to a student model, aiming to distill the knowledge of the teacher model. This method has proven to be effective, as demonstrated by testing it on four…
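
The token-selection step described in the abstract can be illustrated with a minimal sketch. It assumes the teacher's per-token attribution scores (for example, from a saliency map) have already been computed and are passed in as plain floats. The function names, the `rationale: ... answer: ...` target format, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def top_k_tokens(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k tokens with the largest absolute attribution, in input order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank positions by attribution magnitude; sorted() is stable, so ties
    # keep the earlier position first.
    ranked = sorted(range(len(tokens)), key=lambda i: -abs(scores[i]))
    keep = sorted(ranked[:k])  # restore the original word order
    return [tokens[i] for i in keep]


def build_student_example(
    question: str,
    tokens: Sequence[str],
    scores: Sequence[float],
    label: str,
    k: int = 3,
) -> Tuple[str, str]:
    """Pair the input with a target that carries the rationale tokens.

    The target layout is an assumption made for illustration only.
    """
    rationale = " ".join(top_k_tokens(tokens, scores, k))
    target = f"rationale: {rationale} answer: {label}"
    return question, target


# Made-up attribution scores standing in for real teacher output.
question = "Which planet is closest to the Sun ?"
tokens = question.split()
scores = [0.05, 0.40, 0.01, 0.90, 0.02, 0.01, 0.70, 0.00]

source, target = build_student_example(question, tokens, scores, label="Mercury")
print(target)
```

Keeping the selected tokens in their original order makes the rationale read like a compressed version of the input. Ranking by absolute value is one possible choice; signed scores could be used instead if only positively contributing tokens are wanted.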

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Online Learning and Analytics

Methods: Knowledge Distillation