Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios

Yuhang Zhou; Wei Ai

arXiv:2406.05322 · cs.CL · June 11, 2024 · 1 citation

TL;DR

This paper introduces a three-component framework that uses a teaching assistant model to improve knowledge distillation from imperfect large language models, especially in low-resource scenarios, by leveraging multiple signals including student self-consistency and confidence scoring.

Contribution

It proposes a novel teaching assistant framework with a two-stage training process that enhances sample efficiency and robustness in distillation from imperfect teachers.

Findings

1. Achieves up to 20.79% relative improvement in complex reasoning tasks.

2. Effectively utilizes multiple signals to improve student model training.

3. Demonstrates superiority over standard fine-tuning methods.

Abstract

There is increasing interest in distilling task-specific knowledge from large language models (LLMs) into smaller student models. Nonetheless, LLM distillation presents a dual challenge: 1) querying the teacher LLM, such as GPT-4, to gather an ample number of demonstrations is costly; 2) the teacher LLM might provide imperfect outputs that negatively affect the student's learning process. To enhance sample efficiency within resource-constrained, imperfect-teacher scenarios, we propose a three-component framework leveraging three signal types. The first signal is the student's self-consistency (the consistency across the student's multiple outputs), which is a proxy for the student's confidence. Additionally, we introduce a "teaching assistant" (TA) model to assess the uncertainty of both the student's and the teacher's outputs via confidence scoring, which serves as…
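The self-consistency signal mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name `self_consistency`, the whitespace/case normalization, and the choice of majority-vote agreement as the score are all assumptions made for illustration. The idea is only that when several sampled student outputs agree on the same final answer, the agreement rate can stand in for the student's confidence.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority_answer, agreement_rate) for sampled final answers.

    Hypothetical helper: agreement_rate is the fraction of samples that
    match the most common answer after simple normalization.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize so trivially different strings ("42 " vs "42") count as equal.
    counts = Counter(a.strip().lower() for a in answers)
    majority, votes = counts.most_common(1)[0]
    return majority, votes / len(answers)


# Five sampled student answers to the same question (made-up values).
samples = ["42", "42", "41", "42 ", "40"]
answer, score = self_consistency(samples)
print(answer, score)
```

Under this reading, a low agreement rate would mark a question on which the student is uncertain. How the paper combines this signal with the TA model's confidence scores is not visible in the truncated abstract above.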

Taxonomy

Topics: Online Learning and Analytics

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer