LoCa: Logit Calibration for Knowledge Distillation

Runming Yang; Taiqiang Wu; Yujiu Yang

arXiv:2409.04778 · cs.CL · September 10, 2024

TL;DR

LoCa is a simple calibration method for logits in knowledge distillation that corrects mis-instruction issues and preserves dark knowledge, improving student model performance without extra parameters.

Contribution

Introduces Logit Calibration (LoCa), a parameter-free method that enhances knowledge distillation by correcting logits based on labels, addressing mis-instruction and preserving dark knowledge.

Findings

01. Improves accuracy in image classification tasks.

02. Enhances performance in text generation tasks.

03. Does not add extra model parameters.

Abstract

Knowledge Distillation (KD), which aims to train a better student model by having it mimic a teacher model, plays an important role in model compression. One typical approach is to align the output logits. However, we find a common issue, which we name mis-instruction: the student is misled when the predictions based on the teacher logits do not match the labels. Meanwhile, the logits carry other useful dark knowledge, such as class discriminability, which is vital for distillation. In this paper, we propose a simple yet effective Logit Calibration (LoCa) method, which calibrates the logits from the teacher model based on the ground-truth labels. The key insight is to correct the prediction (to address the mis-instruction issue) while maintaining useful dark knowledge. Our proposed LoCa does not require any additional parameters. Empirical results on image classification and text…
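To make the idea in the abstract concrete, here is a minimal Python sketch of detecting mis-instruction and calibrating a teacher distribution against the ground-truth label. It is an illustration under stated assumptions, not the paper's exact formula: the function names, the uniform shrink factor `s`, and the `margin` value are all hypothetical choices made here. The sketch shrinks every non-target probability by the same factor, so the ratios among non-target classes (one form of dark knowledge) are unchanged, and gives the freed mass to the labelled class.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def is_mis_instruction(teacher_logits, label):
    """True when the teacher's top-1 prediction disagrees with the label."""
    predicted = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return predicted != label


def calibrate_probs(teacher_probs, label, margin=0.05):
    """Hypothetical calibration (an assumption, not the paper's formula).

    Scales all non-target probabilities by a common factor s so that the
    labelled class ends up `margin` above the largest non-target class.
    Requires at least two classes and 0 < margin < 1.
    """
    p_target = teacher_probs[label]
    max_other = max(p for i, p in enumerate(teacher_probs) if i != label)
    if p_target > max_other:
        # Teacher already agrees with the label: leave it untouched.
        return list(teacher_probs)
    # Solve 1 - s * (1 - p_target) = s * max_other + margin for s.
    s = (1.0 - margin) / (1.0 - p_target + max_other)
    calibrated = [s * p for p in teacher_probs]
    calibrated[label] = 1.0 - s * (1.0 - p_target)
    return calibrated


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the usual logit-alignment loss between teacher p and student q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Example: the teacher prefers class 1, but the ground-truth label is class 0.
teacher_logits = [2.0, 3.0, 0.5]
label = 0
teacher_probs = softmax(teacher_logits)
calibrated = calibrate_probs(teacher_probs, label)
student_probs = softmax([1.0, 1.0, 1.0])
loss = kl_divergence(calibrated, student_probs)
```

In this sketch the student would be trained against `calibrated` instead of `teacher_probs`. Note that `margin` is a fixed hyperparameter introduced for the illustration, not a learned model parameter.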

Taxonomy

Topics: Advanced Database Systems and Queries · AI-based Problem Solving and Planning · Neural Networks and Applications

Methods: ALIGN