BD-KD: Balancing the Divergences for Online Knowledge Distillation

Ibtihel Amara; Nazanin Sepahvand; Brett H. Meyer; Warren J. Gross; James J. Clark

arXiv:2212.12965 · cs.CV · December 17, 2024

Open Access

TL;DR

BD-KD introduces a balanced-divergence approach to online knowledge distillation that improves both the accuracy and the calibration of compact models without additional post-processing, making them suitable for edge devices.

Contribution

The paper proposes BD-KD, a novel online KD framework that balances divergence losses to enhance model calibration and accuracy simultaneously, eliminating the need for post-hoc calibration.

Findings

01. Improved calibration and accuracy across multiple datasets.

02. Effective sample-wise weighting of divergence losses.

03. Outperforms recent online KD methods.

Abstract

We address the challenge of producing trustworthy and accurate compact models for edge devices. While Knowledge Distillation (KD) has improved model compression by yielding compact models with high accuracy, the calibration of these models has been overlooked. We introduce BD-KD (Balanced Divergence Knowledge Distillation), a framework for logit-based online KD. BD-KD improves accuracy and model calibration simultaneously, eliminating the need for post-hoc recalibration techniques, which add computational overhead to the overall training pipeline and degrade performance. Our method encourages student-centered training by adjusting the conventional online distillation loss for both the student and the teacher, applying sample-wise weighting of the forward and reverse Kullback-Leibler divergences. This strategy balances student network confidence and boosts performance.…
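To make the idea of weighting forward and reverse Kullback-Leibler divergence per sample concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `kl_divergence`, `balanced_kd_loss`) are hypothetical, "forward" is taken in the usual sense of KL(teacher || student) and "reverse" as KL(student || teacher), and the per-sample weights are supplied by the caller because the abstract does not say how BD-KD computes them.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert a list of logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def balanced_kd_loss(student_logits, teacher_logits, weights, temperature=1.0):
    """Mean over samples of w_f * KL(teacher || student) + w_r * KL(student || teacher).

    `weights` holds one (w_forward, w_reverse) pair per sample. These are
    hypothetical inputs: the weighting rule used by BD-KD is not given in
    the abstract, so this sketch leaves the choice to the caller.
    """
    total = 0.0
    for s, t, (w_f, w_r) in zip(student_logits, teacher_logits, weights):
        p_s = softmax(s, temperature)
        p_t = softmax(t, temperature)
        forward = kl_divergence(p_t, p_s)  # assumed convention: teacher is the target
        reverse = kl_divergence(p_s, p_t)  # roles swapped
        total += w_f * forward + w_r * reverse
    return total / len(student_logits)


if __name__ == "__main__":
    student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    teacher = [[3.0, 0.0, -2.0], [0.3, 0.2, 0.1]]
    # Equal weights are an arbitrary choice for the demo.
    print(balanced_kd_loss(student, teacher, [(0.5, 0.5), (0.5, 0.5)]))
```

The two divergences are not symmetric, so they generally give different values for the same pair of distributions. A per-sample weight pair lets a training loop emphasise one or the other for each example.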

Taxonomy

Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Advanced Image Processing Techniques

Methods: Knowledge Distillation