Multi-granularity for knowledge distillation

Baitan Shao; Ying Chen

arXiv:2108.06681 · cs.CV · September 6, 2021

TL;DR

This paper introduces a multi-granularity knowledge distillation method that enhances student networks by providing diverse teaching patterns and robust supervision, leading to improved accuracy and robustness.

Contribution

It proposes a novel multi-granularity self-analyzing module and a stable excitation scheme for knowledge distillation, improving performance and robustness of student networks.

Findings

01 Achieves an average accuracy improvement of 0.58% over the baselines.

02 Achieves a best-case accuracy improvement of 1.08% over the baselines.

03 Enhances the student's fine-tuning ability and robustness to noisy inputs.

Abstract

Considering that students differ in their ability to understand the knowledge imparted by teachers, a multi-granularity distillation mechanism is proposed to transfer more understandable knowledge to student networks. A multi-granularity self-analyzing module is designed for the teacher network, which enables the student network to learn from different teaching patterns. Furthermore, a stable excitation scheme is proposed to provide robust supervision for student training. The proposed distillation mechanism can be embedded into different distillation frameworks, which are taken as baselines. Experiments show that the mechanism improves accuracy by 0.58% on average and by 1.08% in the best case over the baselines, making its performance superior to state-of-the-art methods. It is also exploited that the student's ability of fine-tuning and robustness to noisy inputs…
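The abstract does not spell out the loss functions, so the following is only a minimal sketch of the kind of baseline framework the mechanism could be embedded into: the standard temperature-scaled distillation loss (KL divergence between softened teacher and student outputs). The function `multi_output_kd_loss` is a hypothetical illustration of a student learning from several teacher outputs by simple averaging; it is an assumption made here for illustration and is not the paper's multi-granularity self-analyzing module or stable excitation scheme. All names and the default temperature are likewise assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard KD loss: KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def multi_output_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Hypothetical: average the KD loss over several teacher outputs.

    This plain averaging is an assumption for illustration only,
    not the method described in the paper.
    """
    losses = [kd_loss(student_logits, t, temperature) for t in teacher_logits_list]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teacher_outputs = [[2.0, 0.1, -1.0], [1.5, 0.4, -0.5]]
    print(multi_output_kd_loss(student, teacher_outputs))
```

In a full training loop this term would typically be added to the usual cross-entropy loss on the ground-truth labels, with a weighting factor chosen per baseline.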

Code & Models

Repositories

shaoeric/multi-granularity-distillation (PyTorch, official)