DOT: A Distillation-Oriented Trainer

Borui Zhao; Quan Cui; Renjie Song; Jiajun Liang

arXiv:2307.08436 · cs.CV · July 18, 2023

TL;DR

The paper introduces DOT, a training method for knowledge distillation that handles the gradients of the task loss and the distillation loss separately, which improves the convergence of both losses and raises student accuracy.

Contribution

DOT keeps the gradients of the task loss and the distillation loss separate and applies a larger momentum to the distillation-loss gradients, so that both losses are optimized sufficiently instead of being traded off against each other.
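The gradient separation described above can be sketched as one SGD-with-momentum step that keeps a separate momentum buffer per loss. This is a minimal illustration, not the authors' implementation: the function name `dot_step`, the `delta` offset that raises the distillation momentum and lowers the task momentum, and all default values are assumptions made for the example. The source states only that the distillation loss receives the larger momentum.

```python
def dot_step(params, grad_task, grad_distill, buf_task, buf_distill,
             lr=0.1, momentum=0.9, delta=0.05):
    """One SGD-style update with a separate momentum buffer per loss.

    All arguments are flat lists of floats. The symmetric +/- delta
    split is an assumption of this sketch.
    """
    mu_task = momentum - delta      # smaller momentum for task-loss gradients
    mu_distill = momentum + delta   # larger momentum for distillation-loss gradients

    new_params, new_buf_task, new_buf_distill = [], [], []
    for p, g_t, g_d, b_t, b_d in zip(params, grad_task, grad_distill,
                                     buf_task, buf_distill):
        b_t = mu_task * b_t + g_t        # accumulate task-loss gradient
        b_d = mu_distill * b_d + g_d     # accumulate distillation-loss gradient
        new_params.append(p - lr * (b_t + b_d))  # step along the summed buffers
        new_buf_task.append(b_t)
        new_buf_distill.append(b_d)
    return new_params, new_buf_task, new_buf_distill


# Usage: two steps on one parameter with constant unit gradients.
params, b_task, b_distill = [1.0], [0.0], [0.0]
for _ in range(2):
    params, b_task, b_distill = dot_step(params, [1.0], [1.0], b_task, b_distill)
```

With equal gradients, the distillation buffer grows faster than the task buffer, so the distillation loss has more influence on the update direction. Setting `delta=0` recovers ordinary momentum SGD on the summed loss.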

Findings

01 DOT achieves a +2.59% accuracy improvement on ImageNet-1k for the ResNet50-MobileNetV1 teacher-student pair.

02 DOT improves the convergence of both losses and the generalization of the student model.

03 Extensive experiments validate the effectiveness of DOT.

Abstract

Knowledge distillation transfers knowledge from a large model to a small one via task and distillation losses. In this paper, we observe a trade-off between task and distillation losses, i.e., introducing the distillation loss limits the convergence of the task loss. We believe that the trade-off results from insufficient optimization of the distillation loss. The reasoning is as follows: the teacher has a lower task loss than the student, and a lower distillation loss makes the student more similar to the teacher, so a better-converged task loss can be obtained. To break the trade-off, we propose the Distillation-Oriented Trainer (DOT). DOT considers the gradients of the task and distillation losses separately, then applies a larger momentum to the distillation loss to accelerate its optimization. We show empirically that DOT breaks the trade-off, i.e., both losses are sufficiently optimized. Extensive…
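To make the two losses concrete, here is a minimal sketch for a single training example. The abstract does not specify the form of either loss. This sketch assumes the common setup of cross-entropy against the ground-truth label as the task loss, and temperature-scaled KL divergence between teacher and student predictions as the distillation loss. The function names and the default temperature are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def task_loss(student_logits, label):
    """Cross-entropy between the student prediction and the true label."""
    return -math.log(softmax(student_logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened predictions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures (an assumption of this sketch).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl


# Usage: a student that disagrees with a confident teacher on class 0.
student = [0.5, 1.0, 0.2]
teacher = [3.0, 0.5, 0.1]
l_task = task_loss(student, label=0)
l_distill = distillation_loss(student, teacher)
```

Under these assumptions, the distillation loss is zero exactly when the student's softened prediction matches the teacher's. This matches the abstract's argument that driving it lower pulls the student toward the teacher's lower task loss.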

Code & Models

Repositories

megvii-research/mdistiller (PyTorch, official)

Taxonomy

Topics: Brain Tumor Detection and Classification · COVID-19 diagnosis using AI · Advanced Neural Network Applications