Isotonic Data Augmentation for Knowledge Distillation

Wanyun Cui; Sen Yan

arXiv:2107.01412·cs.LG·July 7, 2021

TL;DR

This paper introduces isotonic data augmentation (IDA), a method that uses isotonic regression to correct order violations in soft labels during knowledge distillation, improving accuracy.

Contribution

We propose IDA, a novel data augmentation technique that applies isotonic regression to enforce label order consistency, enhancing knowledge transfer in distillation.

Findings

01

IDAs effectively eliminate label order violations.

02

IDAs improve the accuracy of knowledge distillation.

03

Proposed algorithms are efficient and GPU-friendly.

Abstract

Knowledge distillation uses both real hard labels and soft labels predicted by teacher models as supervision. Intuitively, we expect the soft labels and hard labels to be concordant w.r.t. their orders of probabilities. However, we found critical order violations between hard labels and soft labels in augmented samples. For example, for an augmented sample $x = 0.7 \cdot \mathrm{panda} + 0.3 \cdot \mathrm{cat}$, we expect the order of meaningful soft labels to be $P_{soft}(\mathrm{panda} \mid x) > P_{soft}(\mathrm{cat} \mid x) > P_{soft}(\mathrm{other} \mid x)$. But real soft labels usually violate the order, e.g. $P_{soft}(\mathrm{tiger} \mid x) > P_{soft}(\mathrm{panda} \mid x) > P_{soft}(\mathrm{cat} \mid x)$. We attribute this to the unsatisfactory generalization ability of the teacher, which leads to prediction errors on augmented samples. Empirically, we found the violations are common and harm knowledge transfer. In this paper, we introduce order restrictions to…
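To make the order condition in the abstract concrete, here is a minimal sketch that only checks whether a soft label respects the expected order for a two-class mixed sample. It is not the paper's isotonic regression procedure, which the truncated abstract does not describe. The function name `violates_order`, the class indices, and the probability values are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def violates_order(soft, major, minor):
    """Return True if the soft label breaks P[major] >= P[minor] >= P[any other class].

    `major` is the index of the class with the larger mixing weight and
    `minor` the index of the class with the smaller one (illustrative names).
    """
    soft = np.asarray(soft, dtype=float)
    # Probabilities of all classes that were not part of the mix.
    others = np.delete(soft, [major, minor])
    max_other = others.max() if others.size else -np.inf
    return not (soft[major] >= soft[minor] >= max_other)

# Hypothetical class indices: 0 = panda, 1 = cat, 2 = tiger, 3 = dog.
concordant = [0.55, 0.30, 0.10, 0.05]  # panda > cat > others
violating = [0.30, 0.20, 0.45, 0.05]   # tiger outranks panda and cat

print(violates_order(concordant, major=0, minor=1))  # False
print(violates_order(violating, major=0, minor=1))   # True
```

A check like this could be used to count how often a teacher's soft labels disagree with the mixing weights of augmented samples before any correction is applied.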

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation