Isotonic Data Augmentation for Knowledge Distillation
Wanyun Cui, Sen Yan

TL;DR
This paper introduces isotonic data augmentation (IDA), a method that uses isotonic regression to correct order violations in soft labels during knowledge distillation, improving accuracy.
Contribution
We propose IDA, a novel data augmentation technique that applies isotonic regression to enforce label order consistency, enhancing knowledge transfer in distillation.
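To make the idea concrete, the following is a minimal sketch of an isotonic-regression correction of one soft-label vector, not the authors' algorithm. It assumes the expected order is a simple chain (a full ranking of the classes), whereas the paper's order restrictions may be only partial. The function names `isotonic_nonincreasing` and `correct_soft_labels` and the example values are hypothetical.

```python
def isotonic_nonincreasing(values):
    """Least-squares fit of a non-increasing sequence (pool-adjacent-violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the newest block's mean,
        # because that breaks the non-increasing requirement.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    return fitted


def correct_soft_labels(soft, expected_order):
    """Adjust soft labels so they do not increase along expected_order.

    soft: list of teacher probabilities, indexed by class.
    expected_order: class indices from highest to lowest expected probability
                    (assumed here to be a full ranking).
    """
    ranked = [soft[i] for i in expected_order]
    fitted = isotonic_nonincreasing(ranked)
    corrected = list(soft)
    for i, p in zip(expected_order, fitted):
        corrected[i] = p
    return corrected


# Hypothetical example: class 1 should rank below class 0 but scores higher.
soft = [0.3, 0.5, 0.2]
corrected = correct_soft_labels(soft, expected_order=[0, 1, 2])
print(corrected)  # classes 0 and 1 are pooled to their mean; class 2 is unchanged
```

Pooling replaces each violating run with its mean, so the corrected vector keeps the same total probability mass as the input.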
Findings
IDAs effectively eliminate label order violations.
IDAs improve the accuracy of knowledge distillation.
The proposed algorithms are efficient and GPU-friendly.
Abstract
Knowledge distillation uses both real hard labels and soft labels predicted by teacher models as supervision. Intuitively, we expect the soft labels and hard labels to be concordant w.r.t. their orders of probabilities. However, we found critical order violations between hard labels and soft labels in augmented samples. For example, for an augmented sample $x = 0.7 \cdot \text{panda} + 0.3 \cdot \text{cat}$, we expect the order of meaningful soft labels to be $P_{\text{soft}}(\text{panda} \mid x) > P_{\text{soft}}(\text{cat} \mid x) > P_{\text{soft}}(\text{other} \mid x)$. But real soft labels usually violate the order, e.g. $P_{\text{soft}}(\text{tiger} \mid x) > P_{\text{soft}}(\text{panda} \mid x) > P_{\text{soft}}(\text{cat} \mid x)$. We attribute this to the unsatisfactory generalization ability of the teacher, which leads to the prediction error of augmented samples. Empirically, we found the violations are common and injure the knowledge transfer. In this paper, we introduce order restrictions to…
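The concordance requirement described in the abstract can be illustrated with a small check. This is a sketch under stated assumptions rather than the paper's procedure: it assumes a mixing-style augmentation whose hard label is a weighted combination of two one-hot labels, and the function name `violates_order`, the class layout, and the probability values are hypothetical.

```python
def violates_order(soft, hard):
    """Return True if any pair of classes is ranked oppositely by soft and hard labels.

    soft: teacher probabilities per class.
    hard: hard-label weights per class (for a mixed sample, the mixing weights).
    """
    n = len(soft)
    for i in range(n):
        for j in range(n):
            # The hard label says class i outranks class j, but the teacher disagrees.
            if hard[i] > hard[j] and soft[i] < soft[j]:
                return True
    return False


# Hypothetical classes: index 0 = panda, 1 = cat, 2 = tiger.
hard = [0.7, 0.3, 0.0]              # assumed mixed label: 0.7 panda + 0.3 cat
teacher_bad = [0.35, 0.15, 0.50]    # tiger outranks both mixed classes
teacher_ok = [0.60, 0.30, 0.10]     # same ranking as the hard label

print(violates_order(teacher_bad, hard))
print(violates_order(teacher_ok, hard))
```

The pairwise loop is quadratic in the number of classes, which is fine for illustration; a batched implementation would more likely compare sorted indices.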
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
