Learning from Matured Dumb Teacher for Fine Generalization
HeeSeung Jung, Kangil Kim, Hoyong Kim, Jong-Hun Shin

TL;DR
This paper introduces a matured dumb teacher knowledge distillation method that conservatively transfers decision boundary hypotheses, leading to improved generalization in neural networks across multiple image classification datasets.
Contribution
The paper proposes a matured dumb teacher KD approach that improves generalization by transferring decision boundary hypotheses conservatively, without destroying information the student has already learned.
Findings
Consistent improvement in test performance across datasets
Finer generalization compared to existing methods
Stable results over hyperparameter grid search
Abstract
The flexibility of decision boundaries in neural networks that are unguided by training data is a well-known problem typically resolved with generalization methods. A surprising result from recent knowledge distillation (KD) literature is that random, untrained, and equally structured teacher networks can also vastly improve generalization performance. This raises the possibility that undiscovered assumptions useful for generalization in uncertain regions exist. In this paper, we shed light on these assumptions by analyzing the decision boundaries and confidence distributions of both simple and KD-based generalization methods. Assuming that a decision boundary exists to represent the most general tendency of distinction on an input sample space (i.e., the simplest hypothesis), we show the various limitations of methods when using the hypothesis. To resolve these limitations, we propose…
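For readers unfamiliar with KD, the following is a minimal sketch of the conventional soft-target distillation objective that the abstract's discussion builds on. It is not the paper's matured dumb teacher method, whose details are cut off above. The function names, the weighting `alpha`, and the `temperature` default are illustrative assumptions, and the teacher logits could come from any network, including a random, untrained one with the same structure as the student.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Generic KD loss: hard-label cross-entropy plus soft-target KL divergence.

    alpha and temperature are assumed hyperparameters, not values from the paper.
    """
    # Cross-entropy between the student's prediction and the true label.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # KL(teacher || student) on temperature-softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-target term's gradient scale comparable
    # to the hard-label term as the temperature changes.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

When the teacher and student produce identical logits, the KL term vanishes and the loss reduces to the weighted cross-entropy alone; any disagreement with the teacher adds a non-negative penalty.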
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation · Convolution
