Self-Distillation as Instance-Specific Label Smoothing

Zhilu Zhang; Mert R. Sabuncu

arXiv:2006.05065 · cs.LG · October 23, 2020 · 52 cites

TL;DR

This paper examines why multi-generational self-distillation improves generalization and links the gain in part to the increasing diversity of teacher predictions. It reinterprets teacher-student training as amortized MAP estimation in which teacher predictions act as instance-specific regularization, and it proposes an instance-specific label smoothing method.

Contribution

It introduces a theoretical framework that relates self-distillation to label smoothing and highlights predictive diversity in addition to predictive uncertainty. It also presents a new instance-specific label smoothing technique that often outperforms classical label smoothing.
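To make the contrast concrete, here is a minimal sketch, assuming standard cross-entropy training. Classical label smoothing mixes the one-hot label with a uniform distribution that is identical for every example. The function `instance_smoothed_target` is a hypothetical illustration (not the paper's exact method) that swaps the uniform component for a teacher's per-example predictive distribution, which is one simple way smoothing can become instance-specific. All function names here are illustrative assumptions.

```python
import math
from typing import List, Sequence


def uniform_smoothed_target(label: int, num_classes: int, eps: float) -> List[float]:
    """Classical label smoothing: (1 - eps) * one_hot + eps * uniform."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]


def instance_smoothed_target(label: int, teacher_probs: Sequence[float],
                             eps: float) -> List[float]:
    """Hypothetical instance-specific variant: (1 - eps) * one_hot + eps * teacher.

    The smoothing mass is spread according to the teacher's prediction for this
    particular example instead of uniformly over all classes.
    """
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps * p
            for k, p in enumerate(teacher_probs)]


def cross_entropy(target: Sequence[float], logits: Sequence[float]) -> float:
    """Cross-entropy between a soft target and softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))
```

For a 3-class example with label 0 and `eps=0.3`, the uniform target is about `[0.8, 0.1, 0.1]` for every input, whereas a teacher prediction of `[0.6, 0.3, 0.1]` yields about `[0.88, 0.09, 0.03]`, so the smoothing mass differs from example to example.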

Findings

01

Multi-generational self-distillation increases the diversity of teacher predictions, which is partly associated with its improved generalization.

02

Theoretical link between self-distillation and label smoothing.

03

Proposed method often outperforms classical label smoothing.

Abstract

It has been recently demonstrated that multi-generational self-distillation can improve generalization. Despite this intriguing observation, reasons for the enhancement remain poorly understood. In this paper, we first demonstrate experimentally that the improved performance of multi-generational self-distillation is in part associated with the increasing diversity in teacher predictions. With this in mind, we offer a new interpretation for teacher-student training as amortized MAP estimation, such that teacher predictions enable instance-specific regularization. Our framework allows us to theoretically relate self-distillation to label smoothing, a commonly used technique that regularizes predictive uncertainty, and suggests the importance of predictive diversity in addition to predictive uncertainty. We present experimental results using multiple datasets and neural network…
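The abstract separates predictive uncertainty from predictive diversity. The sketch below illustrates that distinction under stated assumptions; both metrics are hypothetical illustrations, not the paper's definitions. Uncertainty is taken as the mean entropy of the teacher's predictions. Diversity is taken as how much the teacher's renormalized probabilities over the non-target classes vary across examples, measured as the mean L1 distance to their average.

```python
import math
from typing import List, Sequence


def mean_entropy(probs: Sequence[Sequence[float]]) -> float:
    """Hypothetical uncertainty metric: average Shannon entropy (nats) per example."""
    return sum(-sum(p * math.log(p) for p in row if p > 0) for row in probs) / len(probs)


def residual_distribution(row: Sequence[float], label: int) -> List[float]:
    """Drop the true class and renormalize the remaining probabilities."""
    rest = [p for k, p in enumerate(row) if k != label]
    total = sum(rest)
    if total == 0:
        # Degenerate one-hot prediction: fall back to uniform over the rest.
        return [1.0 / len(rest)] * len(rest)
    return [p / total for p in rest]


def prediction_diversity(probs: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> float:
    """Hypothetical diversity metric: mean L1 distance of each example's
    non-target distribution from the average non-target distribution."""
    residuals = [residual_distribution(r, y) for r, y in zip(probs, labels)]
    dim = len(residuals[0])
    mean = [sum(r[j] for r in residuals) / len(residuals) for j in range(dim)]
    return sum(sum(abs(a - b) for a, b in zip(r, mean))
               for r in residuals) / len(residuals)
```

A teacher that always outputs `[0.8, 0.1, 0.1]` resembles uniform label smoothing: it is uncertain but has zero diversity under this metric. A teacher that outputs `[0.8, 0.2, 0.0]` for one example and `[0.8, 0.0, 0.2]` for another has a diversity of 1.0, because it spreads its non-target mass differently per example.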

Code & Models

Repositories

ZhiluZhang123/neurips_2020_distillation (PyTorch)

Videos

Self-Distillation as Instance-Specific Label Smoothing (SlidesLive)

Taxonomy

Topics: Machine Learning and Data Classification · Robotic Path Planning Algorithms

Methods: Label Smoothing