Snapshot Distillation: Teacher-Student Optimization in One Generation

Chenglin Yang; Lingxi Xie; Chi Su; Alan L. Yuille

arXiv:1812.00123 · cs.CV · December 4, 2018 · 21 citations

Open Access

TL;DR

Snapshot Distillation is a teacher-student training framework that runs within a single generation: supervision signals are extracted from earlier epochs of the same training run, which improves accuracy without additional training time.

Contribution

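The paper proposes the first framework for teacher-student optimization in one generation, using snapshots from earlier epochs as the teacher signal. This avoids the several-fold training cost of training multiple generations in sequence while keeping the accuracy benefit of teacher-student optimization.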

Findings

01. Achieves consistent accuracy improvements on CIFAR100 and ILSVRC2012.

02. Pre-trained models transfer well to object detection and segmentation tasks.

03. No significant increase in computational overhead.

Abstract

Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims at providing complementary cues from a model trained previously, but these approaches are often considerably slow because they train a few generations in sequence, i.e., the time complexity increases several-fold. This paper presents snapshot distillation (SD), the first framework that enables teacher-student optimization in one generation. The idea of SD is very simple: instead of borrowing supervision signals from previous generations, we extract such information from earlier epochs in the same generation, while making sure that the difference between teacher and student is sufficiently large so as to prevent under-fitting. To achieve this goal, we implement SD in a cyclic learning rate policy, in…
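The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract is truncated before the details, so the cosine shape of the cycle, the `weight` and `temperature` parameters, the KL-based teacher term, and all function names below are assumptions chosen for illustration. The sketch shows a learning rate that restarts every cycle, the steps at which the current weights would be saved as the next teacher snapshot, and a loss that mixes the label term with a term pulling the student toward the snapshot's softened outputs.

```python
import math


def cyclic_cosine_lr(step, steps_per_cycle, base_lr=0.1):
    """Cosine-annealed learning rate that restarts at the start of each cycle (assumed shape)."""
    pos = (step % steps_per_cycle) / steps_per_cycle  # position within the cycle, in [0, 1)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * pos))


def teacher_refresh_steps(total_steps, steps_per_cycle):
    """Steps after which the current weights would be saved as the next teacher snapshot."""
    return [s for s in range(total_steps) if (s + 1) % steps_per_cycle == 0]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def snapshot_distillation_loss(student_logits, teacher_logits, label,
                               weight=0.5, temperature=2.0):
    """Cross-entropy on the label plus KL(teacher || student) on softened outputs.

    `weight` and `temperature` are hypothetical hyperparameters, not values from the paper.
    """
    ce = -math.log(softmax(student_logits)[label])
    if teacher_logits is None:
        # First cycle: no snapshot exists yet, so train on labels only.
        return ce
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    return (1.0 - weight) * ce + weight * kl


if __name__ == "__main__":
    print(teacher_refresh_steps(total_steps=30, steps_per_cycle=10))
    print(cyclic_cosine_lr(0, 10), cyclic_cosine_lr(9, 10), cyclic_cosine_lr(10, 10))
    print(snapshot_distillation_loss([2.0, 0.5, -1.0], [1.0, 1.5, -0.5], label=0))
```

In this sketch the snapshot is taken at the end of a cycle, when the learning rate is lowest, and the restart to a high learning rate at the start of the next cycle is what moves the student away from its teacher. That is one plausible way to keep the teacher-student difference large, as the abstract requires.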

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification