Born Again Neural Networks
Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, and Anima Anandkumar

TL;DR
This paper introduces Born-Again Networks (BANs): students parameterized identically to their teachers and trained by knowledge distillation. Surprisingly, the students outperform their teachers, and DenseNet-based BANs reach state-of-the-art validation error on CIFAR-10 (3.5%) and CIFAR-100 (15.5%).
Contribution
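The paper presents a new perspective on knowledge distillation: rather than compressing a teacher into a smaller student, it trains students with the same parameterization as their teachers. These students significantly surpass their teachers on both computer vision and language modeling tasks.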
Findings
Born-Again Networks outperform their teachers on CIFAR datasets.
DenseNet-based BANs achieve state-of-the-art validation error on CIFAR-10 (3.5%) and CIFAR-100 (15.5%).
Two distillation objectives (CWTM and DKPP) highlight key components of knowledge transfer.
Abstract
Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness, without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs), outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by…
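The teacher-to-student transfer described in the abstract can be illustrated with a small sketch. It assumes the student is trained to match the teacher's softmax output by minimizing cross-entropy, which is a common way to set up distillation; the abstract itself does not spell out the loss. The function names and the toy logits are illustrative and do not come from the paper's code.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: the teacher's full output distribution serves as the target,
    instead of a one-hot label.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Toy example: teacher and student share the same output size, as they would
# when the student is parameterized identically to the teacher.
teacher_logits = [2.0, 0.5, -1.0]
matching = distillation_loss([2.0, 0.5, -1.0], teacher_logits)
mismatched = distillation_loss([-1.0, 0.5, 2.0], teacher_logits)
```

In this sketch the loss is smallest when the student reproduces the teacher's distribution, so `matching` is lower than `mismatched`.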
Videos
[Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1· youtube
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
