Born Again Neural Networks

Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, and Anima Anandkumar

arXiv:1805.04770 · stat.ML · March 5, 2024 · 443 cites



TL;DR

This paper introduces Born-Again Networks (BANs): students parameterized identically to their teachers that, surprisingly, outperform them, reaching state-of-the-art validation error on CIFAR-10 and CIFAR-100.

Contribution

The paper offers a new perspective on knowledge distillation: rather than compressing a model, it trains students with the same architecture as their teachers, and these students surpass their teachers on both computer vision and language modeling tasks.

Findings

01

Born-Again Networks outperform their teachers on CIFAR datasets.

02

DenseNet-based BANs achieve state-of-the-art validation error on CIFAR-10 (3.5%) and CIFAR-100 (15.5%).

03

Two distillation objectives, Confidence-Weighted by Teacher Max (CWTM) and Dark Knowledge with Permuted Predictions (DKPP), isolate key components of knowledge transfer.
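The two objectives can be sketched in plain Python. This is an illustrative reading of the paper's descriptions, not the authors' code: CWTM is assumed to weight each example's ordinary cross-entropy against the ground-truth label by the teacher's maximum softmax probability, normalized over the batch, and DKPP is assumed to keep the teacher's top probability at its argmax while randomly permuting the remaining ("dark knowledge") probabilities before using them as soft targets. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def softmax(logits: Sequence[float]) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits)]


def cwtm_loss(student_batch: Sequence[Sequence[float]],
              teacher_batch: Sequence[Sequence[float]],
              labels: Sequence[int]) -> float:
    """Confidence-Weighted by Teacher Max (hypothetical sketch).

    Each example's cross-entropy w.r.t. its ground-truth label is weighted
    by the teacher's max probability; weights are normalized to sum to 1.
    """
    confidences = [max(softmax(t)) for t in teacher_batch]
    total = sum(confidences)
    loss = 0.0
    for s, c, y in zip(student_batch, confidences, labels):
        loss += (c / total) * -log_softmax(s)[y]
    return loss


def dkpp_targets(teacher_logits: Sequence[float],
                 rng: random.Random) -> List[float]:
    """Dark Knowledge with Permuted Predictions (hypothetical sketch).

    Keeps the teacher's top probability at its argmax index and randomly
    permutes the probabilities of all other (non-argmax) classes.
    """
    p = softmax(teacher_logits)
    top = max(range(len(p)), key=p.__getitem__)
    rest = [i for i in range(len(p)) if i != top]
    shuffled = [p[i] for i in rest]
    rng.shuffle(shuffled)
    permuted = list(p)  # copy; argmax entry stays unchanged
    for i, v in zip(rest, shuffled):
        permuted[i] = v
    return permuted


def dkpp_loss(student_logits: Sequence[float],
              teacher_logits: Sequence[float],
              rng: random.Random) -> float:
    """Cross-entropy of the student against the permuted teacher targets."""
    targets = dkpp_targets(teacher_logits, rng)
    log_p_student = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(targets, log_p_student))
```

When every example gets the same teacher confidence, `cwtm_loss` reduces to the ordinary mean cross-entropy, which makes the reweighting easy to sanity-check. `dkpp_targets` preserves the set of probability values, so only their assignment to non-argmax classes changes.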

Abstract

Knowledge Distillation (KD) consists of transferring "knowledge" from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness, without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by…
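As a rough illustration of the teacher-to-student transfer described above, the sketch below assumes the common formulation in which the student is trained to match the teacher's softmax outputs under a cross-entropy loss. The exact loss, any temperature, and any ground-truth term used in the paper are not stated here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float]) -> float:
    """Cross-entropy H(p_teacher, p_student) for a single example.

    The teacher's softmax output is the soft target. In a BAN the student
    shares the teacher's architecture, so both vectors have the same length.
    """
    p_teacher = [math.exp(v) for v in log_softmax(teacher_logits)]
    log_p_student = log_softmax(student_logits)
    return -sum(t * ls for t, ls in zip(p_teacher, log_p_student))


def batch_distillation_loss(student_batch: Sequence[Sequence[float]],
                            teacher_batch: Sequence[Sequence[float]]) -> float:
    """Mean per-example distillation loss over a batch."""
    losses = [distillation_loss(s, t) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)
```

This loss is smallest when the student's distribution equals the teacher's. In a real training loop an autodiff framework would differentiate it with respect to the student's parameters only, with the teacher held fixed; that machinery is left out of this sketch.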


Code & Models


Videos

[Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1 (YouTube)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning