There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

Ben Athiwaratkun; Marc Finzi; Pavel Izmailov; Andrew Gordon Wilson

arXiv:1806.05594 · cs.LG · February 22, 2019 · 149 citations

TL;DR

This paper demonstrates that averaging weights during training with consistency regularization significantly improves semi-supervised learning performance, achieving state-of-the-art results on CIFAR datasets with limited labels.

Contribution

The authors introduce the use of Stochastic Weight Averaging (SWA) and fast-SWA to enhance semi-supervised learning with consistency regularization, leading to improved convergence and accuracy.

Findings

01. SWA improves convergence of consistency-based semi-supervised methods.

02. Weight averaging achieves 5.0% error on CIFAR-10 with only 4000 labels, compared with the previous best reported result of 6.3%.

03. Fast-SWA accelerates training while maintaining high accuracy.

Abstract

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple…
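The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-difference consistency penalty is only one common choice, the names `consistency_penalty` and `RunningWeightAverage` are hypothetical, and the schedule that decides when weight snapshots are taken is left out because this summary does not specify it.

```python
from typing import List, Sequence


def consistency_penalty(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Mean squared difference between two prediction vectors.

    Assumption: p_a and p_b are the model's class probabilities for the
    same unlabeled input under two different random perturbations.
    """
    if len(p_a) != len(p_b):
        raise ValueError("prediction vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p_a, p_b)) / len(p_a)


class RunningWeightAverage:
    """Equal-weight running average of flat weight vectors.

    Each call to update() folds one weight snapshot from the SGD
    trajectory into the mean, so no snapshot history is stored.
    """

    def __init__(self) -> None:
        self.mean: List[float] = []
        self.count = 0

    def update(self, weights: Sequence[float]) -> None:
        if self.count == 0:
            self.mean = list(weights)
        else:
            if len(weights) != len(self.mean):
                raise ValueError("snapshot size changed between updates")
            n = self.count
            # Incremental mean: new_mean = (old_mean * n + w) / (n + 1)
            self.mean = [(m * n + w) / (n + 1) for m, w in zip(self.mean, weights)]
        self.count += 1


# Toy usage with three hypothetical two-parameter snapshots.
avg = RunningWeightAverage()
for snapshot in ([1.0, 0.0], [3.0, 2.0], [2.0, 4.0]):
    avg.update(snapshot)
```

In practice the averaged weights, rather than the final SGD iterate, would be the ones used for prediction; layers that track activation statistics, such as batch normalization, typically need those statistics recomputed for the averaged weights.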

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent