There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

TL;DR
This paper shows that averaging weights along the SGD trajectory, when training with consistency regularization, significantly improves semi-supervised learning, achieving state-of-the-art results at the time on CIFAR-10 and CIFAR-100 with limited labels.
Contribution
The authors apply Stochastic Weight Averaging (SWA), an existing weight-averaging method, to semi-supervised learning with consistency regularization, and propose fast-SWA, a variant that further accelerates convergence. Both improve convergence and accuracy.
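As a rough illustration of the weight-averaging idea, the sketch below keeps an equal-weight running average of weight snapshots. It is a minimal NumPy sketch, not the authors' implementation: the function name `update_average` and the toy snapshots are assumptions made for illustration, and the learning rate schedule that decides when snapshots are taken is omitted.

```python
import numpy as np

def update_average(avg_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into an equal-weight running average.

    n_averaged is the number of snapshots already included in avg_weights.
    """
    return (avg_weights * n_averaged + new_weights) / (n_averaged + 1)

# Hypothetical snapshots: SGD iterates that bounce around a point near [1.0, -2.0].
snapshots = [
    np.array([1.5, -2.5]),
    np.array([0.5, -1.5]),
    np.array([1.0, -2.0]),
]

avg = np.zeros(2)
for k, w in enumerate(snapshots):
    avg = update_average(avg, w, k)

print(avg)  # the averaged weights sit at the centre of the snapshots
```

The running form avoids storing every snapshot: only the current average and a counter are kept, which is why averaging adds little memory overhead to training.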
Findings
SWA improves convergence of consistency-based semi-supervised methods.
Weight averaging achieves 5.0% error on CIFAR-10 with only 4000 labels, compared with the previous best reported result of 6.3%.
Fast-SWA accelerates training while maintaining high accuracy.
Abstract
Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple…
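The consistency regularization described in the abstract can be sketched as a penalty on how much predictions change under small input perturbations. The example below is a minimal NumPy sketch under stated assumptions: the linear softmax model, Gaussian input noise, squared-error penalty, and all names (`predict`, `consistency_loss`, `noise_std`) are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(weights, x):
    # Toy stand-in for a network: a single linear layer followed by softmax.
    return softmax(x @ weights)

def consistency_loss(weights, x, rng, noise_std=0.1):
    """Mean squared difference between predictions on two perturbed views of x."""
    x1 = x + rng.normal(0.0, noise_std, size=x.shape)
    x2 = x + rng.normal(0.0, noise_std, size=x.shape)
    p1 = predict(weights, x1)
    p2 = predict(weights, x2)
    return float(np.mean(np.sum((p1 - p2) ** 2, axis=-1)))

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3))      # 4 input features, 3 classes
x_unlabeled = rng.normal(size=(8, 4))  # no labels are needed for this term

loss = consistency_loss(weights, x_unlabeled, rng)
loss_no_noise = consistency_loss(weights, x_unlabeled, rng, noise_std=0.0)
```

Because the term uses no labels, it can be computed on unlabeled data and added to the usual supervised loss on the labeled subset. With `noise_std=0.0` the two views are identical and the penalty vanishes.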
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
