Stochastic Weight Averaging Revisited

Hao Guo; Jiyong Jin; Bin Liu

arXiv:2201.00519 · cs.LG · September 20, 2022 · 1 citation

Open Access · 1 repository

TL;DR

This paper revisits stochastic weight averaging (SWA), analyzing its effects on neural network optimization, and introduces a new algorithm, PSWA, that leverages global geometric structures to improve model performance.

Contribution

The paper provides a detailed analysis of SWA's contributions, disentangles the effects of weight averaging and learning rate schedules, and proposes PSWA to better exploit loss landscape structures.

Findings

01. SWA helps discover wider optima, but not always.

02. Weight averaging reduces variance in model weights.

03. PSWA outperforms standard SWA and SGD.
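
Finding 02 can be illustrated with a textbook identity. The following is a sketch under an idealized assumption that is not taken from the paper: the n sampled weight vectors w_1, ..., w_n are treated as independent draws with common mean \mu and covariance \Sigma (both symbols are introduced here for illustration only).

```latex
\bar{w}_n = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad
\mathbb{E}\left[\bar{w}_n\right] = \mu, \qquad
\operatorname{Cov}\left(\bar{w}_n\right) = \frac{1}{n}\,\Sigma .
```

Consecutive SGD iterates are correlated in practice, so the actual reduction generally differs from the 1/n rate shown here; the identity only conveys why averaging shrinks the spread of the weights.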

Abstract

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective way to help the backbone SGD find better optima in terms of generalization. From a statistical perspective, weight averaging (WA) contributes to variance reduction. Recently, the well-established stochastic weight averaging (SWA) method was proposed, which is characterized by the use of a cyclical or high constant (CHC) learning rate schedule (LRS) to generate the weight samples for WA. This gave rise to a new insight on WA, namely that WA helps to discover wider optima and thereby leads to better generalization. We conduct extensive experimental studies of SWA, involving a dozen modern DNN model structures and a dozen benchmark open-source image, graph, and text datasets. We disentangle the contributions of the WA operation and the CHC LRS for SWA, showing that the…
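
The weight-averaging idea described in the abstract can be sketched on a toy problem. The example below is a minimal illustration, not the paper's SWA or PSWA implementation: it assumes a one-dimensional quadratic loss, Gaussian gradient noise as a stand-in for minibatch noise, and a constant learning rate as a stand-in for the "high constant" schedule. All function and parameter names (`noisy_grad`, `sgd_with_weight_averaging`, `burn_in`, and so on) are hypothetical.

```python
import random


def noisy_grad(w, rng, noise=1.0):
    # Gradient of the toy loss 0.5 * w**2, plus Gaussian noise that
    # stands in for minibatch sampling noise (an assumption of this sketch).
    return w + rng.gauss(0.0, noise)


def sgd_with_weight_averaging(steps=2000, burn_in=500, lr=0.1, seed=0):
    """Run constant-learning-rate SGD and keep a running mean of the weights.

    Returns the last SGD iterate, the averaged weight, and the list of
    iterates that were averaged (those sampled after the burn-in phase).
    """
    rng = random.Random(seed)
    w = 5.0            # arbitrary starting point, far from the optimum at 0
    avg, n = 0.0, 0    # running mean of sampled weights and sample count
    tail = []
    for t in range(steps):
        w -= lr * noisy_grad(w, rng)
        if t >= burn_in:
            n += 1
            avg += (w - avg) / n   # incremental mean update
            tail.append(w)
    return w, avg, tail


if __name__ == "__main__":
    last, avg, tail = sgd_with_weight_averaging()
    mean_sq = sum(x * x for x in tail) / len(tail)
    print(f"last iterate:             {last:+.4f}")
    print(f"averaged weight:          {avg:+.4f}")
    print(f"mean squared SGD iterate: {mean_sq:.4f}")
    print(f"squared averaged weight:  {avg * avg:.4f}")
```

In this toy setting the individual iterates keep fluctuating around the optimum at 0 because the learning rate never decays, while the running mean sits much closer to it. That is the variance-reduction effect the abstract attributes to WA; it says nothing about optimum width, which a one-dimensional quadratic cannot capture.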

Code & Models

Repositories

zjlab-ammi/pswa (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Stochastic Weight Averaging · Stochastic Gradient Descent