Adaptive Stochastic Weight Averaging
Caglar Demir, Arnab Sharma, Axel-Cyrille Ngonga Ngomo

TL;DR
Adaptive Stochastic Weight Averaging (ASWA) improves model generalization by selectively updating parameter averages based on validation performance, combining SWA with early stopping to address overfitting and underfitting issues.
Contribution
This work introduces ASWA, a method that updates the running average of model parameters only when validation performance improves, with the aim of generalizing better than standard SWA.
Findings
ASWA outperforms baseline models on 11 benchmark datasets.
ASWA achieves statistically significant improvements in generalization.
The method effectively balances overfitting and underfitting issues.
Abstract
Ensemble models often improve generalization performance in challenging tasks. Yet, traditional techniques based on prediction averaging incur three well-known disadvantages: the computational overhead of training multiple models, increased latency, and increased memory requirements at test time. To address these issues, the Stochastic Weight Averaging (SWA) technique maintains a running average of model parameters from a specific epoch onward. Despite its potential benefits, maintaining a running average of parameters can hinder generalization once the underlying running model begins to overfit. Conversely, a poorly chosen starting point can make SWA more susceptible to underfitting than the underlying running model. In this work, we propose the Adaptive Stochastic Weight Averaging (ASWA) technique, which updates a running average of model parameters only when generalization performance…
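The idea described in the abstract can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' algorithm: the abstract is truncated, so the exact acceptance rule is assumed here to be "keep the candidate average only if its validation score strictly improves". The names `running_average`, `adaptive_average`, and `val_score` are hypothetical, and parameters are plain lists of floats rather than real model weights.

```python
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def running_average(avg: Sequence[float], new: Sequence[float], n_averaged: int) -> Params:
    """Fold one more parameter vector into a mean of n_averaged vectors (SWA-style update)."""
    return [(a * n_averaged + p) / (n_averaged + 1) for a, p in zip(avg, new)]


def adaptive_average(
    snapshots: Sequence[Sequence[float]],
    val_score: Callable[[Params], float],
) -> Tuple[Params, int]:
    """Average per-epoch snapshots, accepting an update only if validation improves.

    Assumption: higher val_score is better, and an update is rejected
    whenever the candidate average does not strictly beat the best score so far.
    """
    avg = list(snapshots[0])
    n = 1
    best = val_score(avg)
    for params in snapshots[1:]:
        candidate = running_average(avg, params, n)
        score = val_score(candidate)
        if score > best:
            # Validation improved: commit the candidate as the new running average.
            avg, n, best = candidate, n + 1, score
        # Otherwise the snapshot is skipped and the previous average is kept.
    return avg, n


if __name__ == "__main__":
    # Toy one-parameter "model" whose validation optimum is at w = 1.
    snapshots = [[0.0], [2.0], [10.0]]
    score = lambda w: -(w[0] - 1.0) ** 2
    print(adaptive_average(snapshots, score))
```

In this toy run, the third snapshot would pull a plain running average away from the validation optimum, so the adaptive variant skips it. A real implementation would operate on model state (for example, per-tensor averages) and evaluate on a held-out validation set.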
Taxonomy
Topics: Advanced Manufacturing and Logistics Optimization
Methods: Early Stopping · Stochastic Weight Averaging
