Stable Anderson Acceleration for Deep Learning
Massimiliano Lupo Pasini, Junqi Yin, Viktor Reshniak, Miroslav Stoyanov

TL;DR
This paper introduces a stabilized Anderson acceleration method combined with an adaptive moving average to improve convergence speed in deep learning training, especially under stochastic oscillations caused by mini-batch sampling.
Contribution
It proposes a novel combination of Anderson acceleration with an adaptive smoothing technique and an automatic criterion to enhance deep learning optimization stability.
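To make the idea of an automatic smoothing criterion concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the choice of monitored quantity (a scalar such as the loss at recent iterations), and the threshold value are all assumptions made for illustration. The sketch averages the recent iterates only when the relative standard deviation of the monitored quantity exceeds the threshold.

```python
import numpy as np

def relative_std(values):
    """Relative standard deviation (std / |mean|) of a 1-D sequence."""
    values = np.asarray(values, dtype=float)
    # Small constant guards against division by zero (an assumption of this sketch).
    return values.std() / (abs(values.mean()) + 1e-12)

def maybe_average(iterates, monitored, threshold=0.1):
    """Return (vector, smoothed_flag).

    iterates  : recent parameter vectors, shape (window, dim)
    monitored : one scalar per recent iteration (hypothetical choice: the loss)
    threshold : hypothetical cutoff above which the window is averaged
    """
    iterates = np.asarray(iterates, dtype=float)
    if relative_std(monitored) > threshold:
        # Oscillations are large: replace the latest iterate by the window mean.
        return iterates.mean(axis=0), True
    # Sequence is already regular: keep the latest iterate unchanged.
    return iterates[-1], False

noisy_vec, noisy_flag = maybe_average([[0.0, 0.0], [2.0, 2.0]], [1.0, 3.0])
calm_vec, calm_flag = maybe_average([[0.0, 0.0], [2.0, 2.0]], [2.0, 2.0])
```

In this sketch the window is averaged uniformly; an adaptive scheme could also vary the window length, which is left out here.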
Findings
Effective stabilization of Anderson acceleration in stochastic settings
Improved convergence in deep learning models across various tasks
Demonstrated scalability on high-performance computing systems
Abstract
Anderson acceleration (AA) is an extrapolation technique designed to speed up fixed-point iterations such as those arising from the iterative training of deep learning (DL) models. Training DL models requires large datasets processed in randomly sampled batches, which tend to introduce stochastic oscillations into the fixed-point iteration, with amplitude roughly inversely proportional to the batch size. These oscillations reduce and occasionally eliminate the positive effect of AA. To restore AA's advantage, we combine it with an adaptive moving average procedure that smooths the oscillations and results in a more regular sequence of gradient descent updates. By monitoring the relative standard deviation between consecutive iterations, we also introduce a criterion to automatically assess whether the moving average is needed. We applied the method to the following DL instantiations: (i) multi-layer…
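As background for the abstract, the following is a minimal sketch of textbook Anderson acceleration applied to a deterministic fixed-point problem x = g(x). It is not the stabilized method of the paper: there is no moving average and no mini-batch noise. The function name `anderson_solve`, the window size `m`, and the small linear test problem are assumptions chosen for illustration.

```python
import numpy as np

def anderson_solve(g, x0, m=5, max_iter=50, tol=1e-10):
    """Anderson-accelerated fixed-point iteration; m=0 gives the plain iteration.

    Returns the final iterate and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float)
    X, G = [], []                      # recent iterates and their images g(x)
    for k in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:
            return x, k
        X.append(x)
        G.append(gx)
        X, G = X[-(m + 1):], G[-(m + 1):]   # keep at most m+1 entries
        if len(X) == 1:
            x = gx                     # not enough history: plain step
            continue
        Gm = np.stack(G, axis=1)
        R = Gm - np.stack(X, axis=1)   # residuals r_i = g(x_i) - x_i as columns
        dR = np.diff(R, axis=1)        # differences of consecutive residuals
        dG = np.diff(Gm, axis=1)       # differences of consecutive images
        # Least-squares mixing coefficients: minimize ||r_k - dR @ gamma||.
        gamma, *_ = np.linalg.lstsq(dR, R[:, -1], rcond=None)
        x = gx - dG @ gamma            # extrapolated next iterate
    return x, max_iter

# Hypothetical test problem: a linear contraction g(x) = A x + b.
A = np.diag([0.9, 0.5, 0.3, 0.1, -0.4])
b = np.ones(5)
g = lambda x: A @ x + b
x_star = np.linalg.solve(np.eye(5) - A, b)

x_aa, k_aa = anderson_solve(g, np.zeros(5), m=5)
x_plain, k_plain = anderson_solve(g, np.zeros(5), m=0)
```

On this small linear problem the accelerated run reaches the tolerance well before the iteration cap, while the plain iteration (`m=0`) does not. With noisy updates, as in mini-batch training, the residual differences in `dR` become unreliable, which is the failure mode the abstract describes.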
Taxonomy
Topics: Model Reduction and Neural Networks · Radiomics and Machine Learning in Medical Imaging · Seismic Imaging and Inversion Techniques
