Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
Yang Zhao, Hao Zhang, Xiuyuan Hu

TL;DR
This paper introduces Randomized Sharpness-Aware Training (RST), which lowers the computational cost of sharpness-aware algorithms such as SAM by choosing at random, at each iteration, between a SAM step and a plain SGD step. The aim is to maintain performance while roughly halving the extra computation that SAM adds.
Contribution
The paper proposes RST, a probabilistic training scheme that reduces the computational burden of sharpness-aware methods, and provides a theoretical convergence analysis along with practical guidelines.
Findings
RST can cut the extra computation that SAM adds over SGD by about 50%.
G-RST (a generalized form of RST) outperforms SAM in most cases with less computation.
Theoretical analysis supports convergence of RST.
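One way to read the 50% figure is through the expected number of forward-backward propagation pairs. The short derivation below is a sketch under assumed notation (T, p(t), and N are not taken from the paper): an SGD step costs one pair, a SAM step costs two, and RST takes a SAM step with probability p(t).

```latex
% T: number of training iterations (assumed notation)
% p(t): probability of taking a SAM step at iteration t (assumed notation)
\mathbb{E}[N_{\text{RST}}] = \sum_{t=1}^{T} \bigl(1 + p(t)\bigr) = T + \sum_{t=1}^{T} p(t),
\qquad N_{\text{SGD}} = T, \qquad N_{\text{SAM}} = 2T.
% Constant schedule p(t) = 1/2:
\mathbb{E}[N_{\text{RST}}] = \tfrac{3}{2}\,T
\;\Rightarrow\;
\frac{\mathbb{E}[N_{\text{RST}}] - N_{\text{SGD}}}{N_{\text{SAM}} - N_{\text{SGD}}} = \frac{1}{2}.
```

Under this reading, a constant probability of 1/2 halves the overhead SAM adds on top of SGD, which corresponds to a 25% reduction in total propagation pairs relative to SAM.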
Abstract
By driving models to converge to flat minima, sharpness-aware learning algorithms (such as SAM) have shown the power to achieve state-of-the-art performances. However, these algorithms will generally incur one extra forward-backward propagation at each training iteration, which largely burdens the computation especially for scalable models. To this end, we propose a simple yet efficient training scheme, called Randomized Sharpness-Aware Training (RST). Optimizers in RST would perform a Bernoulli trial at each iteration to choose randomly from base algorithms (SGD) and sharpness-aware algorithms (SAM) with a probability arranged by a predefined scheduling function. Due to the mixture of base algorithms, the overall count of propagation pairs could be largely reduced. Also, we give theoretical analysis on the convergence of RST. Then, we empirically study the computation cost and effect…
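The per-iteration Bernoulli trial described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy quadratic loss, the names rst_train and constant_schedule, and the values of lr and rho are all assumptions made for the example.

```python
import random

def loss_grad(w):
    # Toy quadratic loss L(w) = 0.5 * sum(w_i^2); its gradient is w itself.
    return list(w)

def constant_schedule(p):
    # Hypothetical scheduling function: same SAM probability at every step.
    return lambda step, total: p

def rst_train(w, steps, schedule, lr=0.1, rho=0.05, seed=0):
    """Run `steps` updates, picking SAM or SGD by a Bernoulli trial each step.

    Returns the final weights and the number of propagation pairs used.
    """
    rng = random.Random(seed)
    pairs = 0  # forward-backward propagation pairs consumed so far
    for t in range(steps):
        g = loss_grad(w)
        pairs += 1
        if rng.random() < schedule(t, steps):  # Bernoulli trial: take a SAM step
            # SAM: move to a nearby perturbed point along the normalized
            # gradient, then use the gradient evaluated there.
            norm = sum(x * x for x in g) ** 0.5 or 1.0
            w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
            g = loss_grad(w_adv)
            pairs += 1  # the extra propagation pair SAM requires
        # Otherwise g is the plain SGD gradient; both cases share this update.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, pairs
```

Calling rst_train([1.0, -2.0], 100, constant_schedule(0.5)) uses between 100 and 200 propagation pairs, whereas a schedule of 0.0 reduces to pure SGD (100 pairs) and 1.0 to pure SAM (200 pairs). A non-constant schedule can be supplied as any function of the step index and total step count that returns a probability.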
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent · Sharpness-Aware Minimization
