Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

Yang Zhao; Hao Zhang; Xiuyuan Hu

arXiv:2203.09962 · cs.LG · April 11, 2023 · 1 citation

TL;DR

This paper introduces Randomized Sharpness-Aware Training (RST), a method that reduces the computational cost of sharpness-aware algorithms such as SAM by randomly mixing them with SGD, maintaining performance while halving the extra computation SAM requires.

Contribution

The paper proposes RST, a novel probabilistic training scheme that decreases the computational burden of sharpness-aware methods, and provides a theoretical convergence analysis and practical guidelines.

Findings

01

RST can reduce the extra computation SAM adds over SGD by 50%.

02

G-RST, a generalized version of RST, outperforms SAM in most cases while using less computation.

03

A theoretical analysis establishes the convergence of RST.
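The saving can be checked by counting forward-backward propagation pairs. Assuming an SGD step costs one pair and a SAM step costs two (SAM's one extra propagation per iteration), step t costs 1 + p_t pairs in expectation, where p_t is the SAM probability set by the scheduling function. The function name `expected_pairs_per_step` and both schedules below are hypothetical examples, not settings taken from the paper.

```python
def expected_pairs_per_step(sam_probs):
    """Average forward-backward propagation pairs per iteration.

    Assumes an SGD step costs 1 pair and a SAM step costs 2 pairs, so step t
    costs 1 + p_t in expectation, where p_t is the SAM probability at step t.
    """
    return sum(1.0 + p for p in sam_probs) / len(sam_probs)


T = 1000
constant_half = [0.5] * T                  # hypothetical constant schedule
linear = [t / (T - 1) for t in range(T)]   # hypothetical linear 0 -> 1 schedule

sam_cost = expected_pairs_per_step([1.0] * T)        # plain SAM: 2.0
rst_cost = expected_pairs_per_step(constant_half)    # RST: 1.5
# Fraction of SAM's extra pairs (beyond SGD's 1 per step) that RST avoids
extra_saved = 1 - (rst_cost - 1) / (sam_cost - 1)    # 0.5
```

Any schedule whose average probability is 0.5 (the linear one above included) gives the same expected cost: half of SAM's extra propagation pairs are skipped, so the per-step cost falls from 2 pairs to 1.5.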

Abstract

By driving models to converge to flat minima, sharpness-aware learning algorithms (such as SAM) have achieved state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases computation, especially for large-scale models. To this end, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, an RST optimizer performs a Bernoulli trial to choose randomly between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with the probability set by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall number of propagation pairs can be greatly reduced. We also give a theoretical analysis of the convergence of RST. Then, we empirically study the computation cost and effect…
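A minimal NumPy sketch of the randomized loop described in the abstract, on a toy least-squares problem standing in for a network. The helper names (`loss_grad`, `linear_schedule`, `rst_train`), the linear schedule, and the values of `lr` and `rho` are hypothetical choices for illustration, not the paper's settings. The SAM branch uses the standard normalized-gradient ascent step of radius `rho`.

```python
import numpy as np


def loss_grad(w, X, y):
    """MSE loss and gradient for a linear model (toy stand-in for a network)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)


def linear_schedule(t, total_steps):
    """Hypothetical scheduling function: SAM probability rises linearly from 0 to 1."""
    return t / max(total_steps - 1, 1)


def rst_train(X, y, total_steps=200, lr=0.1, rho=0.05, schedule=linear_schedule, seed=0):
    """Hypothetical RST loop: each step is SAM with probability schedule(t), else SGD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    pairs = 0  # forward-backward propagation pairs actually spent
    for t in range(total_steps):
        _, g = loss_grad(w, X, y)  # first pair, needed by both SGD and SAM
        pairs += 1
        if rng.random() < schedule(t, total_steps):  # Bernoulli trial
            # SAM: step to the first-order worst-case point in an L2 ball of radius rho ...
            eps = rho * g / (np.linalg.norm(g) + 1e-12)
            # ... and take the gradient there (the extra pair SAM costs)
            _, g = loss_grad(w + eps, X, y)
            pairs += 1
        w = w - lr * g  # same update rule for both branches
    return w, pairs


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)
w, pairs = rst_train(X, y)
print(pairs)  # somewhere between 200 (pure SGD) and 400 (pure SAM)
```

Passing a schedule that always returns 0.0 recovers plain SGD, and one that always returns 1.0 recovers plain SAM; the schedule is the only knob that trades computation against how often the sharpness-aware gradient is used.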

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Machine Learning and Data Classification · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent · Sharpness-Aware Minimization