SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
Wei Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, Hai Li

TL;DR
This paper introduces SmoothOut, a framework that smooths out sharp minima in deep neural networks by injecting noise into multiple copies of the network and averaging them, with the aim of improving generalization.
Contribution
The paper proposes SmoothOut and its variants, which smooth out sharp minima and improve generalization. Unlike existing noise-injection methods, SmoothOut applies a de-noising step before each parameter update and adapts the noise strength to the filter norm.
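The smoothing idea can be illustrated on a toy problem. The sketch below is not taken from the paper: the 1-D loss, the uniform noise distribution, and the names `loss` and `smoothed_loss` are assumptions chosen for illustration. It averages the loss over many noise-perturbed copies of a parameter and shows that a narrow, deep minimum mostly disappears under averaging while a wide, shallow one survives.

```python
import numpy as np

def loss(w):
    """Hypothetical 1-D loss with a wide basin near w = -2 and a narrow, deeper one near w = 2."""
    flat = -1.0 * np.exp(-((w + 2.0) ** 2) / (2 * 1.0 ** 2))
    sharp = -1.5 * np.exp(-((w - 2.0) ** 2) / (2 * 0.05 ** 2))
    return flat + sharp

def smoothed_loss(w, a, n_copies, rng):
    """Average the loss over n_copies of w, each perturbed by noise drawn from U(-a, a)."""
    noise = rng.uniform(-a, a, size=n_copies)  # assumed noise distribution
    return float(np.mean(loss(w + noise)))

rng = np.random.default_rng(0)
for w in (-2.0, 2.0):
    print(f"w={w:+.1f}  loss={loss(w):.3f}  smoothed={smoothed_loss(w, 1.0, 10_000, rng):.3f}")
```

Before smoothing, the sharp minimum at w = 2 has the lower loss; after averaging over perturbed copies, the flat minimum at w = -2 is the lower of the two.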
Findings
SmoothOut effectively eliminates sharp minima.
SmoothOut and AdaSmoothOut improve generalization across various training settings.
The methods outperform state-of-the-art solutions in experiments.
Abstract
In Deep Learning, Stochastic Gradient Descent (SGD) is usually selected as a training method because of its efficiency; however, recently, a problem in SGD gains research interest: sharp minima in Deep Neural Networks (DNNs) have poor generalization; especially, large-batch SGD tends to converge to sharp minima. It becomes an open question whether escaping sharp minima can improve the generalization. To answer this question, we propose SmoothOut framework to smooth out sharp minima in DNNs and thereby improve generalization. In a nutshell, SmoothOut perturbs multiple copies of the DNN by noise injection and averages these copies. Injecting noises to SGD is widely used in the literature, but SmoothOut differs in lots of ways: (1) a de-noising process is applied before parameter updating; (2) noise strength is adapted to filter norm; (3) an alternative interpretation on the advantage of…
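Points (1) and (2) of the abstract can be sketched as a single training step. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `smoothout_step`, the uniform noise, the treatment of each row of the weight array as one "filter", and the `adaptive` flag are all hypothetical.

```python
import numpy as np

def smoothout_step(w, grad_fn, lr, a, rng, adaptive=False):
    """One SGD step with noise injection and de-noising (illustrative sketch).

    w        : weight array; the first axis is assumed to index filters
    grad_fn  : callable returning the gradient of the loss at given weights
    a        : noise strength (absolute, or relative to filter norm if adaptive)
    """
    if adaptive:
        # Assumption: scale the noise range of each filter by that filter's L2 norm.
        norms = np.linalg.norm(w.reshape(w.shape[0], -1), axis=1)
        scale = a * norms.reshape((-1,) + (1,) * (w.ndim - 1))
    else:
        scale = a
    noise = rng.uniform(-1.0, 1.0, size=w.shape) * scale
    w_perturbed = w + noise             # inject noise
    g = grad_fn(w_perturbed)            # gradient evaluated at the perturbed weights
    w_denoised = w_perturbed - noise    # de-noise: remove the same noise before updating
    return w_denoised - lr * g          # plain SGD update on the de-noised weights

# Usage on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
rng = np.random.default_rng(0)
w = np.ones((2, 3))
w = smoothout_step(w, lambda x: x, lr=0.1, a=0.1, rng=rng, adaptive=True)
print(w)
```

Because the noise is removed before the update, it affects only where the gradient is evaluated, not the stored weights.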
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
