Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
Yang Zhao, Hao Zhang

TL;DR
This paper introduces Neighborhood Region Smoothing (NRS), a regularization method that encourages deep neural networks to converge to flat minima by pushing models in a neighborhood of the current weights to produce similar outputs, improving generalization across various architectures.
Contribution
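NRS is a regularization technique that explicitly guides models toward flat minima by minimizing, together with the empirical loss, a Kullback-Leibler-based divergence between the outputs of models in a weight-space neighborhood. The method is supported by empirical results on multiple datasets.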
Findings
NRS improves generalization on CIFAR and ImageNet.
Models trained with NRS have smaller Hessian eigenvalues.
NRS effectively finds flatter minima than traditional methods.
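One common way to quantify the flatness mentioned in the findings above is the largest eigenvalue of the loss Hessian at the trained weights. The sketch below is illustrative only and is not taken from the paper: it estimates that eigenvalue by power iteration, using finite-difference Hessian-vector products on two hypothetical quadratic losses. The names `hvp`, `top_hessian_eigenvalue`, `sharp_grad`, and `flat_grad` are assumptions introduced for this example.

```python
import math

def hvp(grad_fn, w, v, h=1e-4):
    """Approximate the Hessian-vector product H(w) @ v by central differences of the gradient."""
    w_plus = [wi + h * vi for wi, vi in zip(w, v)]
    w_minus = [wi - h * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * h) for gp, gm in zip(g_plus, g_minus)]

def top_hessian_eigenvalue(grad_fn, w, iters=100):
    """Power iteration: returns the Hessian eigenvalue of largest magnitude at w.

    Assumes the starting vector is not orthogonal to the top eigenvector.
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting direction
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eig

# Hypothetical toy losses (not from the paper), both minimized at w = (0, 0):
#   sharp: 0.5 * (10 * w0^2 + 1.0 * w1^2)
#   flat:  0.5 * (1 * w0^2 + 0.5 * w1^2)
def sharp_grad(w):
    return [10.0 * w[0], 1.0 * w[1]]

def flat_grad(w):
    return [1.0 * w[0], 0.5 * w[1]]

sharp_eig = top_hessian_eigenvalue(sharp_grad, [0.0, 0.0])
flat_eig = top_hessian_eigenvalue(flat_grad, [0.0, 0.0])
print(sharp_eig, flat_eig)
```

On these toy losses the estimate is close to 10 for the sharp minimum and close to 1 for the flat one. For a real network, the gradient function would come from automatic differentiation rather than a closed form.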
Abstract
Due to diverse architectures in deep neural networks (DNNs) with severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique, called Neighborhood Region Smoothing (NRS). NRS leverages the finding that models benefit from converging to flat minima, and regularizes the neighborhood region in weight space to yield approximately the same outputs. Specifically, the gap between the outputs of models in the neighborhood region is gauged by a defined metric based on the Kullback-Leibler divergence. This metric provides insights similar to those of the minimum description length principle on interpreting flat minima. By minimizing both this divergence and the empirical loss, NRS can explicitly drive the optimizer towards converging to flat minima. We confirm the effectiveness of NRS…
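The idea of minimizing an output divergence together with the empirical loss can be sketched on a toy model. The following is a minimal illustration under stated assumptions, not the authors' implementation: it uses a linear softmax classifier, one randomly drawn neighbor at a fixed distance `radius` from the current weights, a plain KL divergence between the two output distributions, and a weighting factor `alpha`. The paper's exact metric, neighbor selection, and weighting are not given in this summary, so all of these names and choices are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Linear softmax classifier; weights has one row per class."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(p, label, eps=1e-12):
    return -math.log(p[label] + eps)

def perturb(weights, radius, rng):
    """Return a neighbor of weights at Euclidean distance radius (assumed sampling scheme)."""
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in weights]
    norm = math.sqrt(sum(n * n for row in noise for n in row))
    if norm == 0.0:
        return [list(row) for row in weights]
    scale = radius / norm
    return [[w + scale * n for w, n in zip(wrow, nrow)]
            for wrow, nrow in zip(weights, noise)]

def smoothed_loss(weights, x, label, radius=0.1, alpha=1.0, rng=None):
    """Empirical loss plus alpha times the output gap to one weight-space neighbor.

    Returns (total_loss, gap).
    """
    rng = rng or random.Random(0)
    p_center = predict(weights, x)
    p_neighbor = predict(perturb(weights, radius, rng), x)
    gap = kl_divergence(p_center, p_neighbor)
    return cross_entropy(p_center, label) + alpha * gap, gap

# Hypothetical 3-class, 2-feature example.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
x = [1.0, 2.0]
total, gap = smoothed_loss(W, x, label=1, radius=0.5)
print(total, gap)
```

With `radius=0.0` the neighbor coincides with the current weights, the gap is zero, and the objective reduces to the ordinary cross-entropy. In training, the combined value would be differentiated with respect to the weights so that the optimizer is penalized for regions where nearby weights disagree.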
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Gaussian Processes and Bayesian Inference
