TL;DR
This paper introduces Z-Score Filtered Sharpness-Aware Minimization, a gradient filtering technique that restricts SAM's ascent step to the gradient components with the largest absolute Z-scores in each layer. The authors report that this improves generalization and test accuracy.
Contribution
It proposes a novel Z-score based gradient filtering method for SAM, improving its ability to find flatter minima and enhance generalization performance.
Findings
Consistently improves test accuracy across datasets and architectures.
Reduces influence of noisy or small gradient components.
Effective across architectures such as ResNet, VGG, and Vision Transformers.
Abstract
Deep neural networks achieve high performance across many domains but can still face challenges in generalization when optimization is influenced by small or noisy gradient components. Sharpness-Aware Minimization improves generalization by perturbing parameters toward directions of high curvature, but it uses the entire gradient vector, which means that small or noisy components may affect the ascent step and cause the optimizer to miss optimal solutions. We propose Z-Score Filtered Sharpness-Aware Minimization, which applies Z-score based filtering to gradients in each layer. Instead of using all gradient components, a mask is constructed to retain only the top percentile with the largest absolute Z-scores. The percentile threshold determines how many components are kept, so that the ascent step focuses on directions that stand out most compared to the average of the layer. This…
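The filtering step described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The function names `zscore_mask` and `filtered_ascent_step`, the `keep_fraction` and `rho` parameters, the use of the population standard deviation, and the choice to normalize the perturbation by the norm of the masked gradient are all assumptions made for this example. A layer's gradient is represented as a flat list of floats.

```python
import math

def zscore_mask(grad, keep_fraction):
    """Return a 0/1 mask keeping the components with the largest |Z-score|.

    grad: flat list of gradient values for one layer.
    keep_fraction: fraction of components to keep (hypothetical parameter).
    """
    n = len(grad)
    mean = sum(grad) / n
    # Population standard deviation of the layer's gradient (an assumption).
    std = math.sqrt(sum((g - mean) ** 2 for g in grad) / n)
    tiny = 1e-12  # avoids division by zero when all components are equal
    abs_z = [abs(g - mean) / (std + tiny) for g in grad]
    k = max(1, math.ceil(keep_fraction * n))  # number of components to keep
    # Indices of the k largest |Z| values; ties keep the earlier index.
    top = sorted(range(n), key=lambda i: abs_z[i], reverse=True)[:k]
    mask = [0] * n
    for i in top:
        mask[i] = 1
    return mask

def filtered_ascent_step(grad, keep_fraction, rho):
    """SAM-style perturbation built from the masked gradient only.

    Normalizing by the masked gradient's norm is an assumption; the
    abstract does not state how the perturbation is scaled.
    """
    mask = zscore_mask(grad, keep_fraction)
    masked = [g * m for g, m in zip(grad, mask)]
    norm = math.sqrt(sum(v * v for v in masked))
    return [rho * v / (norm + 1e-12) for v in masked]

# Toy layer gradient: two components stand out from the rest.
layer_grad = [0.01, -0.02, 0.5, 0.03, -0.6, 0.0, 0.02, -0.01]
mask = zscore_mask(layer_grad, keep_fraction=0.25)
perturbation = filtered_ascent_step(layer_grad, keep_fraction=0.25, rho=0.05)
```

With the toy gradient above, only the two components that deviate most from the layer mean survive the mask, so the perturbation is zero everywhere else. In standard SAM the perturbation would instead be built from the full gradient vector.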
