G-TRACER: Expected Sharpness Optimization
John Williams, Stephen Roberts

TL;DR
G-TRACER is a regularization method that encourages flat minima in deep learning models. It is derived as an approximation to natural-gradient descent on a generalized Bayes objective, and it improves generalization, particularly on challenging low signal-to-noise ratio datasets.
Contribution
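The paper presents G-TRACER, a curvature-based regularization scheme for deep learning optimization. It is simple to implement as a modification to existing optimizers (e.g., SGD-TRACER and Adam-TRACER), needs little tuning, and is theoretically grounded as an approximation to natural-gradient descent on a generalized Bayes objective.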
Findings
Achieves competitive results on vision and NLP benchmarks.
Effectively handles low signal-to-noise ratio problems.
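Converges to a neighborhood of a local minimum of the unregularized objective, where the size of the neighborhood depends on the regularization strength.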
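The convergence finding can be illustrated with a toy problem. The sketch below is not the paper's regularizer: it assumes a generic curvature penalty (the second derivative of a hypothetical one-dimensional loss, weighted by `rho`) and only shows how the penalized minimizer drifts away from the unregularized minimum as the regularization strength grows. The loss, `rho` values, and step size are all illustrative choices.

```python
def grad_regularized(w, rho):
    # Toy loss (assumed): f(w) = 0.5*(w - 1)^2 + 0.25*w^4, so f''(w) = 1 + 3*w^2.
    # Penalized objective (generic curvature penalty, assumed): f(w) + rho * f''(w).
    # Its gradient is f'(w) + rho * d/dw f''(w) = (w - 1) + w^3 + 6*rho*w.
    return (w - 1.0) + w**3 + rho * 6.0 * w


def minimize(rho, lr=0.05, steps=2000, w0=0.0):
    # Plain gradient descent on the penalized objective.
    w = w0
    for _ in range(steps):
        w -= lr * grad_regularized(w, rho)
    return w


# Minimum of the unregularized loss (rho = 0).
w_star = minimize(rho=0.0)

# Distance between each penalized minimizer and the unregularized minimum.
offsets = {rho: abs(minimize(rho) - w_star) for rho in (0.0, 0.01, 0.1)}
```

In this toy setting the offset is zero for `rho = 0` and grows with `rho`, which mirrors the statement that the method reaches a neighborhood of the unregularized minimum whose size is set by the regularization strength.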
Abstract
We propose a new regularization scheme for the optimization of deep learning architectures, G-TRACER ("Geometric TRACE Ratio"), which promotes generalization by seeking flat minima, and has a sound theoretical basis as an approximation to a natural-gradient descent based optimization of a generalized Bayes objective. By augmenting the loss function with a TRACER, curvature-regularized optimizers (e.g., SGD-TRACER and Adam-TRACER) are simple to implement as modifications to existing optimizers and do not require extensive tuning. We show that the method converges to a neighborhood (depending on the regularization strength) of a local minimum of the unregularized objective, and demonstrate competitive performance on a number of benchmark computer vision and NLP datasets, with a particular focus on challenging low signal-to-noise ratio problems.
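The abstract does not spell out the form of the TRACER term, so the following is only a sketch of one common building block for trace-based curvature penalties: a Hutchinson estimate of the Hessian trace computed from gradient evaluations alone. The toy loss, the finite-difference Hessian-vector product, and every function name here are assumptions for illustration, not the authors' method.

```python
import random


def grad(w):
    # Gradient of an assumed toy loss f(w) = sum_i (0.5*a_i*w_i^2 + 0.25*w_i^4),
    # with a_i = i + 1. Its Hessian is diagonal with entries a_i + 3*w_i^2.
    return [(i + 1) * wi + wi**3 for i, wi in enumerate(w)]


def hvp(w, v, eps=1e-4):
    # Central finite-difference Hessian-vector product:
    # H v ~= (grad(w + eps*v) - grad(w - eps*v)) / (2*eps).
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def hutchinson_trace(w, n_probes=8, seed=0):
    # Hutchinson estimator: average of v^T H v over random +/-1 probe vectors.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        total += sum(vi * hi for vi, hi in zip(v, hvp(w, v)))
    return total / n_probes


w = [0.5, -1.0, 2.0]
exact_trace = sum((i + 1) + 3 * wi**2 for i, wi in enumerate(w))
estimate = hutchinson_trace(w)
```

Because this toy Hessian is diagonal, the +/-1 probes recover the trace up to finite-difference error; for a general network the estimate is noisy and is usually averaged over probes or minibatches. An estimator of this kind only needs gradient evaluations, which is one reason curvature penalties can be added to an existing optimizer with modest changes.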
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
