Exploring Landscapes for Better Minima along Valleys
Tong Zhao, Jiacheng Li, Yuanchang Zhou, Guangming Tan, Weile Jia

TL;DR
This paper introduces an adaptive optimizer extension called ALTO that encourages exploration along loss landscape valleys, leading to better minima and improved generalization in deep learning, especially in large-batch training.
Contribution
The paper proposes a novel adaptive optimizer that explores valleys in the loss landscape, enhancing the search for flatter minima and improving generalization performance.
Findings
ALTO improves test accuracy by an average of 2.5% in large-batch training.
The approach increases the likelihood of finding flatter, better minima.
Theoretical convergence is proven for both convex and non-convex cases.
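The notion of a "flatter" minimum in the findings above can be made concrete with a small numerical sketch. It uses one common proxy for flatness, the average loss increase under small random perturbations of the minimizer; the summary does not say which flatness measure the paper uses. The two quadratic losses, the function name `sharpness`, and the values of `sigma` and `n_samples` are assumptions chosen for illustration.

```python
import random

def sharpness(loss, x_star, sigma=0.1, n_samples=10_000, seed=0):
    """Average loss increase when the minimizer x_star is perturbed by
    Gaussian noise of standard deviation sigma (a common flatness proxy,
    assumed here for illustration)."""
    rng = random.Random(seed)
    base = loss(x_star)
    total = 0.0
    for _ in range(n_samples):
        total += loss(x_star + rng.gauss(0.0, sigma)) - base
    return total / n_samples

# Two toy 1-D losses that share the minimizer x = 0 and the minimum value 0.
def sharp_loss(x):
    return 10.0 * x * x   # high curvature: a sharp minimum

def flat_loss(x):
    return 0.1 * x * x    # low curvature: a flat minimum

sharp_value = sharpness(sharp_loss, 0.0)
flat_value = sharpness(flat_loss, 0.0)
print(f"sharp minimum: {sharp_value:.5f}, flat minimum: {flat_value:.5f}")
```

Both minima have the same loss value, but the flat one is far less sensitive to parameter perturbations, which is the property usually linked to better generalization.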
Abstract
Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or provides the best generalization. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer tends to continue exploring along landscape valleys (areas with low and nearly identical losses) in order to search for potentially better local minima even after reaching a local minimum. This approach increases the likelihood of finding a lower and flatter local minimum, which is often associated with better generalization. We also provide a proof of convergence for the adapted optimizers in both convex and non-convex scenarios for completeness. Finally, we…
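The truncated abstract does not give the update rule, so the following is only a sketch of what wrapping a gradient-based optimizer with an adaptor might look like. The exploration term (an exponential moving average of successive gradient differences added to the gradient), the function names, and the hyperparameters `lr`, `beta`, and `alpha` are all assumptions and should not be read as the paper's algorithm.

```python
def make_exploring_step(lr=0.1, beta=0.9, alpha=0.5):
    """Hypothetical adaptor around plain gradient descent (NOT the paper's
    update rule): the gradient is augmented with alpha times an EMA of
    gradient differences before the usual descent step."""
    state = {"prev_grad": None, "ema_diff": None}

    def step(params, grads):
        if state["prev_grad"] is None:
            # First call: no gradient history yet, so the extra term is zero.
            state["prev_grad"] = list(grads)
            state["ema_diff"] = [0.0] * len(grads)
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            diff = g - state["prev_grad"][i]
            state["ema_diff"][i] = beta * state["ema_diff"][i] + (1.0 - beta) * diff
            adapted = g + alpha * state["ema_diff"][i]
            new_params.append(p - lr * adapted)  # base optimizer: gradient descent
        state["prev_grad"] = list(grads)
        return new_params

    return step

def run_on_quadratic(n_steps, x0=1.0):
    """Run the adapted optimizer on f(x) = 0.5 * x**2, whose gradient is x."""
    step = make_exploring_step()
    params = [x0]
    trajectory = [x0]
    for _ in range(n_steps):
        grads = [params[0]]
        params = step(params, grads)
        trajectory.append(params[0])
    return trajectory

trajectory = run_on_quadratic(200)
print(f"final |x| after 200 steps: {abs(trajectory[-1]):.2e}")
```

On this convex toy problem the adapted step still converges to the minimizer, and its first step is identical to plain gradient descent because no gradient history exists yet. Whether such a term promotes movement along valleys in realistic landscapes is not something this sketch demonstrates.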
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Metaheuristic Optimization Algorithms Research
