A Survey of Optimization Methods for Training DL Models: Theoretical Perspective on Convergence and Generalization
Jing Wang, Anna Choromanska

TL;DR
This survey reviews the theoretical foundations of optimization methods in deep learning, analyzing the convergence and generalization of gradient-based, adaptive, and distributed techniques for convex and non-convex problems.
Contribution
It provides an extensive theoretical analysis of optimization algorithms in deep learning, emphasizing convergence and generalization, two aspects that existing surveys often overlook.
Findings
Analysis of convergence properties of gradient-based methods
Insights into generalization capabilities of optimization techniques
Discussion of distributed optimization approaches
Abstract
As data sets grow in size and complexity, it is becoming more difficult to pull useful features from them using hand-crafted feature extractors. For this reason, deep learning (DL) frameworks are now widely popular. The Holy Grail of DL and one of the most mysterious challenges in all of modern ML is to develop a fundamental understanding of DL optimization and generalization. While numerous optimization techniques have been introduced in the literature to navigate the exploration of the highly non-convex DL optimization landscape, many survey papers reviewing them primarily focus on summarizing these methodologies, often overlooking the critical theoretical analyses of these methods. In this paper, we provide an extensive summary of the theoretical foundations of optimization methods in DL, including presenting various methodologies, their convergence analyses, and generalization…
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms
Methods: Focus
