First-order and second-order variants of the gradient descent in a unified framework
Thomas Pierrot, Nicolas Perrin, Olivier Sigaud

TL;DR
This paper presents a unified framework that interprets six first- and second-order gradient descent variants used in machine learning, clarifying their relationships and specificities.
Contribution
The paper introduces a general framework that unifies six gradient descent variants, highlighting the connections between them and the conditions under which some of them coincide.
Findings
Unified interpretation of six gradient descent variants
Conditions under which different methods coincide
Clearer understanding of each method's specificities
Abstract
In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning. We propose a general framework in which 6 of these variants can be interpreted as different instances of the same approach. They are the vanilla gradient descent, the classical and generalized Gauss-Newton methods, the natural gradient descent method, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specificities and show under which conditions some of them coincide.
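One way to picture the shared structure described in the abstract is to write every variant as a preconditioned update, theta <- theta - lr * M^{-1} * grad, where only the matrix M changes from method to method. This formulation is an assumption made for illustration; this page does not spell out the paper's exact framework. The sketch below uses a toy linear regression problem, and the helper names (`preconditioned_step`, `numerical_hessian`), the synthetic data, and the damping parameter are all hypothetical choices rather than the authors' code. It shows one simple case in which two methods coincide: for a linear model with squared error, the classical Gauss-Newton matrix equals the Hessian used by Newton's method.

```python
import numpy as np


def preconditioned_step(theta, grad, M, lr=1.0, damping=0.0):
    """One update theta - lr * (M + damping * I)^{-1} grad (illustrative form)."""
    M_damped = M + damping * np.eye(theta.size)
    return theta - lr * np.linalg.solve(M_damped, grad)


def numerical_hessian(grad_fn, theta, eps=1e-5):
    """Central finite-difference Hessian of the mean loss, symmetrized."""
    d = theta.size
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        H[:, i] = (grad_fn(theta + e) - grad_fn(theta - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)


# Toy problem (assumed for illustration): linear model, loss 0.5 * (x . theta - y)^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
n, d = X.shape
theta0 = np.zeros(d)


def mean_grad(theta):
    """Gradient of the mean squared-error loss."""
    return X.T @ (X @ theta - y) / n


residuals = X @ theta0 - y
per_sample_grads = X * residuals[:, None]  # row i is the gradient of sample i's loss
grad = per_sample_grads.mean(axis=0)

# Candidate matrices M, one per variant (natural gradient and generalized
# Gauss-Newton are omitted to keep the sketch short).
M_vanilla = np.eye(d)                                    # vanilla gradient descent
M_gauss_newton = X.T @ X / n                             # mean of J_i^T J_i with J_i = x_i^T
M_newton = numerical_hessian(mean_grad, theta0)          # Hessian of the mean loss
M_grad_cov = per_sample_grads.T @ per_sample_grads / n   # uncentered second moment of gradients

theta_vanilla = preconditioned_step(theta0, grad, M_vanilla, lr=0.1)
theta_gn = preconditioned_step(theta0, grad, M_gauss_newton)
theta_newton = preconditioned_step(theta0, grad, M_newton)
theta_cov = preconditioned_step(theta0, grad, M_grad_cov, lr=0.1, damping=1e-3)

# Reference solution of the least-squares problem.
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this toy setting the Gauss-Newton and Newton steps agree, and with a unit step size both land on the least-squares solution in one update. The vanilla and gradient-covariance steps generally do not, which is why they are given a smaller step size here. The gradient-covariance matrix is taken as the uncentered second moment of per-sample gradients, which is one common reading and may differ from the paper's definition.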
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques
