First-order and second-order variants of the gradient descent in a unified framework

Thomas Pierrot; Nicolas Perrin; Olivier Sigaud

arXiv:1810.08102·cs.LG·August 17, 2021·1 cites

Open Access

TL;DR

This paper presents a unified framework that interprets six first- and second-order gradient descent variants used in machine learning, clarifying their relationships and specificities.

Contribution

It introduces a general framework that unifies six gradient descent variants, highlighting their connections and conditions under which they coincide.

Findings

01 Unified interpretation of six gradient descent variants

02 Conditions under which some of the methods coincide

03 Explanation of the specificities of each method

Abstract

In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning. We propose a general framework in which 6 of these variants can be interpreted as different instances of the same approach. They are the vanilla gradient descent, the classical and generalized Gauss-Newton methods, the natural gradient descent method, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specificities and show under which conditions some of them coincide.
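One common way to read such a unification, offered here as a minimal sketch and not necessarily the paper's exact formulation, is that each method takes a step of the form theta - lr * M^{-1} * gradient and differs only in the choice of the matrix M. The toy linear least-squares problem, the helper name `preconditioned_step`, the damping term, and the use of the uncentered second moment of per-sample gradients for the gradient covariance matrix are all assumptions made for illustration.

```python
import numpy as np

def preconditioned_step(theta, grad, M, lr=1.0, damping=1e-8):
    """Return theta - lr * M^{-1} grad (hypothetical helper).

    A small damping term is added to M so the linear solve stays well posed.
    """
    M_damped = M + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(M_damped, grad)

# Toy problem (assumption): linear model X @ theta, loss = mean of 0.5 * residual**2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true

theta0 = np.zeros(3)
residuals = X @ theta0 - y
per_sample_grads = residuals[:, None] * X   # one gradient row per sample, shape (50, 3)
grad = per_sample_grads.mean(axis=0)        # gradient of the mean loss

n = len(X)
metrics = {
    # Identity matrix: plain gradient descent.
    "vanilla": np.eye(3),
    # Averaged J^T J with J = X; for this linear model it equals the Hessian,
    # so the Gauss-Newton step and the Newton step coincide here.
    "gauss_newton": X.T @ X / n,
    # Uncentered second moment of the per-sample gradients (assumed definition).
    "grad_covariance": per_sample_grads.T @ per_sample_grads / n,
}

steps = {name: preconditioned_step(theta0, grad, M) for name, M in metrics.items()}
for name, theta in steps.items():
    print(name, theta)
```

On this toy problem the Gauss-Newton step with lr=1 lands on the least-squares solution up to the damping term, which is a small instance of two methods coinciding; on nonlinear models the matrices generally differ.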

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques