An overview of gradient descent optimization algorithms

Sebastian Ruder

arXiv:1609.04747 · cs.LG · June 16, 2017 · 4.8k cites

TL;DR

This paper provides an intuitive overview of various gradient descent optimization algorithms, discussing their behaviors, challenges, and strategies for effective use in different settings.

Contribution

It offers a comprehensive summary of gradient descent variants, challenges, and optimization strategies, aiding practitioners in understanding their practical strengths and weaknesses.

Findings

01  Different variants of gradient descent are compared and explained.

02  Challenges in applying gradient descent are summarized.

03  Strategies for optimizing gradient descent in various architectures are reviewed.

Abstract

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
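
To make "different variants of gradient descent" concrete, the following is a minimal sketch, not code from the paper. It assumes a toy one-dimensional linear regression with a mean-squared-error loss; the function name `gradient_descent`, the data, and the hyperparameter values are all hypothetical choices made for illustration. A single `batch_size` argument switches between batch gradient descent (the whole dataset per update), stochastic gradient descent (one example per update), and mini-batch gradient descent (a small subset per update).

```python
import random


def gradient_descent(xs, ys, lr=0.05, epochs=500, batch_size=None, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error.

    batch_size=None -> batch gradient descent (all examples per update)
    batch_size=1    -> stochastic gradient descent (one example per update)
    otherwise       -> mini-batch gradient descent
    (Illustrative helper; names and defaults are assumptions.)
    """
    rng = random.Random(seed)
    n = len(xs)
    if batch_size is None:
        batch_size = n
    w, b = 0.0, 0.0
    indices = list(range(n))
    for _ in range(epochs):
        # Shuffle once per epoch so each update sees a different subset.
        rng.shuffle(indices)
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

for name, size in [("batch", None), ("stochastic", 1), ("mini-batch", 2)]:
    w, b = gradient_descent(xs, ys, batch_size=size)
    print(f"{name}: w={w:.4f}, b={b:.4f}")
```

On this noiseless toy problem all three settings should recover parameters close to w = 2 and b = 1. The trade-off between them is the number of examples used per update: fewer examples give cheaper but noisier steps.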

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques · Face and Expression Recognition