Empirically explaining SGD from a line search perspective

Maximus Mutschler; Andreas Zell

arXiv:2103.17132·cs.LG·November 23, 2022

TL;DR

This paper empirically analyzes the behavior of SGD from a line search perspective, revealing that the full-batch loss along update directions is parabolic and that SGD can perform near-exact line searches, providing insights into batch size and learning rate effects.

Contribution

It offers the first empirical analysis of SGD trajectories from a line search perspective, demonstrating parabolic loss behavior and near-exact line search conditions.

Findings

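01. The full-batch loss along lines in the update step direction is highly parabolic.

02. There exists a learning rate with which SGD performs almost exact line searches on the full-batch loss.

03. Increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.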

Abstract

Optimization in Deep Learning is mainly guided by vague intuitions and strong assumptions, with a limited understanding of how and why these work in practice. To shed more light on this, our work provides a deeper understanding of how SGD behaves by empirically analyzing the trajectory taken by SGD from a line search perspective. Specifically, a costly quantitative analysis of the full-batch loss along SGD trajectories from commonly used models trained on a subset of CIFAR-10 is performed. Our core results include that the full-batch loss along lines in the update step direction is highly parabolic. Furthermore, we show that there exists a learning rate with which SGD always performs almost exact line searches on the full-batch loss. Finally, we provide a different perspective on why increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.
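To make the line search perspective concrete, the following is a minimal sketch, not the authors' code. It assumes a toy quadratic loss in place of a neural network, so the loss along any line is exactly parabolic, and it perturbs the full-batch gradient with Gaussian noise as a stand-in for a mini-batch gradient. All names (`full_batch_loss`, `fit_parabola_along_line`, `noisy_grad`) are hypothetical. The sketch samples the full-batch loss along the update direction, fits l(s) ≈ a·s² + b·s + c, and reads off the minimizing step s_min = -b / (2a). For plain SGD without momentum, the learning rate that lands on this minimum is s_min divided by the norm of the update gradient.

```python
import numpy as np


def full_batch_loss(theta, A, b):
    """Toy stand-in for a full-batch loss: 0.5 * theta^T A theta - b^T theta."""
    return 0.5 * theta @ A @ theta - b @ theta


def fit_parabola_along_line(loss_fn, theta, direction, step_sizes):
    """Sample loss_fn along theta + s * direction and fit a parabola in s.

    Returns the coefficients (a, b, c), the step s_min that minimizes the
    fitted parabola (NaN if the fit is not convex), and the largest absolute
    fitting residual as a crude measure of how parabolic the line is.
    """
    direction = direction / np.linalg.norm(direction)
    losses = np.array([loss_fn(theta + s * direction) for s in step_sizes])
    a, b, c = np.polyfit(step_sizes, losses, deg=2)
    residual = losses - np.polyval([a, b, c], step_sizes)
    s_min = -b / (2.0 * a) if a > 0 else float("nan")
    return a, b, c, s_min, float(np.max(np.abs(residual)))


rng = np.random.default_rng(0)

# Assumption: a random symmetric positive definite matrix defines the toy loss.
M = rng.normal(size=(5, 5))
A = M @ M.T + np.eye(5)
b_vec = rng.normal(size=5)
theta = rng.normal(size=5)

# Full-batch gradient of the toy loss, plus noise to mimic a mini-batch gradient.
grad = A @ theta - b_vec
noisy_grad = grad + 0.1 * rng.normal(size=5)
direction = -noisy_grad / np.linalg.norm(noisy_grad)

step_sizes = np.linspace(-0.5, 1.5, 21)
a, b, c, s_min, max_residual = fit_parabola_along_line(
    lambda t: full_batch_loss(t, A, b_vec), theta, direction, step_sizes
)

# Closed-form exact line search step for a quadratic, used only as a check.
s_exact = -(grad @ direction) / (direction @ A @ direction)

# Learning rate for which the SGD step theta - lr * noisy_grad hits the minimum.
lr_exact = s_min / np.linalg.norm(noisy_grad)

print("fitted s_min:", s_min)
print("analytic s_exact:", s_exact)
print("max fit residual:", max_residual)
print("learning rate for an exact line search:", lr_exact)
```

On a real network the loss function would be evaluated over the whole training set at each sampled step, which is what makes the analysis described in the abstract costly. The fit residual would then indicate how well the parabolic approximation holds along that line.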

Code & Models

Repositories

cogsys-tuebingen/empirically_explaining_sgd_from_a_line_search_perspective
PyTorch · Official

Taxonomy

Methods: Stochastic Gradient Descent