Bolstering Stochastic Gradient Descent with Model Building

S. Ilker Birbil; Ozgur Martin; Gonenc Onay; Figen Oztoprak

arXiv:2111.07058·cs.LG·March 14, 2024

TL;DR

This paper introduces a new stochastic gradient descent method that uses model building with second-order information to adaptively adjust step sizes and directions, leading to faster convergence and better generalization.

Contribution

The paper proposes a novel stochastic line search algorithm based on forward step model building that incorporates second-order information and adapts to parameter groups in deep learning.

Findings

1. Achieves faster convergence in test problems
2. Requires less tuning compared to existing methods
3. Demonstrates improved generalization performance

Abstract

The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the step length. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates second-order information that allows adjusting not only the step length but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel…
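To make the idea of a stochastic line search with one step per parameter group concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: it swaps in plain Armijo backtracking, which only shrinks the step length and never changes the search direction, and it omits the model-building step entirely. The linear least-squares problem, the group names `weight` and `bias`, the function names, and the constants `c` and `shrink` are all assumptions chosen for illustration.

```python
import numpy as np


def loss_and_grads(params, X, y):
    """Half mean squared error of a linear model, with one gradient per group."""
    residual = X @ params["weight"] + params["bias"] - y
    loss = 0.5 * np.mean(residual ** 2)
    grads = {"weight": X.T @ residual / len(y), "bias": np.mean(residual)}
    return loss, grads


def groupwise_armijo_step(params, X, y, step=1.0, c=1e-4, shrink=0.5,
                          max_backtracks=30):
    """Update each parameter group in turn with its own backtracked step length.

    Hypothetical helper: plain Armijo backtracking on one mini-batch,
    not the model-building step described in the abstract.
    """
    params = dict(params)  # shallow copy; arrays are replaced, never mutated
    for name in list(params):
        # Re-evaluate loss and gradient at the current point for this group.
        loss, grads = loss_and_grads(params, X, y)
        g = grads[name]
        t = step
        for _ in range(max_backtracks):
            trial = {**params, name: params[name] - t * g}
            trial_loss, _ = loss_and_grads(trial, X, y)
            # Armijo sufficient-decrease test on the same mini-batch.
            if trial_loss <= loss - c * t * np.sum(g * g):
                params = trial
                break
            t *= shrink  # step too long for this group: shrink and retry
    return params


# Synthetic, noiseless regression data (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = X @ np.arange(1.0, 6.0) + 0.5

params = {"weight": np.zeros(5), "bias": 0.0}
initial_loss, _ = loss_and_grads(params, X, y)
for _ in range(100):
    batch = rng.choice(len(y), size=32, replace=False)
    params = groupwise_armijo_step(params, X[batch], y[batch])
final_loss, _ = loss_and_grads(params, X, y)
print(f"full-data loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Each group is searched separately, so a layer that tolerates a long step is not held back by one that needs a short step. Per the abstract, the proposed method goes further by using second-order information to adjust the direction as well as the length.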

Code & Models

Repositories

sibirbil/smb (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM