A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization

Jarad Forristal; Joshua Griffin; Wenwen Zhou; Seyedalireza Yektamaram

arXiv:2204.09116 · math.OC · April 21, 2022

Open Access

TL;DR

This paper introduces a fast, exact solver for cubic regularized subproblems in stochastic quasi-Newton optimization, enabling scalable second-order methods with competitive performance on deep neural networks.

Contribution

It presents a novel matrix-free, exact subproblem solver for Adaptive Regularization using Cubics (ARC) methods built on limited-memory quasi-Newton (LQN) matrices, improving speed and scalability in large-scale nonconvex optimization.

Findings

01

Substantial speed-ups over traditional methods.

02

Performance competitive with Adam on deep neural networks (DNNs).

03

Minimal tuning required for the proposed optimizer.

Abstract

In this work we describe an Adaptive Regularization using Cubics (ARC) method for large-scale nonconvex unconstrained optimization using Limited-memory Quasi-Newton (LQN) matrices. ARC methods are a relatively new family of optimization strategies that use a cubic-regularization (CR) term in place of trust regions and line searches. LQN methods offer a large-scale alternative to explicit second-order information, taking the same inputs as popular first-order methods such as stochastic gradient descent (SGD). Solving the CR subproblem exactly requires Newton's method; however, by exploiting the internal structure of LQN matrices, we are able to find exact solutions to the CR subproblem in a matrix-free manner, providing large speedups and scaling to modern problem sizes. Additionally, we expand upon previous ARC work and explicitly incorporate…
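To make the abstract's claim concrete, here is a minimal sketch of one common way to solve the CR subproblem exactly when B has limited-memory structure. It is not necessarily the authors' algorithm. It assumes B is given in a compact form B = γI + ΨMΨᵀ with an n-by-k factor Ψ (k < n) and a symmetric k-by-k matrix M; the names solve_cr_lowrank, g, gamma, Psi, M and sigma are hypothetical. It relies on the standard characterization that s* globally minimizes gᵀs + ½sᵀBs + (σ/3)‖s‖³ exactly when (B + λI)s* = −g, λ = σ‖s*‖, and B + λI is positive semidefinite. A thin QR of Ψ plus a k-by-k eigendecomposition yields B's spectrum without forming any n-by-n matrix, and Newton's method is then applied to a scalar secular equation in λ. The "hard case" (g orthogonal to the leftmost eigenvector) is deliberately not handled.

```python
import numpy as np


def solve_cr_lowrank(g, gamma, Psi, M, sigma, tol=1e-12, max_iter=200):
    """Exact minimizer of m(s) = g^T s + 0.5 s^T B s + (sigma/3)||s||^3
    for B = gamma*I + Psi @ M @ Psi.T (hypothetical compact form, k < n).

    Sketch only: assumes g != 0 and that the hard case does not occur.
    No n-by-n matrix is ever formed.
    """
    # Thin QR (numpy's default "reduced" mode): Psi = Q R, Q is n-by-k.
    Q, R = np.linalg.qr(Psi)
    # Small k-by-k symmetric eigenproblem: R M R^T = U diag(lam_hat) U^T.
    lam_hat, U = np.linalg.eigh(R @ M @ R.T)
    P_par = Q @ U                    # orthonormal eigenvectors spanning range(Psi)
    g_par = P_par.T @ g              # gradient components in that subspace
    g_perp_sq = max(float(g @ g - g_par @ g_par), 0.0)
    # Spectrum of B: lam_hat + gamma on range(Psi), gamma on its complement.
    evals = np.append(lam_hat + gamma, gamma)
    coef = np.append(g_par ** 2, g_perp_sq)

    def s_norm(lam):                 # ||s(lam)|| with s(lam) = -(B + lam I)^{-1} g
        return np.sqrt(np.sum(coef / (evals + lam) ** 2))

    def phi(lam):                    # secular function: zero where lam = sigma*||s(lam)||
        return 1.0 / s_norm(lam) - sigma / lam

    def dphi(lam):                   # derivative of phi with respect to lam
        ns = s_norm(lam)
        return np.sum(coef / (evals + lam) ** 3) / ns ** 3 + sigma / lam ** 2

    # lam must be positive and keep B + lam*I positive semidefinite.
    lb = max(0.0, -evals.min())
    gap = 1.0 + lb
    # phi is concave and increasing, so start where phi < 0: Newton then
    # increases lam monotonically toward the root without overshooting.
    while phi(lb + gap) >= 0.0:
        gap *= 0.1
        if gap < 1e-14 * (1.0 + lb):
            raise ValueError("hard case (or tiny gradient): not handled in this sketch")
    lam = lb + gap
    for _ in range(max_iter):
        step = phi(lam) / dphi(lam)
        lam -= step
        if abs(step) <= tol * max(1.0, lam):
            break

    # s = -(B + lam I)^{-1} g, applied through the implicit eigendecomposition.
    g_perp = g - P_par @ g_par
    s = -(P_par @ (g_par / (lam_hat + gamma + lam)) + g_perp / (gamma + lam))
    return s, lam
```

In this sketch the setup cost is dominated by the O(nk²) thin QR; after g is projected, each Newton iteration on λ costs only O(k), which is what makes an exact solve affordable at large n.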

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Adam