An Inertial Newton Algorithm for Deep Learning

Camille Castera; Jérôme Bolte; Cédric Févotte; Edouard Pauwels

arXiv:1905.12278 · cs.LG · August 17, 2021 · 6 cites

TL;DR

This paper introduces INNA, a second-order inertial optimization algorithm for deep learning that combines gradient-descent and Newton-like behavior with inertia while requiring only stochastic approximations. The authors prove its convergence and report competitive performance on benchmark tasks.

Contribution

The paper presents INNA, a novel second-order inertial method for deep learning that leverages loss geometry and proves its convergence, addressing spurious stationary points and enabling aggressive learning rates.

Findings

01. INNA achieves competitive results on deep learning benchmarks.

02. The convergence of INNA is established theoretically for most deep learning problems.

03. Spurious stationary points are addressed through the $D$-criticality framework.

Abstract

We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both gradient-descent and Newton-like behaviors as well as inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework to analyze deep learning loss functions involving tame optimization in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion which underlies our algorithm. Additionally, we also show how standard…
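
The abstract does not spell out the update rule, so the following is only a minimal sketch of how an inertial, Newton-like scheme can be run using nothing but noisy first-order information. It assumes a two-variable update on (theta, psi) with a damping parameter alpha, a Newton-like parameter beta, and vanishing step sizes gamma_k. The exact form of the update, the initialization psi = (1 - alpha * beta) * theta, the toy quadratic loss, the hyperparameter values, and all function names are assumptions made for illustration and should be checked against the paper itself.

```python
import random

def inna_step(theta, psi, grad, alpha, beta, gamma):
    """Apply one INNA-style update to lists of floats (illustrative sketch)."""
    new_theta, new_psi = [], []
    for t, p, g in zip(theta, psi, grad):
        # Term shared by both updates; it carries the inertial memory.
        shared = (1.0 / beta - alpha) * t - (1.0 / beta) * p
        # Only theta sees the (noisy) gradient; no Hessian is ever formed.
        new_theta.append(t + gamma * (shared - beta * g))
        new_psi.append(p + gamma * shared)
    return new_theta, new_psi

def noisy_grad(theta, rng, noise):
    """Noisy gradient of the toy loss 0.5 * (theta[0]**2 + 10 * theta[1]**2)."""
    curvatures = (1.0, 10.0)  # hypothetical ill-conditioned quadratic
    return [c * t + rng.gauss(0.0, noise) for c, t in zip(curvatures, theta)]

def run(steps=20000, alpha=0.5, beta=0.1, gamma0=0.1, noise=0.01, seed=0):
    """Run the sketch on the toy loss and return the final theta."""
    rng = random.Random(seed)
    theta = [1.0, -1.0]
    # Assumed initialization: makes the shared term vanish at the start.
    psi = [(1.0 - alpha * beta) * t for t in theta]
    for k in range(steps):
        gamma = gamma0 / (k + 1) ** 0.5  # assumed vanishing step-size schedule
        grad = noisy_grad(theta, rng, noise)
        theta, psi = inna_step(theta, psi, grad, alpha, beta, gamma)
    return theta
```

Calling `run()` drives theta toward the minimizer at the origin of the toy loss. The point of the sketch is that the auxiliary variable psi lets the scheme carry inertia and curvature-like damping while each step costs about as much as a plain stochastic gradient step.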

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques