Exact Stochastic Second Order Deep Learning

Fares B. Mehouachi; Chaouki Kasmi

arXiv:2104.03804·cs.LG·April 9, 2021

Open Access

TL;DR

This paper introduces an exact stochastic second-order optimization method for deep learning that overcomes traditional computational challenges, enabling more efficient training by leveraging regularization and spectral adjustments.

Contribution

It provides a closed-form formula for the exact stochastic Hessian and Newton direction, addressing non-convexity and promoting flat minima in deep learning optimization.

Findings

01. The method accurately computes the stochastic Hessian eigenvalues.

02. It effectively finds the Newton direction in non-convex settings.

03. Experimental results show improved optimization on popular datasets.
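
To make the notion of a Newton direction in a non-convex setting concrete, here is a minimal sketch of a generic spectral adjustment: the Hessian eigenvalues are replaced by their magnitudes before inversion, so negative curvature no longer flips the step uphill. This is not the paper's closed-form formula, which is not reproduced on this page. The toy objective f(x, y) = x^2 - y^2, the helper names `eig_sym_2x2` and `adjusted_newton_direction`, and the `damping` parameter are all assumptions chosen for illustration.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]] (hypothetical helper)."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    # Angle of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return [(mean + radius, v1), (mean - radius, v2)]

def adjusted_newton_direction(grad, hess, damping=1e-3):
    """Return -(|H| + damping*I)^{-1} grad, with |H| built from |eigenvalues|."""
    (a, b), (_, d) = hess
    direction = [0.0, 0.0]
    for lam, v in eig_sym_2x2(a, b, d):
        # Project the gradient on the eigenvector, rescale by |lambda|.
        coeff = (grad[0] * v[0] + grad[1] * v[1]) / (abs(lam) + damping)
        direction[0] -= coeff * v[0]
        direction[1] -= coeff * v[1]
    return direction

# Toy saddle f(x, y) = x^2 - y^2 evaluated at the point (0.1, 1.0).
grad = (0.2, -2.0)                    # (2x, -2y)
hess = ((2.0, 0.0), (0.0, -2.0))      # constant Hessian, one negative eigenvalue
step = adjusted_newton_direction(grad, hess)
slope = grad[0] * step[0] + grad[1] * step[1]  # directional derivative along step
```

At this point the plain Newton step (-0.1, -1.0) heads toward the saddle at the origin and has a positive directional derivative, while the adjusted step has a negative one, so it is a descent direction.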

Abstract

Optimization in Deep Learning is dominated by first-order methods, which are built around the central concept of backpropagation. Second-order optimization methods, which take into account second-order derivatives, are far less used despite superior theoretical properties. This inadequacy of second-order methods stems from their exorbitant computational cost, poor performance, and the ineluctable non-convex nature of Deep Learning. Several attempts have been made to resolve the inadequacy of second-order optimization without reaching a cost-effective solution, much less an exact one. In this work, we show that this long-standing problem in Deep Learning can be solved in the stochastic case, given a suitable regularization of the neural network. Interestingly, we provide an expression of the stochastic Hessian and its exact eigenvalues. We provide a closed-form formula for the…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM