Saddle-free Hessian-free Optimization

Martin Arjovsky

arXiv:1506.00059 · cs.NA · November 8, 2016 · 1 citation

TL;DR

This paper introduces a novel optimization algorithm for navigating the nonconvex loss landscapes of deep learning. It targets two obstacles for second-order methods: saddle-point proliferation and computational cost.

Contribution

The paper presents a saddle-free Hessian-free optimization algorithm. It addresses the two issues that limit second-order methods in deep neural network training: their computational complexity and their attraction to high-error saddle points.
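Hessian-free methods avoid the cost of forming and inverting the n x n Hessian. They access it only through Hessian-vector products, each costing roughly two gradient evaluations. The sketch below illustrates that idea under stated assumptions. The toy loss f(x, y) = x^4 + x*y - y^2 and the helper names are hypothetical. The product is approximated with a central finite difference of gradients. Practical Hessian-free implementations usually compute exact products with automatic differentiation (Pearlmutter's R-operator).

```python
def grad(theta):
    """Gradient of the hypothetical toy loss f(x, y) = x**4 + x*y - y**2."""
    x, y = theta
    return [4 * x**3 + y, x - 2 * y]


def hessian_vector_product(grad_fn, theta, v, eps=1e-4):
    """Approximate H(theta) @ v by a central difference of gradients.

    Only two gradient evaluations are used; the full Hessian is never formed,
    so memory stays O(n) instead of O(n^2).
    """
    plus = [t + eps * vi for t, vi in zip(theta, v)]
    minus = [t - eps * vi for t, vi in zip(theta, v)]
    g_plus, g_minus = grad_fn(plus), grad_fn(minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


theta = [1.0, 0.5]
v = [1.0, 2.0]
hv = hessian_vector_product(grad, theta, v)
# The exact Hessian at theta is [[12, 1], [1, -2]], so the exact H @ v is [14, -3].
print(hv)
```

The finite difference carries an O(eps^2) truncation error. On this toy loss that error is about 4e-8, which is enough to show that H @ v can be obtained without ever materializing H.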

Findings

01 Effective in high-dimensional nonconvex optimization
02 Reduces saddle-point trapping during training
03 Improves convergence speed over traditional methods

Abstract

Nonconvex optimization problems, such as those that arise in training deep neural networks, suffer from a phenomenon called saddle-point proliferation: the loss function contains a vast number of high-error saddle points. Second-order methods have been tremendously successful and widely adopted in the convex optimization community, but their usefulness in deep learning remains limited. This is due to two problems: their computational complexity and their tendency to be driven towards high-error saddle points. We introduce a novel algorithm specially designed to solve these two issues. It provides a crucial first step towards bringing the widely known advantages of Newton's method to the nonconvex optimization community, especially in high-dimensional settings.
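The attraction of Newton's method to saddle points shows up on the toy loss f(x, y) = x^2 - y^2, which has a saddle at the origin. The Newton step -H^{-1} g jumps straight to the saddle. The saddle-free step -|H|^{-1} g instead keeps descending along the negative-curvature direction. Here |H| is H with each eigenvalue replaced by its absolute value, the rescaling proposed in the saddle-free Newton method of Dauphin et al., 2014. This is a minimal sketch under stated assumptions. The loss, starting point, unit step size and helper names are hypothetical. The Hessian is formed explicitly only because it is 2 x 2. The sketch is not this paper's algorithm, which aims to obtain such steps without forming H.

```python
import math


def eig_sym_2x2(h):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = h
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lams = [mean + radius, mean - radius]
    if b == 0:
        # Already diagonal: eigenvectors are the coordinate axes, ordered to match lams.
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        vecs = []
        for lam in lams:
            vx, vy = lam - c, b  # solves (H - lam*I) v = 0
            norm = math.hypot(vx, vy)
            vecs.append([vx / norm, vy / norm])
    return lams, vecs


def preconditioned_step(h, g, saddle_free):
    """Return -M^{-1} g with M = H (Newton) or M = |H| (saddle-free).

    Assumes no zero eigenvalues (no damping is applied in this sketch).
    """
    lams, vecs = eig_sym_2x2(h)
    step = [0.0, 0.0]
    for lam, v in zip(lams, vecs):
        scale = abs(lam) if saddle_free else lam
        coeff = (v[0] * g[0] + v[1] * g[1]) / scale  # component of g along v, rescaled
        step[0] -= coeff * v[0]
        step[1] -= coeff * v[1]
    return step


def loss(t):
    return t[0] ** 2 - t[1] ** 2  # saddle at the origin


def grad(t):
    return [2 * t[0], -2 * t[1]]


H = [[2.0, 0.0], [0.0, -2.0]]  # constant Hessian of the toy loss
theta = [1.0, 0.5]
for name, sf in [("Newton", False), ("saddle-free", True)]:
    step = preconditioned_step(H, grad(theta), saddle_free=sf)
    new_theta = [theta[0] + step[0], theta[1] + step[1]]
    print(name, new_theta, loss(new_theta))
# Newton [0.0, 0.0] 0.0
# saddle-free [0.0, 1.0] -1.0
```

Both steps agree along the positive-curvature direction x. They differ only along y, where the negative eigenvalue makes Newton move uphill towards the saddle, while flipping its sign sends the saddle-free step downhill to a lower loss.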

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks