Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Behnam Neyshabur; Ruslan Salakhutdinov; Nathan Srebro

arXiv:1506.02617 · cs.LG · June 9, 2015 · 164 cites

TL;DR

This paper introduces Path-SGD, a new optimization method for deep neural networks based on a rescaling-invariant geometry, demonstrating empirical improvements over traditional optimizers like SGD and AdaGrad.

Contribution

The paper proposes Path-SGD, an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. This defines a new, rescaling-invariant geometry for training neural networks.
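The path-wise regularizer can be made concrete with a small example. Below is a minimal Python sketch. It assumes a bias-free, fully connected layered network stored as a list of weight matrices, written as nested lists where `W[j][i]` is the weight from unit `i` to unit `j` of the next layer, and it takes p = 2. Under these assumptions, the squared ℓ2 path regularizer sums, over every input-to-output path, the product of the squared weights on that path. The helper names are hypothetical and are not taken from the authors' repository. The forward-pass version avoids enumerating the exponentially many paths; the brute-force version is included only as a cross-check.

```python
import itertools

def path_norm_sq(weights):
    """Squared l2 path regularizer: sum over all input-to-output paths of the
    product of squared edge weights, computed with one forward pass."""
    gamma = [1.0] * len(weights[0][0])  # each input unit starts one empty path
    for W in weights:  # W[j][i]: weight of the edge from unit i to unit j
        gamma = [sum(w * w * g for w, g in zip(row, gamma)) for row in W]
    return sum(gamma)  # the l2 path norm itself is the square root of this

def path_norm_sq_bruteforce(weights):
    """Same quantity by enumerating every path explicitly (exponential cost)."""
    sizes = [len(weights[0][0])] + [len(W) for W in weights]
    total = 0.0
    for path in itertools.product(*(range(s) for s in sizes)):
        prod = 1.0
        for k, W in enumerate(weights):
            prod *= W[path[k + 1]][path[k]] ** 2
        total += prod
    return total

w = [[[1.0, -2.0], [0.5, 3.0], [0.0, 1.0]], [[2.0, -1.0, 1.5]]]
print(path_norm_sq(w), path_norm_sq_bruteforce(w))  # 31.5 31.5
```

The forward pass reuses partial sums at each unit, so its cost is linear in the number of weights. This is the same dynamic-programming idea that keeps Path-SGD cheap.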

Findings

1. Path-SGD is easy and efficient to implement.
2. Path-SGD outperforms SGD and AdaGrad in experiments.
3. The method is based on a rescaling-invariant geometry.
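Rescaling invariance refers to the positive homogeneity of ReLU units. If every incoming weight of a hidden unit is multiplied by some c > 0 and every outgoing weight by 1/c, the network's output does not change. The sketch below checks this numerically. It uses the same bias-free, nested-list assumptions as above and hypothetical helper names, and it repeats the forward-pass path-norm helper so the block runs on its own. It shows that the plain sum of squared weights, which underlies ordinary SGD's Euclidean geometry, changes under such a rescaling, while the path norm does not.

```python
def forward(weights, x):
    """Bias-free fully connected net; ReLU on every layer except the last."""
    h = list(x)
    for k, W in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) for row in W]
        if k < len(weights) - 1:
            h = [max(0.0, z) for z in h]
    return h

def path_norm_sq(weights):
    """Squared l2 path norm via one forward pass over squared weights."""
    gamma = [1.0] * len(weights[0][0])
    for W in weights:
        gamma = [sum(w * w * g for w, g in zip(row, gamma)) for row in W]
    return sum(gamma)

def l2_norm_sq(weights):
    """Plain sum of squared weights (the Euclidean geometry of ordinary SGD)."""
    return sum(w * w for W in weights for row in W for w in row)

def rescale_hidden_unit(weights, layer, unit, c):
    """Scale the incoming weights of hidden unit `unit` (an output of matrix
    `layer`) by c > 0 and its outgoing weights (matrix `layer + 1`) by 1/c."""
    new = [[list(row) for row in W] for W in weights]
    new[layer][unit] = [c * w for w in new[layer][unit]]
    for row in new[layer + 1]:
        row[unit] /= c
    return new

w = [[[1.0, -2.0], [0.5, 3.0], [0.0, 1.0]], [[2.0, -1.0, 1.5]]]
w_scaled = rescale_hidden_unit(w, 0, 0, 10.0)
x = [0.7, -0.3]
print(forward(w, x), forward(w_scaled, x))      # equal up to rounding
print(path_norm_sq(w), path_norm_sq(w_scaled))  # equal up to rounding
print(l2_norm_sq(w), l2_norm_sq(w_scaled))      # 22.5 vs. about 513.5
```

Because the two weight vectors compute the same function yet sit far apart in Euclidean distance, a geometry based on the plain weight norm treats them very differently. This mismatch is the motivation for a rescaling-invariant alternative.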

Abstract

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
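The following is a minimal sketch of the Path-SGD update for p = 2, as we read the method. Each weight on an edge e = (u → v) takes a gradient step divided by γ(w, e). Here γ(w, e) is the sum, over all input-to-output paths through e, of the product of the other squared weights on the path. For layered networks this factorizes as γ_in(u) · γ_out(v), where γ_in comes from one forward pass and γ_out from one backward pass over the squared weights. This is why the update should cost about as much as backpropagation. Several details are assumptions chosen for illustration, not the authors' code:

- the network (bias-free, one hidden ReLU layer, squared loss, single example);
- the helper names;
- the small `eps` guard.

```python
def forward(weights, x):
    """Bias-free one-hidden-layer ReLU network: y = W2 relu(W1 x)."""
    W1, W2 = weights
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]
    return [sum(w * v for w, v in zip(row, h)) for row in W2]

def gradients(weights, x, target):
    """Backprop for the squared loss 0.5 * ||y - target||^2 on one example."""
    W1, W2 = weights
    pre = [sum(w * v for w, v in zip(row, x)) for row in W1]
    h = [max(0.0, z) for z in pre]
    y = [sum(w * v for w, v in zip(row, h)) for row in W2]
    dy = [yk - tk for yk, tk in zip(y, target)]
    g2 = [[dk * hj for hj in h] for dk in dy]
    dh = [sum(dy[k] * W2[k][j] for k in range(len(W2))) for j in range(len(h))]
    dpre = [d if z > 0 else 0.0 for d, z in zip(dh, pre)]
    g1 = [[d * xi for xi in x] for d in dpre]
    return [g1, g2]

def path_scaling(weights):
    """gamma[k][j][i] = gamma_in(i) * gamma_out(j) for the edge i -> j of
    weight matrix k (p = 2): the sum, over all input-to-output paths through
    that edge, of the product of the other squared weights on the path."""
    # Forward pass on squared weights: gamma_in of the units feeding matrix k.
    g_in = [[1.0] * len(weights[0][0])]
    for W in weights[:-1]:
        g_in.append([sum(w * w * g for w, g in zip(row, g_in[-1])) for row in W])
    # Backward pass on squared weights: gamma_out of the units matrix k feeds.
    g_out = [[1.0] * len(weights[-1])]
    for W in reversed(weights[1:]):
        nxt = g_out[0]
        g_out.insert(0, [sum(W[j][i] ** 2 * nxt[j] for j in range(len(W)))
                         for i in range(len(W[0]))])
    return [[[g_in[k][i] * g_out[k][j] for i in range(len(W[0]))]
             for j in range(len(W))]
            for k, W in enumerate(weights)]

def path_sgd_step(weights, grads, lr, eps=1e-12):
    """w_e <- w_e - lr * dL/dw_e / gamma(w, e). The `eps` term is an
    assumption (not from the paper) that only guards against division by 0."""
    gamma = path_scaling(weights)
    return [[[w - lr * g / (gm + eps) for w, g, gm in zip(wr, gr, gmr)]
             for wr, gr, gmr in zip(W, G, Gm)]
            for W, G, Gm in zip(weights, grads, gamma)]

def sgd_step(weights, grads, lr):
    """Plain SGD step, for comparison."""
    return [[[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]
            for W, G in zip(weights, grads)]

def rescale_hidden_unit(weights, unit, c):
    """Function-preserving rescaling: incoming weights * c, outgoing / c."""
    W1 = [list(r) for r in weights[0]]
    W2 = [list(r) for r in weights[1]]
    W1[unit] = [c * w for w in W1[unit]]
    for row in W2:
        row[unit] /= c
    return [W1, W2]
```

For example, take w = [[[1.0, -2.0], [0.5, 3.0], [0.0, 1.0]], [[2.0, -1.0, 1.5]]], x = [0.7, -0.3] and target [1.0]. One Path-SGD step from `w` and one from `rescale_hidden_unit(w, 0, 10.0)` yield networks with the same output on `x` (up to floating-point rounding). By contrast, `sgd_step` applied to the same two equivalent starting points produces sharply different outputs. The gradient of a scaled-up weight shrinks by 1/c while its γ shrinks by 1/c², so the Path-SGD step grows by exactly c and the rescaling is preserved.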

Code & Models

Repositories

bneyshabur/path-sgd
pytorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications

Methods: AdaGrad · Stochastic Gradient Descent