Path-SGD: Path-Normalized Optimization in Deep Neural Networks
Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro

TL;DR
This paper introduces Path-SGD, an optimization method for deep neural networks built on a geometry that is invariant to weight rescalings, and reports empirical gains over SGD and AdaGrad.
Contribution
The paper proposes Path-SGD, an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. The resulting geometry is invariant to rescalings of the weights that leave the network's output unchanged.
Findings
Path-SGD is easy and efficient to implement.
Path-SGD outperforms SGD and AdaGrad in experiments.
The method's geometry is invariant to rescalings of the weights that do not change the network's output.
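The rescaling invariance mentioned in the findings can be illustrated with a small sketch. It assumes a two-layer ReLU network without biases (an assumption for illustration, not the paper's experimental setup), and the helper names `forward`, `rescale_hidden_unit`, and `squared_l2_norm` are hypothetical. Multiplying the incoming weights of one hidden unit by c > 0 and dividing its outgoing weights by c leaves the output unchanged, while the ordinary Euclidean norm of the weights does change.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def forward(W1, W2, x):
    """Two-layer ReLU network without biases: y = W2 * relu(W1 * x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def rescale_hidden_unit(W1, W2, unit, c):
    """Multiply the incoming weights of one hidden unit by c > 0
    and divide its outgoing weights by c (hypothetical helper)."""
    new_W1 = [[w * c for w in row] if i == unit else list(row)
              for i, row in enumerate(W1)]
    new_W2 = [[w / c if j == unit else w for j, w in enumerate(row)]
              for row in W2]
    return new_W1, new_W2

def squared_l2_norm(*matrices):
    """Sum of squared entries over all weight matrices."""
    return sum(w * w for M in matrices for row in M for w in row)

# Illustrative weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
W2 = [[1.0, -0.5, 2.0]]
x = [1.0, 2.0]

V1, V2 = rescale_hidden_unit(W1, W2, unit=1, c=10.0)
y_original = forward(W1, W2, x)
y_rescaled = forward(V1, V2, x)

print("outputs:", y_original, y_rescaled)              # equal up to rounding
print("squared L2 norms:", squared_l2_norm(W1, W2),
      squared_l2_norm(V1, V2))                          # these differ
```

Because the two weight settings compute the same function but sit at very different Euclidean distances from the origin, a plain gradient step treats them differently; this is the mismatch a rescaling-invariant geometry is meant to avoid.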
Abstract
We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
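The abstract does not define the path-wise regularizer, so the following is only a sketch under a stated assumption: the regularizer is taken to be the squared ℓ2 path-norm, i.e. the sum over all input-to-output paths of the product of squared weights along the path. The function names are hypothetical, and the layered, bias-free network is chosen for illustration. The sketch compares a brute-force path enumeration with a single forward pass over squared weights, and checks that the quantity is unchanged by a rescaling of one hidden unit.

```python
from itertools import product

def path_norm_squared_bruteforce(layers):
    """Sum over all input-to-output paths of the product of squared weights.
    layers[k][j][i] is the weight from unit i in layer k to unit j in layer k+1."""
    sizes = [len(layers[0][0])] + [len(W) for W in layers]
    total = 0.0
    for path in product(*(range(n) for n in sizes)):
        term = 1.0
        for k, W in enumerate(layers):
            w = W[path[k + 1]][path[k]]
            term *= w * w
        total += term
    return total

def path_norm_squared_fast(layers):
    """Same quantity in one pass: propagate an all-ones input through the
    squared weights with no nonlinearity, then sum the outputs."""
    acc = [1.0] * len(layers[0][0])
    for W in layers:
        acc = [sum(w * w * a for w, a in zip(row, acc)) for row in W]
    return sum(acc)

# Illustrative weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]],
    [[1.0, -0.5, 2.0], [0.3, 0.7, -1.2]],
    [[1.5, -2.0]],
]

# Rescale hidden unit 0 of the first hidden layer: incoming * c, outgoing / c.
c = 4.0
rescaled = [
    [[w * c for w in row] if i == 0 else list(row)
     for i, row in enumerate(layers[0])],
    [[w / c if j == 0 else w for j, w in enumerate(row)]
     for row in layers[1]],
    layers[2],
]

brute = path_norm_squared_bruteforce(layers)
fast = path_norm_squared_fast(layers)
fast_rescaled = path_norm_squared_fast(rescaled)
print(brute, fast, fast_rescaled)  # all three agree up to rounding
```

The single-pass version costs about as much as one forward pass through the network, which is consistent with the abstract's claim that the method is efficient to implement; how Path-SGD turns this regularizer into an update rule is not covered by this sketch.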
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications
Methods: AdaGrad · Stochastic Gradient Descent
