Online Second Order Methods for Non-Convex Stochastic Optimizations
Xi-Lin Li

TL;DR
This paper introduces online second order methods based on PSGD for non-convex stochastic optimization. It improves numerical stability, adds new forms of preconditioners, and explains why Kronecker product preconditioners perform well in neural network training. A TensorFlow implementation is provided.
Contribution
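It improves the original PSGD implementations with new forms of preconditioners, more accurate Hessian vector product calculations, and better numerical stability when the Hessian is vanishing or ill-conditioned. It also relates feature normalization to PSGD with Kronecker product preconditioners, which bears on deep neural network training.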
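As a rough illustration of what a Hessian vector product is, the sketch below compares a finite-difference approximation with the analytic product on a small non-convex function. The toy function, the step size `eps`, and all function names are assumptions made for this example; they are not taken from the paper or from the psgd_tf package.

```python
import math

# Toy non-convex objective (an assumption for illustration only):
#   f(x, y) = x^2 * y + sin(y)

def grad(theta):
    """Analytic gradient of f at theta = [x, y]."""
    x, y = theta
    return [2.0 * x * y, x * x + math.cos(y)]

def hvp_exact(theta, v):
    """Analytic Hessian-vector product H(theta) @ v.

    The Hessian of f is [[2y, 2x], [2x, -sin(y)]].
    """
    x, y = theta
    return [2.0 * y * v[0] + 2.0 * x * v[1],
            2.0 * x * v[0] - math.sin(y) * v[1]]

def hvp_finite_difference(theta, v, eps=1e-5):
    """Central-difference approximation: (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)."""
    plus = grad([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

theta = [0.5, -1.2]
v = [1.0, 2.0]
exact = hvp_exact(theta, v)
approx = hvp_finite_difference(theta, v)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print("exact :", exact)
print("approx:", approx)
print("max abs error:", max_error)
```

Neither routine ever forms the full Hessian, which is what makes Hessian vector products practical for large models. In a framework with automatic differentiation, the exact product would typically be obtained by differentiating the inner product of the gradient with `v` rather than by hand.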
Findings
PSGD outperforms traditional SGD in convergence speed.
Preconditioners significantly improve generalization.
The paper establishes a theoretical link between feature normalization and PSGD with Kronecker product preconditioners.
Abstract
This paper proposes a family of online second order methods for possibly non-convex stochastic optimizations based on the theory of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity simultaneously. We have improved the implementations of the original PSGD in several ways, e.g., new forms of preconditioners, more accurate Hessian vector product calculations, and better numerical stability with vanishing or ill-conditioned Hessian. We have also revealed the relationship between feature normalization and PSGD with Kronecker product preconditioners, which explains the excellent performance of Kronecker product preconditioners in deep neural network learning. A software package (https://github.com/lixilinx/psgd_tf) implemented in TensorFlow is provided to compare…
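To make the Kronecker product preconditioner idea concrete, the sketch below checks the standard identity (P2 ⊗ P1) vec(G) = vec(P1 G P2^T) in plain Python. The factor names `P1` and `P2`, the gradient `G`, and all numeric values are hypothetical and chosen only for illustration; how the paper actually parameterizes and updates the factors is not shown here.

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Kronecker product a ⊗ b as a dense matrix."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def vec(a):
    """Stack the columns of a into one long vector (column-major order)."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

def matvec(m, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in m]

# Hypothetical gradient of a 2 x 3 weight matrix.
G = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
# Hypothetical symmetric positive definite factors (left: 2 x 2, right: 3 x 3).
P1 = [[2.0, 0.5],
      [0.5, 1.0]]
P2 = [[1.0, 0.2, 0.0],
      [0.2, 3.0, 0.1],
      [0.0, 0.1, 0.5]]

# Dense route: build the 6 x 6 preconditioner and apply it to vec(G).
dense = matvec(kron(P2, P1), vec(G))
# Structured route: two small matrix products, no 6 x 6 matrix needed.
structured = vec(matmul(matmul(P1, G), transpose(P2)))

max_diff = max(abs(d - s) for d, s in zip(dense, structured))
print("max abs difference:", max_diff)
```

For an m x n weight matrix, the dense preconditioner has (mn)^2 entries while the two factors together have only m^2 + n^2, which is why this structure scales to neural network layers.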
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
