Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization
Filip de Roos, Philipp Hennig

TL;DR
This paper introduces an active probabilistic inference method for constructing pre-conditioners in stochastic optimization. It improves convergence on comparatively low-dimensional problems and acts as a form of automatic learning-rate adaptation in high-dimensional deep learning.
Contribution
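The paper presents an iterative algorithm, inspired by classic iterative linear solvers, that uses a probabilistic model to actively infer a pre-conditioner when Hessian projections can only be computed under strong Gaussian noise. The resulting pre-conditioner applies to stochastic gradient descent and its variants.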
Findings
Efficiently constructs pre-conditioners for stochastic gradient descent.
Improves convergence in low-dimensional problems.
Acts as a form of automatic learning-rate adaptation in high-dimensional deep learning.
Abstract
Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional…
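To make the idea of pre-conditioning from noisy Hessian projections concrete, here is a minimal toy sketch. It is not the paper's algorithm: it replaces the probabilistic, actively chosen inference with naive averaging of noisy Hessian-vector products along fixed coordinate axes, it assumes a diagonal Hessian, and it uses noise-free gradients. All function names (`noisy_hvp`, `estimate_diag_preconditioner`, `gradient_descent`, `loss`) and all parameter values are hypothetical choices made for illustration.

```python
import random


def noisy_hvp(diag_h, v, noise_std, rng):
    """Return H v plus Gaussian noise for a diagonal Hessian H (toy assumption)."""
    return [h * vi + rng.gauss(0.0, noise_std) for h, vi in zip(diag_h, v)]


def estimate_diag_preconditioner(diag_h, noise_std, n_samples, rng):
    """Estimate diag(H)^-1 by averaging noisy projections along each axis.

    Naive averaging stands in for the paper's probabilistic inference here.
    """
    dim = len(diag_h)
    estimates = []
    for i in range(dim):
        e_i = [1.0 if j == i else 0.0 for j in range(dim)]
        total = 0.0
        for _ in range(n_samples):
            total += noisy_hvp(diag_h, e_i, noise_std, rng)[i]
        estimates.append(total / n_samples)
    # Clamp from below so the inverse stays positive and bounded.
    return [1.0 / max(h, 1e-3) for h in estimates]


def gradient_descent(diag_h, x0, step, precond, n_steps):
    """Minimise 0.5 * x^T H x using the update x <- x - step * P * grad."""
    x = list(x0)
    for _ in range(n_steps):
        grad = [h * xi for h, xi in zip(diag_h, x)]
        x = [xi - step * p * g for xi, p, g in zip(x, precond, grad)]
    return x


def loss(diag_h, x):
    """Quadratic objective 0.5 * x^T H x for diagonal H."""
    return 0.5 * sum(h * xi * xi for h, xi in zip(diag_h, x))


rng = random.Random(0)
diag_h = [1.0, 100.0]  # ill-conditioned toy problem (condition number 100)
x0 = [1.0, 1.0]

# Plain gradient descent: the step size is limited by the largest curvature.
plain = gradient_descent(diag_h, x0, step=0.01, precond=[1.0, 1.0], n_steps=50)

# Pre-conditioned descent: curvature is rescaled, so a larger step is stable.
precond = estimate_diag_preconditioner(diag_h, noise_std=0.5, n_samples=200, rng=rng)
pre = gradient_descent(diag_h, x0, step=0.5, precond=precond, n_steps=50)

print("loss without pre-conditioner:", loss(diag_h, plain))
print("loss with pre-conditioner:   ", loss(diag_h, pre))
```

In this toy setting the pre-conditioned run reaches a much lower loss in the same number of steps, because rescaling by the estimated inverse curvature lets both coordinates contract at a similar rate. The paper's setting is harder: the Hessian is not diagonal, and the projection directions are chosen actively rather than fixed in advance.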
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
