Black Box Lie Group Preconditioners for SGD

Xilin Li

arXiv:2211.04422 · stat.ML · November 9, 2022 · 1 citation


TL;DR

This paper introduces Lie group-based preconditioners for stochastic gradient descent. They exploit curvature information, come in matrix-free and low-rank forms, and preserve invariance properties, which makes optimization more robust and efficient.

Contribution

It proposes preconditioners constrained to connected Lie groups. These exploit curvature information efficiently without line search or damping and thereby speed up SGD convergence.
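A short derivation shows why staying on a connected Lie group needs no projection step. It assumes, for illustration only, a multiplicative update of the preconditioner factor Q on GL+(n) (the connected general linear group with positive determinant) of the form Q_{t+1} = (I - μG)Q_t, where G is any real n×n matrix (e.g., a relative gradient) and the step μ is normalized so that ‖μG‖_2 < 1. This update rule is our assumption, not a rule quoted from the paper.

```latex
\[
Q_{t+1} = (I - \mu G)\,Q_t, \qquad \|\mu G\|_2 < 1 .
\]
Every eigenvalue $\lambda_i$ of $\mu G$ satisfies $|\lambda_i| \le \|\mu G\|_2 < 1$,
so every eigenvalue $1-\lambda_i$ of $I-\mu G$ has positive real part. Real eigenvalues
are therefore positive, and complex ones occur in conjugate pairs with product
$|1-\lambda_i|^2 > 0$. Hence
\[
\det Q_{t+1} = \det(I-\mu G)\,\det Q_t
             = \Big(\prod_i (1-\lambda_i)\Big)\det Q_t > 0 .
\]
```

Because GL+(n) is closed under multiplication, each such step keeps Q on the group, so orientation is preserved without any re-projection.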

Findings

01. Preconditioners accelerate SGD convergence.
02. Lie group constraints preserve symmetry and invariance.
03. Default hyperparameters perform well across tasks.
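As a toy illustration of the first finding (not an experiment from the paper), this sketch runs full-batch gradient descent on a two-dimensional quadratic with condition number 100. It compares no preconditioning (P = I) against the ideal preconditioner P = H^{-1}. The helper `iterations_to_converge`, the starting point and the tolerances are hypothetical choices. Because there is no gradient noise, the sketch isolates the curvature effect only.

```python
import numpy as np


def iterations_to_converge(H, P, lr, tol=1e-6, max_iter=100_000):
    """Count preconditioned gradient-descent steps on 0.5 * x^T H x until ||grad|| < tol."""
    x = np.ones(H.shape[0])
    for k in range(max_iter):
        g = H @ x                      # gradient of the quadratic loss
        if np.linalg.norm(g) < tol:
            return k
        x = x - lr * (P @ g)           # preconditioned step: x <- x - lr * P g
    return max_iter


H = np.diag([1.0, 100.0])              # condition number 100
plain_iters = iterations_to_converge(H, np.eye(2), lr=1.0 / 100.0)   # lr limited by the largest curvature
precond_iters = iterations_to_converge(H, np.linalg.inv(H), lr=1.0)  # ideal preconditioner P = H^{-1}
print(plain_iters, precond_iters)
```

With P = I the step size is capped by the stiffest direction, so the flat direction converges slowly. P = H^{-1} rescales every direction and reaches the minimum almost immediately. Practical preconditioners can only approximate this ideal from noisy curvature samples.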

Abstract

A matrix-free preconditioner and a low-rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD). They exploit curvature information sampled from Hessian-vector products, or from finite differences of parameters and gradients as in the BFGS algorithm. Both preconditioners are fitted online by minimizing a criterion that is free of line search and robust to stochastic gradient noise. They are further constrained to certain connected Lie groups to preserve the corresponding symmetry or invariance; for example, the connected general linear group with positive determinants preserves the orientation of coordinates. The Lie group's equivariance property facilitates preconditioner fitting, and its invariance property removes the need for damping, which is common in second-order optimizers but difficult to tune. The learning rate for parameter…
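The abstract does not spell out the fitting criterion, so the following is a minimal sketch under a stated assumption. It uses a PSGD-style criterion E[h^T P h + v^T P^{-1} v] with P = Q^T Q and Q kept in GL+(n). Here v is a random parameter perturbation and h = grad(θ + v) - grad(θ) is the matching gradient difference, the BFGS-like finite-difference pair the abstract mentions. The function name `fit_gl_preconditioner`, the multiplicative update, the step normalization and the toy quadratic are all our assumptions, not the paper's code.

```python
import numpy as np


def fit_gl_preconditioner(grad_fn, theta, n_steps=3000, lr=0.1, seed=0):
    """Hypothetical sketch: fit Q in GL+(n) so that P = Q^T Q minimizes the assumed
    criterion E[h^T P h + v^T P^{-1} v], where h is a gradient difference."""
    rng = np.random.default_rng(seed)
    n = theta.size
    Q = np.eye(n)                              # start at the identity element of GL+(n)
    g0 = grad_fn(theta)
    for _ in range(n_steps):
        v = rng.standard_normal(n)             # parameter perturbation, E[v v^T] = I
        h = grad_fn(theta + v) - g0            # finite-difference gradient change (exact H v for a quadratic)
        a = Q @ h                              # Q h
        b = np.linalg.solve(Q.T, v)            # Q^{-T} v
        G = np.outer(a, a) - np.outer(b, b)    # relative gradient of the criterion for Q -> (I + E) Q
        step = lr / max(a @ a, b @ b)          # bounds ||step * G||_2 <= lr < 1
        Q = Q - step * (G @ Q)                 # Q <- (I - step*G) Q keeps det(Q) > 0
    return Q


# Toy check on a quadratic loss 0.5 * x^T H x with a known SPD Hessian.
rng = np.random.default_rng(1)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthogonal basis
H = U @ np.diag(np.linspace(0.5, 2.0, n)) @ U.T     # eigenvalues from 0.5 to 2.0
Q = fit_gl_preconditioner(lambda x: H @ x, np.zeros(n))
P = Q.T @ Q
print(np.round(P @ H, 3))                           # should be close to the identity
```

Under this assumed criterion, with E[vv^T] = I, the minimizer is P = |H|^{-1}. This equals H^{-1} when H is symmetric positive definite, as in the toy check. The step is scale-invariant, because rescaling v rescales a, b and the normalizer together, so the perturbation size does not change the update for a quadratic loss.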


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications