Matrix-Free Preconditioning in Online Learning

Ashok Cutkosky; Tamas Sarlos

arXiv:1905.12721 · cs.LG · May 31, 2019 · 1 citation

TL;DR

This paper introduces an online convex optimization algorithm whose regret interpolates between that of diagonal preconditioning and that of an optimal full-matrix preconditioner, while running in the same time and space complexity as online gradient descent. It is benchmarked on synthetic data and deep learning tasks.

Contribution

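The paper presents a matrix-free preconditioning algorithm for online learning whose regret bound is never worse than that of diagonal preconditioning and, in certain settings, surpasses that of full-matrix methods, at the time and space cost of online gradient descent. The analysis also mildly streamlines and improves logarithmic factors in prior regret analyses.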

Findings

01. The regret bound is never worse than that obtained by diagonal preconditioning.

02. In certain settings, the regret bound surpasses that of algorithms with full-matrix preconditioning.

03. Benchmarks cover synthetic data and deep learning tasks.

Abstract

We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. Our regret bound is never worse than that obtained by diagonal preconditioning, and in certain settings even surpasses that of algorithms with full-matrix preconditioning. Importantly, our algorithm runs in the same time and space complexity as online gradient descent. Along the way we incorporate new techniques that mildly streamline and improve logarithmic factors in prior regret analyses. We conclude by benchmarking our algorithm on synthetic data and deep learning tasks.
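
For readers unfamiliar with the two baselines the abstract compares against, the following is a minimal sketch of plain online gradient descent and an AdaGrad-style diagonally preconditioned variant on linear losses. It is not the paper's algorithm; the function names, the fixed learning rate `lr`, the `eps` stabilizer, and the linearized regret helper are all assumptions made for illustration. Both updates keep only O(d) state per step, which is the cost the abstract refers to.

```python
import numpy as np

def online_gradient_descent(grads, lr=0.1):
    """Plain OGD: w_{t+1} = w_t - lr * g_t, with O(d) time and memory per step.
    `grads` is a (T, d) array holding the gradient revealed in each round."""
    w = np.zeros(grads.shape[1])
    iterates = []
    for g in grads:
        iterates.append(w.copy())   # play w_t before seeing g_t
        w = w - lr * g
    return np.array(iterates)

def diagonal_preconditioned_ogd(grads, lr=0.1, eps=1e-8):
    """AdaGrad-style diagonal preconditioning (illustrative baseline only):
    each coordinate is scaled by the root of its accumulated squared gradients."""
    d = grads.shape[1]
    w = np.zeros(d)
    sq_sum = np.zeros(d)            # O(d) state; no d x d matrix is stored
    iterates = []
    for g in grads:
        iterates.append(w.copy())
        sq_sum += g * g
        w = w - lr * g / (np.sqrt(sq_sum) + eps)
    return np.array(iterates)

def linearized_regret(iterates, grads, u):
    """Regret on linear losses against a fixed comparator u: sum_t <g_t, w_t - u>."""
    return float(np.sum(grads * (iterates - u)))
```

A typical use would be to generate a `(T, d)` array of gradients, run both functions on it, and compare `linearized_regret` against the same comparator `u`; a full-matrix method would instead maintain a d x d matrix, which is the cost a matrix-free approach avoids.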

Taxonomy

Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques