Implicit Interpretation of Importance Weight Aware Updates

Keyi Chen; Francesco Orabona

arXiv:2307.11955 · cs.LG · July 25, 2023

TL;DR

This paper shows that Importance Weight Aware (IWA) updates have a strictly better regret upper bound than plain gradient updates in the online learning setting, which helps explain their empirical success.

Contribution

The paper analyzes IWA updates as approximate implicit/proximal updates within the generalized implicit Follow-the-Regularized-Leader (FTRL) framework, and shows that their regret upper bound is strictly better than that of plain gradient updates.

Findings

01. IWA updates have a strictly better regret upper bound than plain gradient updates.

02. IWA updates can be viewed as approximate implicit/proximal updates.

03. The analysis is based on a new generalized implicit FTRL framework.
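
For orientation, the three update rules compared in the findings above can be written side by side. This is a sketch in assumed notation, not taken from the paper: w_t is the current iterate, \ell_t the convex loss at round t, \eta the learning rate, and h_t the importance weight. The IWA line assumes a differentiable loss and describes the update as a gradient flow run for "time" h_t.

```latex
% Notation is an assumption for illustration (requires amsmath).
\begin{align*}
\text{gradient:} \quad
  & w_{t+1} = w_t - \eta h_t g_t, \qquad g_t \in \partial \ell_t(w_t) \\
\text{implicit/proximal:} \quad
  & w_{t+1} = \operatorname*{argmin}_{w} \; \eta h_t \ell_t(w) + \tfrac{1}{2}\|w - w_t\|_2^2 \\
\text{IWA (differentiable } \ell_t\text{):} \quad
  & w_{t+1} = w(h_t), \quad w(0) = w_t, \quad
    \frac{\mathrm{d} w(s)}{\mathrm{d} s} = -\eta \nabla \ell_t(w(s))
\end{align*}
```

The gradient step linearizes the loss once at w_t, the implicit step evaluates the loss at the new point, and the IWA step follows the gradient continuously between the two.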

Abstract

Due to its speed and simplicity, subgradient descent is one of the most widely used optimization algorithms in convex machine learning. However, tuning its learning rate is probably the most severe bottleneck to achieving consistently good performance. A common way to reduce the dependency on the learning rate is to use implicit/proximal updates. One such variant is the Importance Weight Aware (IWA) update, which consists of infinitely many infinitesimal updates on each loss function. However, the empirical success of IWA updates is not completely explained by their theory. In this paper, we show for the first time that IWA updates have a strictly better regret upper bound than plain gradient updates in the online learning setting. Our analysis is based on the new framework, generalized implicit Follow-the-Regularized-Leader (FTRL) (Chen and Orabona, 2023), to analyze generalized implicit…
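
To make "infinitely many infinitesimal updates" concrete, here is a minimal sketch for one special case: a linear predictor with the squared loss 0.5 * (w·x - y)^2. The choice of loss, the function names (`gradient_step`, `implicit_step`, `iwa_step`, `many_small_steps`), and the example numbers are all assumptions made for illustration and are not code from the paper. For this loss the limit of infinitesimal steps has a closed form, obtained by solving the gradient-flow ODE for the prediction w·x.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gradient_step(w, x, y, eta, h):
    """Plain gradient step on h * 0.5 * (w.x - y)^2."""
    scale = eta * h * (dot(w, x) - y)
    return [wi - scale * xi for wi, xi in zip(w, x)]


def implicit_step(w, x, y, eta, h):
    """Implicit/proximal step: argmin_u eta*h*0.5*(u.x - y)^2 + 0.5*||u - w||^2."""
    scale = eta * h * (dot(w, x) - y) / (1.0 + eta * h * dot(x, x))
    return [wi - scale * xi for wi, xi in zip(w, x)]


def iwa_step(w, x, y, eta, h):
    """Closed-form limit of infinitely many infinitesimal gradient steps.

    Solving d(w.x)/ds = -eta * (w.x - y) * ||x||^2 over s in [0, h] shows the
    residual (w.x - y) shrinks by exp(-eta * h * ||x||^2) and never changes sign.
    """
    sq = dot(x, x)
    if sq == 0.0:
        return list(w)  # zero features: the gradient is zero, nothing to update
    scale = (dot(w, x) - y) * (1.0 - math.exp(-eta * h * sq)) / sq
    return [wi - scale * xi for wi, xi in zip(w, x)]


def many_small_steps(w, x, y, eta, h, n):
    """Approximate the IWA update by n gradient steps of importance weight h / n."""
    for _ in range(n):
        w = gradient_step(w, x, y, eta, h / n)
    return w


# Hypothetical example: a deliberately large eta * h so the plain step misbehaves.
w0, x, y, eta, h = [0.0, 0.0], [1.0, 2.0], 1.0, 0.5, 4.0
for name, step in [("gradient", gradient_step),
                   ("implicit", implicit_step),
                   ("iwa", iwa_step)]:
    w1 = step(w0, x, y, eta, h)
    print(f"{name:9s} prediction after update: {dot(w1, x):.6f} (target {y})")
```

With these numbers the plain gradient step overshoots the target badly, while the implicit and IWA steps move the prediction toward the target without crossing it. Calling `many_small_steps` with a large `n` gives a prediction close to that of `iwa_step`, which is the sense in which the IWA update is a limit of small gradient updates.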

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings