Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning

Jiatai Huang; Yan Dai; Longbo Huang

arXiv:2301.10500 · cs.LG · May 30, 2023

TL;DR

The paper introduces Banker-OMD, a flexible framework that generalizes online mirror descent (OMD) to handle delayed feedback efficiently in bandit learning, achieving near-optimal regret bounds for delayed scale-free adversarial multi-armed bandits and delayed adversarial linear bandits.

Contribution

It presents Banker-OMD, a universal method that decouples delay handling from task-specific OMD algorithm design, which enables new delayed bandit algorithms with improved regret guarantees.

Findings

1. First delayed scale-free adversarial MAB algorithm with $\tilde{O}(\sqrt{K}\,L(\sqrt{T} + \sqrt{D}))$ regret.

2. First delayed adversarial linear bandit algorithm with $\tilde{O}(\mathrm{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

3. Near-optimal regret: with no delay ($D = 0$), the MAB bound reduces to $\tilde{O}(\sqrt{KT}\,L)$, matching the lower bound for non-delayed scale-free adversarial MAB up to logarithmic factors.
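The two regret bounds above are stated in terms of the number of rounds $T$ and the total delay $D$. As a rough illustration of how they scale, the Python sketch below evaluates the MAB bound's expression $\sqrt{K}\,L(\sqrt{T} + \sqrt{D})$ with constants and logarithmic factors dropped, so it shows growth rates only, not actual regret values. It assumes $D = \sum_t d_t$, the sum of per-round delays $d_t$; the function names and example numbers are hypothetical.

```python
import math


def total_delay(delays):
    # Assumed definition: D is the sum of the per-round delays d_t (in rounds).
    return sum(delays)


def mab_bound_shape(K, L, T, D):
    # sqrt(K) * L * (sqrt(T) + sqrt(D)); constants and log factors are dropped.
    return math.sqrt(K) * L * (math.sqrt(T) + math.sqrt(D))


# Hypothetical instance: 10 arms, loss scale L = 1, T = 10,000 rounds,
# and every round's feedback arrives 5 rounds late.
T = 10_000
D = total_delay([5] * T)                      # 50,000
delayed = mab_bound_shape(10, 1.0, T, D)
undelayed = mab_bound_shape(10, 1.0, T, 0)    # reduces to sqrt(K * T) * L
print(f"D = {D}, delayed: {delayed:.1f}, no delay: {undelayed:.1f}")
```

Because the delay enters additively as $\sqrt{D}$ rather than multiplying $\sqrt{T}$, a constant per-round delay $d$ (so $D = dT$) inflates this expression by exactly a factor of $1 + \sqrt{d}$ relative to the undelayed case.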

Abstract

We propose Banker Online Mirror Descent (Banker-OMD), a novel framework generalizing the classical Online Mirror Descent (OMD) technique in the online learning literature. The Banker-OMD framework almost completely decouples feedback delay handling and the task-specific OMD algorithm design, thus facilitating the design of new algorithms capable of efficiently and robustly handling feedback delays. Specifically, it offers a general methodology for achieving $\tilde{O}(\sqrt{T} + \sqrt{D})$-style regret bounds in online bandit learning tasks with delayed feedback, where $T$ is the number of rounds and $D$ is the total feedback delay. We demonstrate the power of Banker-OMD by applications to two important bandit learning scenarios with delayed feedback, including delayed scale-free adversarial Multi-Armed Bandits (MAB) and delayed adversarial linear bandits.…
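To make the delayed-feedback setting concrete, the sketch below shows a plain delayed variant of classical OMD, the technique Banker-OMD generalizes: OMD with a negative-entropy regularizer on the probability simplex (exponential weights with EXP3-style importance-weighted loss estimates), where the feedback from round $t$ arrives only after a delay $d_t$ and the mirror-descent step is applied on arrival. It is a minimal sketch under stated assumptions: losses lie in $[0, 1]$ (so it is not scale-free), the learning rate `eta` is fixed by hand, and the Bernoulli environment is hypothetical; the names `DelayedExp3` and `simulate` are illustrative. It does not reproduce Banker-OMD's delay-handling mechanism.

```python
import math
import random
from collections import defaultdict


class DelayedExp3:
    """Exponential weights (OMD with a negative-entropy regularizer on the
    simplex) driven by delayed bandit feedback. A baseline sketch, NOT Banker-OMD."""

    def __init__(self, num_arms, eta, seed=0):
        self.K = num_arms
        self.eta = eta                    # fixed learning rate (assumed tuned in advance)
        self.log_w = [0.0] * num_arms     # log-weights, uniform start
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # arrival round -> [(arm, loss, prob)]

    def probs(self):
        m = max(self.log_w)               # shift by the max for numerical stability
        w = [math.exp(x - m) for x in self.log_w]
        s = sum(w)
        return [x / s for x in w]

    def act(self):
        p = self.probs()
        arm = self.rng.choices(range(self.K), weights=p)[0]
        return arm, p[arm]                # keep the play probability for the estimate

    def schedule(self, t, delay, arm, loss, prob):
        # Feedback from round t becomes usable at the end of round t + delay.
        self.pending[t + delay].append((arm, loss, prob))

    def receive(self, t):
        # Apply one mirror-descent step per observation arriving in round t.
        for arm, loss, prob in self.pending.pop(t, []):
            loss_hat = loss / prob        # importance-weighted loss estimate
            self.log_w[arm] -= self.eta * loss_hat


def simulate(T=2000, K=5, max_delay=20, eta=0.01, seed=1):
    env = random.Random(seed)
    means = [0.3] + [0.5] * (K - 1)       # hypothetical Bernoulli losses; arm 0 is best
    learner = DelayedExp3(K, eta, seed=seed)
    total_delay = 0
    for t in range(T):
        arm, prob = learner.act()
        loss = 1.0 if env.random() < means[arm] else 0.0
        d = env.randint(0, max_delay)     # delay d_t for this round's feedback
        total_delay += d
        learner.schedule(t, d, arm, loss, prob)
        learner.receive(t)                # process everything that has arrived by now
    return learner, total_delay


learner, D = simulate()
print("D =", D, "final probabilities:", [round(p, 3) for p in learner.probs()])
```

Storing `prob` at play time matters: the importance weight must use the probability with which the arm was actually drawn, not the (possibly already updated) probability at the moment the delayed feedback arrives.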

Videos

Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms