Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

Zongqi Wan; Xiaoming Sun; Jialin Zhang

arXiv:2204.12764 · cs.LG · April 29, 2022

TL;DR

This paper studies adversarial bandit problems with composite anonymous delayed feedback in the non-oblivious setting. It shows that pseudo-regret is $\Omega(T)$ even for bounded memory loss sequences, and proposes a wrapper algorithm with $o(T)$ policy regret under the bounded memory assumption.

Contribution

It introduces a wrapper algorithm that achieves sublinear policy regret in the non-oblivious setting with composite anonymous delayed feedback, and it establishes a matching lower bound for the $K$-armed bandit.

Findings

01

The non-oblivious setting incurs $\Omega(T)$ pseudo-regret even when the loss sequence has bounded memory.

02

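The proposed wrapper algorithm achieves $o(T)$ policy regret under the bounded memory assumption, with an $O(T^{2/3})$ bound for the $K$-armed bandit and bandit convex optimization.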

03

A lower bound for the $K$-armed bandit matches the $O(T^{2/3})$ policy regret upper bound.

Abstract

We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the loss of an action is split into $d$ components that are spread over consecutive rounds after the action is chosen. In each round, the algorithm observes the aggregate of the losses arriving from the latest $d$ rounds. Previous work focuses on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs $\Omega(T)$ pseudo-regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm that enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for the $K$-armed bandit and bandit convex optimization, we obtain an $O(T^{2/3})$ policy regret bound. We also prove a matching lower bound for the $K$-armed bandit. Our lower…
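The feedback model described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the names `split_loss` and `simulate_feedback` are hypothetical, the split into $d$ components is drawn at random purely for illustration, and the loss table is fixed in advance (an oblivious adversary) to keep the example short, whereas the paper studies the non-oblivious case.

```python
import random


def split_loss(total, d, rng):
    """Split `total` into d non-negative components that sum to `total`.

    The random split is an assumption for illustration; in the model the
    adversary chooses how the loss is spread over the next d rounds.
    """
    # 1 - rng.random() lies in (0, 1], so the weights never sum to zero.
    weights = [1.0 - rng.random() for _ in range(d)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def simulate_feedback(actions, losses, d, seed=0):
    """Simulate composite anonymous delayed feedback.

    actions[t]   : arm played in round t
    losses[t][a] : total loss of arm a in round t (fixed table, oblivious)
    Returns (observed, arrivals): observed[t] is the aggregate the learner
    sees in round t; arrivals[r] is the loss mass arriving in round r,
    including mass that arrives after the last round.
    """
    rng = random.Random(seed)
    horizon = len(actions)
    arrivals = [0.0] * (horizon + d)
    observed = []
    for t, arm in enumerate(actions):
        # Component s of round t's loss arrives in round t + s.
        for s, part in enumerate(split_loss(losses[t][arm], d, rng)):
            arrivals[t + s] += part
        # The learner sees only the sum, not which round or arm caused it.
        observed.append(arrivals[t])
    return observed, arrivals


if __name__ == "__main__":
    actions = [0, 1, 0, 1]
    losses = [[0.5, 1.0]] * 4
    observed, _ = simulate_feedback(actions, losses, d=2)
    print(observed)
```

With `d=1` the observation in each round equals the loss of the arm just played, which recovers ordinary bandit feedback. For larger `d`, each observation mixes contributions from up to `d` earlier rounds, so the learner cannot attribute it to a single action.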

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques