Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback
Zongqi Wan, Xiaoming Sun, Jialin Zhang

TL;DR
This paper investigates adversarial bandit problems with composite anonymous delayed feedback, revealing the challenges of non-oblivious settings and proposing algorithms with sublinear regret bounds for bounded memory loss sequences.
Contribution
It introduces a wrapper algorithm achieving sublinear policy regret in non-oblivious delay settings and establishes a matching lower bound, advancing understanding of delayed feedback in adversarial bandits.
Findings
The non-oblivious setting incurs linear pseudo regret, even when the loss sequence has bounded memory.
The proposed wrapper algorithm achieves $o(T)$ policy regret under the bounded-memory assumption.
A lower bound of order $T^{2/3}$ matches the upper bound for the $K$-armed bandit, confirming the difficulty of non-oblivious delays.
Abstract
We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the loss of an action is split into components that are spread over the consecutive rounds after the action is chosen, and in each round the algorithm observes only the aggregate of the loss components arriving from the most recent rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for the $K$-armed bandit and bandit convex optimization, we obtain an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for the $K$-armed bandit. Our lower…
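The feedback model described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' construction: the names `split_loss` and `simulate_feedback`, the delay window `d`, and the random way each loss is split are all hypothetical choices made for illustration. The split here is drawn independently of the learner's actions, so it corresponds to an oblivious delay; in the non-oblivious setting studied in the paper, the adversary could choose the split based on past actions.

```python
import random


def split_loss(total, d, rng):
    """Split a scalar loss into d nonnegative components that sum to `total`.

    Assumption for illustration: the split is random and independent of
    the learner's actions (an oblivious delay).
    """
    weights = [rng.random() for _ in range(d)]
    s = sum(weights)
    if s == 0.0:
        # Degenerate draw: deliver the whole loss immediately.
        return [total] + [0.0] * (d - 1)
    return [total * w / s for w in weights]


def simulate_feedback(actions, losses, d, seed=0):
    """Return the aggregated observation the learner sees in each round.

    actions[t]   : arm chosen in round t
    losses[t][a] : total loss of arm a in round t
    d            : number of consecutive rounds each loss is spread over
    """
    rng = random.Random(seed)
    T = len(actions)
    observed = [0.0] * T
    for t, a in enumerate(actions):
        parts = split_loss(losses[t][a], d, rng)
        for k, part in enumerate(parts):
            # Component k of the round-t loss arrives in round t + k;
            # components that fall past the horizon are never observed.
            if t + k < T:
                observed[t + k] += part
    return observed


if __name__ == "__main__":
    actions = [0, 1, 0, 0]
    losses = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3], [0.0, 0.0]]
    print(simulate_feedback(actions, losses, d=2))
```

Each entry of the returned list mixes components from up to `d` earlier actions, and the learner is not told which action contributed which part. That anonymity is what makes per-action loss estimation hard in this setting.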
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
