Adversarial Bandits against Arbitrary Strategies

Jung-hun Kim; Se-Young Yun

arXiv:2205.14839 · cs.LG · November 26, 2025

TL;DR

This paper studies adversarial bandits against arbitrary strategies, where difficulty is measured by the unknown number of switches S in the best arm in hindsight. It develops master-base algorithms built on online mirror descent and uses adaptive learning rates to improve the regret bound.

Contribution

The paper introduces a master-base framework with adaptive learning rates for online mirror descent, leading to tighter regret bounds in adversarial bandit settings with switches.

Findings

01

Achieved regret of \tilde{O}(S^{1/2}K^{1/3}T^{2/3}) with simple OMD.

02

Improved regret to \tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}], S\sqrt{KT}\}) using adaptive learning rates.

03

Demonstrated effectiveness of adaptive methods in handling variance in loss estimators.
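
To get a feel for how the two rates in the findings compare, the sketch below evaluates them numerically with constants and logarithmic factors dropped. It assumes the worst-case branch of the adaptive bound reads S * sqrt(K * T); the function names are hypothetical and the numbers only indicate growth rates, not actual regret values.

```python
import math


def simple_omd_rate(S, K, T):
    # Rate reported for the master-base algorithm with simple OMD:
    # S^{1/2} K^{1/3} T^{2/3} (log factors and constants dropped).
    return math.sqrt(S) * K ** (1 / 3) * T ** (2 / 3)


def adaptive_worst_case_rate(S, K, T):
    # Assumed worst-case branch of the adaptive bound: S * sqrt(K * T).
    return S * math.sqrt(K * T)


if __name__ == "__main__":
    S, K = 5, 10
    for T in (10**3, 10**4, 10**6):
        print(T, round(simple_omd_rate(S, K, T)),
              round(adaptive_worst_case_rate(S, K, T)))
```

Under these assumptions the two expressions coincide at T = S^3 K, and the S * sqrt(K * T) rate is the smaller one for every longer horizon.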

Abstract

We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, which is the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2} K^{1/3} T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}], S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for loss estimators.
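
As background for the terms "OMD" and "loss estimators", here is a minimal sketch of the textbook Exp3 algorithm, which is OMD over the probability simplex with a negative-entropy regularizer and a fixed learning rate. It is not the paper's master-base algorithm and does not use adaptive learning rates; the function name `exp3`, the fixed `eta`, and the toy loss sequence are all assumptions made for illustration. The importance-weighted estimate `loss / probs[arm]` is unbiased, but it grows as the sampling probability shrinks, which is the kind of estimator variance the abstract refers to.

```python
import math
import random


def exp3(losses, eta, seed=0):
    """Run Exp3 on a list of per-round loss vectors with entries in [0, 1].

    Returns the total loss actually suffered by the learner.
    """
    rng = random.Random(seed)
    n_arms = len(losses[0])
    cum_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for round_losses in losses:
        # Exponential weights are the closed-form OMD update under negative
        # entropy; subtracting the minimum keeps exp() numerically stable.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = round_losses[arm]  # bandit feedback: only this entry is seen
        total_loss += loss
        # Unbiased estimate of the loss vector; its magnitude scales with
        # 1 / probs[arm], so rarely played arms give noisy estimates.
        cum_est[arm] += loss / probs[arm]
    return total_loss


if __name__ == "__main__":
    # Toy stationary instance: arm 0 always has loss 0, the others loss 1.
    toy_losses = [[0.0, 1.0, 1.0] for _ in range(2000)]
    print(exp3(toy_losses, eta=0.03))
```

On this toy instance the best arm never switches. Handling an unknown number of switches S is what the master-base construction in the paper is for, and that part is not sketched here.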

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Advanced Bandit Algorithms Research