Adversarial Bandits against Arbitrary Strategies
Jung-hun Kim, Se-Young Yun

TL;DR
This paper addresses adversarial bandits against arbitrary strategies by developing algorithms that adapt to the unknown number of switches S in the best arm in hindsight, and it tightens the regret bound by using adaptive learning rates.
Contribution
The paper introduces a master-base framework with adaptive learning rates for online mirror descent, leading to tighter regret bounds in adversarial bandit settings with switches.
Findings
Achieved regret of O(S^{1/2}K^{1/3}T^{2/3}) with simple OMD.
Improved regret to O(\min\{\sqrt{SKT\rho}, S\sqrt{KT}\}) using adaptive learning rates, where \rho is a variance term for the loss estimators.
Demonstrated effectiveness of adaptive methods in handling variance in loss estimators.
Abstract
We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter S, which is the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving O(S^{1/2}K^{1/3}T^{2/3}), in which T^{2/3} comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve O(\min\{\sqrt{SKT\rho}, S\sqrt{KT}\}), where \rho is a variance term for loss estimators.
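To make the role of the loss estimators concrete, here is a minimal sketch of Exp3, the textbook instance of OMD with a negative-entropy regularizer and importance-weighted loss estimates. It is not the paper's master-base algorithm: it uses one fixed learning rate `eta` and competes only with a single fixed arm (no switches). The function name `exp3`, the loss matrix, and the learning-rate choice are illustrative assumptions.

```python
import numpy as np

def exp3(losses, eta, rng=None):
    """Run Exp3 on a (T, K) matrix of losses in [0, 1]; return the total loss incurred.

    Illustrative sketch only: fixed learning rate, static comparator.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, K = losses.shape
    cum_est = np.zeros(K)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Entropic OMD in closed form: p is proportional to exp(-eta * cum_est).
        # Subtracting the minimum keeps the exponent non-positive (no overflow).
        p = np.exp(-eta * (cum_est - cum_est.min()))
        p /= p.sum()
        arm = rng.choice(K, p=p)
        loss = losses[t, arm]
        total_loss += loss
        # Unbiased estimator: only the pulled arm is updated, scaled by 1 / p[arm].
        # Its variance grows as p[arm] shrinks, which is the variance issue
        # the abstract refers to.
        cum_est[arm] += loss / p[arm]
    return total_loss

# Hypothetical example: arm 0 always has loss 0, the other arms always have loss 1.
T, K = 2000, 3
losses = np.ones((T, K))
losses[:, 0] = 0.0
total = exp3(losses, eta=np.sqrt(2 * np.log(K) / (K * T)))
```

The `1 / p[arm]` scaling is what keeps the estimator unbiased under bandit feedback, and it is also the source of the variance that the adaptive learning rates are meant to control.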
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
