Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits

Jiayuan Liu; Siwei Wang; Zhixuan Fang

arXiv:2502.14146·cs.LG·February 21, 2025

TL;DR

This paper applies SAMBA, a policy gradient algorithm, to stochastic multi-armed bandits with adversarial corruptions, proves an asymptotically optimal regret bound for it, and reports simulations in which it outperforms existing methods.

Contribution

The paper shows that SAMBA, a computationally efficient policy gradient algorithm, improves the regret bound for corrupted bandit problems by removing one $\log T$ factor relative to the best existing efficient algorithm (e.g., CBARBAR).

Findings

01  SAMBA achieves a regret upper bound of $O(K \log T/\Delta) + O(C/\Delta)$.

02  SAMBA removes one $\log T$ factor from the regret bound of CBARBAR, which is $O(K \log^{2} T/\Delta) + O(C)$.

03  Simulations show SAMBA outperforms existing algorithms in practice.
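To get a feel for how the two bounds scale, the short sketch below evaluates their leading terms with every hidden constant set to 1. This is only an illustration under that assumption: the function names and the parameter values are hypothetical, and because big-O constants are dropped, the printed numbers indicate growth rates rather than actual regret.

```python
import math


def samba_bound_terms(K, T, gap, C):
    """Leading terms of O(K log T / gap) + O(C / gap), constants set to 1."""
    return K * math.log(T) / gap, C / gap


def cbarbar_bound_terms(K, T, gap, C):
    """Leading terms of O(K log^2 T / gap) + O(C), constants set to 1."""
    return K * math.log(T) ** 2 / gap, C


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    K, gap, C = 10, 0.1, 50.0
    for T in (10**3, 10**5, 10**7):
        s_log, s_cor = samba_bound_terms(K, T, gap, C)
        c_log, c_cor = cbarbar_bound_terms(K, T, gap, C)
        print(f"T={T:>8}  SAMBA: {s_log:9.1f} + {s_cor:6.1f}"
              f"   CBARBAR: {c_log:9.1f} + {c_cor:6.1f}")
```

With constants ignored, the ratio of the two horizon-dependent terms is exactly $\log T$, which is the factor the findings refer to. The corruption-dependent terms differ as stated in the abstract ($C/\Delta$ versus $C$).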

Abstract

In this paper, we consider the stochastic multi-armed bandits problem with adversarial corruptions, where the random rewards of the arms are partially modified by an adversary to fool the algorithm. We apply the policy gradient algorithm SAMBA to this setting, and show that it is computationally efficient, and achieves a state-of-the-art $O(K \log T/\Delta) + O(C/\Delta)$ regret upper bound, where $K$ is the number of arms, $C$ is the unknown corruption level, $\Delta$ is the minimum expected reward gap between the best arm and other ones, and $T$ is the time horizon. Compared with the best existing efficient algorithm (e.g., CBARBAR), whose regret upper bound is $O(K \log^{2} T/\Delta) + O(C)$, we show that SAMBA reduces one $\log T$ factor in the regret bound, while maintaining the corruption-dependent term to be linear with $C$. This is indeed asymptotically optimal. We also conduct…
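The setting described in the abstract can be sketched as a small simulation: arms with fixed mean rewards, an adversary with a total corruption budget $C$, and regret measured against the best arm's true mean. The sketch below is a minimal illustration under stated assumptions, not the paper's experimental setup: the class name, the Bernoulli reward model, and the particular adversary (it zeroes out successful pulls of the best arm, paying one unit of budget each time) are all hypothetical. The uniform-random policy in the usage section is a placeholder and is not SAMBA.

```python
import random


class CorruptedBernoulliBandit:
    """Bernoulli bandit whose observed rewards an adversary may alter (hypothetical model)."""

    def __init__(self, means, budget, seed=0):
        self.means = list(means)
        self.remaining = float(budget)  # corruption budget C still available
        self.rng = random.Random(seed)
        self.best = max(range(len(self.means)), key=lambda i: self.means[i])

    def pull(self, arm):
        # Draw the true stochastic reward for the chosen arm.
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        # Assumed adversary: hide a success of the best arm while budget remains.
        if arm == self.best and reward == 1.0 and self.remaining >= 1.0:
            self.remaining -= 1.0
            return 0.0
        return reward


def pseudo_regret(means, arms_played):
    """Sum of gaps between the best true mean and the true mean of each played arm."""
    best = max(means)
    return sum(best - means[a] for a in arms_played)


if __name__ == "__main__":
    means = [0.7, 0.5, 0.4]  # illustrative values only
    env = CorruptedBernoulliBandit(means, budget=20, seed=1)
    policy_rng = random.Random(2)
    played = []
    for _ in range(1000):
        arm = policy_rng.randrange(len(means))  # placeholder policy, not SAMBA
        env.pull(arm)
        played.append(arm)
    print("pseudo-regret of uniform play:", round(pseudo_regret(means, played), 2))
```

Regret is computed from the true means rather than the observed rewards, so the adversary can mislead the learner but cannot change what counts as the best arm.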

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and ELM · Optimization and Search Problems