Stochastic Bandits Robust to Adversarial Attacks
Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C.S. Lui, Mohammad Hajiesmaili

TL;DR
This paper develops robust stochastic bandit algorithms resilient to adversarial reward manipulations, providing tight regret bounds and demonstrating a fundamental difference from corruption models.
Contribution
It introduces new algorithms with proven regret bounds for both known and unknown attack budgets, advancing robustness in adversarial bandit settings.
Findings
Achieves regret bounds of O((K/Δ) log T + KC) (additive in C) and Õ(√(KTC)) (multiplicative in C) for known attack budgets.
Achieves regret bounds of Õ(√(KT) + KC^2) (additive in C) and Õ(KC√T) (multiplicative in C) for unknown attack budgets.
Provides lower bounds confirming the optimality of the proposed algorithms.
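To see how the additive and multiplicative forms trade off as the attack budget C grows, the bound expressions listed above can be evaluated numerically. The sketch below is only illustrative: it drops all constants and logarithmic factors hidden by the O and Õ notation, assumes C ≥ 1, and uses function names that are hypothetical rather than taken from the paper.

```python
import math

# Illustrative only: constants and log factors hidden by O / O-tilde are dropped,
# so these values show growth rates, not actual regret. Assumes C >= 1.

def known_additive(K, T, C, gap):
    """(K/gap) * log T + K*C, the additive form for a known budget."""
    return (K / gap) * math.log(T) + K * C

def known_multiplicative(K, T, C):
    """sqrt(K*T*C), the multiplicative form for a known budget."""
    return math.sqrt(K * T * C)

def unknown_additive(K, T, C):
    """sqrt(K*T) + K*C^2, the additive form for an unknown budget."""
    return math.sqrt(K * T) + K * C ** 2

def unknown_multiplicative(K, T, C):
    """K*C*sqrt(T), the multiplicative form for an unknown budget."""
    return K * C * math.sqrt(T)

def smaller_unknown_bound(K, T, C):
    """Report which unknown-budget expression is smaller for the given K, T, C."""
    a = unknown_additive(K, T, C)
    m = unknown_multiplicative(K, T, C)
    return "additive" if a <= m else "multiplicative"
```

Under these simplifications, with K = 10 and T = 10,000, the additive expression is smaller for small budgets, and the multiplicative one takes over once C reaches roughly √T.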
Abstract
This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and then alter their reward observation. We study two cases of this model, with or without the knowledge of an attack budget C, defined as an upper bound of the summation of the difference between the actual and altered rewards. For both cases, we devise two types of algorithms with regret bounds having additive or multiplicative C dependence terms. For the known attack budget case, we prove our algorithms achieve the regret bound of O((K/Δ) log T + KC) and Õ(√(KTC)) for the additive and multiplicative C terms, respectively, where K is the number of arms, T is the time horizon, Δ is the gap between the expected rewards of the optimal arm and the second-best arm, and Õ hides…
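The attack model in the abstract can be sketched as a small simulation loop: the learner picks an arm, a stochastic reward is drawn, and the attacker, having seen the chosen arm, replaces the observed reward while keeping the total alteration within the budget C. Everything below is an assumption made for illustration: the `BudgetedAttacker` class, its strategy of zeroing out the best arm's rewards, the Bernoulli rewards, the attacker's access to the realized reward, and the uniform-random placeholder learner are hypothetical and are not the paper's algorithms.

```python
import random

class BudgetedAttacker:
    """Hypothetical attacker: lowers observed rewards of one target arm
    until the total alteration sum |actual - observed| reaches the budget."""

    def __init__(self, budget, target_arm):
        self.budget = float(budget)
        self.spent = 0.0
        self.target_arm = target_arm

    def alter(self, arm, reward):
        # The attacker acts only after seeing which arm was pulled.
        if arm != self.target_arm:
            return reward
        remaining = self.budget - self.spent
        shift = min(reward, remaining)  # cost of this attack is |actual - observed|
        self.spent += shift
        return reward - shift

def run(means, horizon, budget, seed=0):
    """Simulate the interaction; returns (budget spent, per-round log)."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda a: means[a])
    attacker = BudgetedAttacker(budget, target_arm=best)
    log = []
    for _ in range(horizon):
        arm = rng.randrange(len(means))  # placeholder learner: uniform random
        actual = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        observed = attacker.alter(arm, actual)  # the learner only sees this value
        log.append((arm, actual, observed))
    return attacker.spent, log
```

For example, `run([0.9, 0.5], horizon=200, budget=5.0)` returns the budget actually spent together with a log of `(arm, actual, observed)` triples; a robust learner would replace the uniform-random arm choice and would have to cope with the gap between `actual` and `observed`.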
Peer Reviews
Decision: ICLR 2025 Poster
1) The paper explores an under-studied area of stochastic bandits where adversarial attacks are present and obtains novel results as well as improving prior work about the already studied corruption model. 2) The theoretical bounds are tight (up to log terms), with mathematical proofs for each statement. 3) Experimental results to validate the theoretical claims are provided. 4) The authors do a good job clarifying the differences between corruption and attack models, highlighting the need for s
1) The implications for practical settings, such as recommendation systems or online auctions, could use some expanding. 2) Unfortunately, all the proofs are relegated to the appendix.
- This paper addresses a gap in the literature, recognizing that adversarial attacks have not been thoroughly explored within the classical multi-armed bandit (MAB) framework and effectively filling this gap. - The authors examine both additive and multiplicative bounds, providing a clear comparison that shows which approach performs better based on the attack budget C. - Figures 1 and, especially, Figure 2 nicely illustrate the results of attack-based multiplicative and additive bounds, offerin
1. Algorithm Design: I didn’t notice any novel or original elements in terms of algorithm design. The PE algorithm has been applied in this context in prior work (cited below), and the idea of using CORRAL has already been explored in similar settings, such as in Misspecified Gaussian Process Bandit Optimization. However, I only find this to be a minor weakness of the paper. 2. Terminology: I like the terminology of “attacks” to distinguish it from the classical “corrupted” setting. However, i
The paper advances the state of the art on algorithms robust to adversarial attacks. The paper is well-written and the relationship/improvement relative to previous work is well described.
The technical contribution is quite weak. For instance, the algorithmic approaches follow previous work and the analysis is not very involved.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
