Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification
Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti

TL;DR
This paper investigates the vulnerability of bandit algorithms to data poisoning attacks and proposes verification-based methods, including Secure-UCB and Secure-BARBAR, to effectively mitigate such attacks with limited verification resources.
Contribution
It introduces verification mechanisms for bandit algorithms that restore optimal regret under poisoning attacks, providing both upper and lower bounds on verification requirements.
Findings
Verifying a limited number of rewards limits the attacker's impact and restores near-optimal regret.
Secure-UCB and a modified explore-then-commit (ETC) algorithm recover from attacks with O(log T) verifications.
Secure-BARBAR achieves sublinear regret with bounded verification budget.
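The idea behind the verification-based ETC finding can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `verified_etc`, the Bernoulli reward model, the `attack` callback, and the exploration constant `4 * log(T)` are all assumptions chosen for illustration. The learner verifies every exploratory reward, so contamination never enters its estimates, and the number of verifications grows only logarithmically in the horizon.

```python
import math
import random


def verified_etc(means, horizon, attack, rng):
    """Explore-then-commit that trusts only verified rewards (illustrative).

    means   : true Bernoulli means of the arms (hypothetical environment)
    horizon : total number of rounds T
    attack  : function (arm, reward) -> contaminated reward, i.e. what the
              learner would see without verification (hypothetical attacker)
    rng     : random.Random instance
    Returns (committed_arm, verifications_used, commit_rounds).
    """
    k = len(means)
    # Assumed exploration length: O(log T) pulls per arm. A real analysis
    # would tie the constant to the reward gaps.
    pulls_per_arm = math.ceil(4 * math.log(horizon))
    verified_sums = [0.0] * k
    verifications = 0

    for arm in range(k):
        for _ in range(pulls_per_arm):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            _contaminated = attack(arm, reward)  # attacker acts, but is ignored
            verified_sums[arm] += reward         # verification reveals the true reward
            verifications += 1

    # Every arm has the same number of pulls, so comparing sums compares means.
    best = max(range(k), key=lambda a: verified_sums[a])
    # The learner then plays `best` for the remaining rounds, ignoring feedback.
    commit_rounds = horizon - k * pulls_per_arm
    return best, verifications, commit_rounds
```

Because the commit phase never consults observed rewards, the attacker cannot change the committed arm however much it contaminates; the cost is the verification budget spent during exploration.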
Abstract
We study bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret O(log T) can be forced to suffer a regret Ω(T) with an expected amount of contamination O(log T). This amount of contamination is also necessary, as we prove that there exists an O(log T) regret bandit algorithm, specifically the classical UCB, that requires Ω(log T) amount of contamination to suffer regret Ω(T). To combat such attacks, our second main contribution is to propose verification based mechanisms, which use limited verification to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we…
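The attacker model described in the abstract can be sketched with a small simulation. This is an illustrative assumption, not the attack constructed in the paper: the function name `ucb_under_attack`, the Bernoulli rewards, and the simple strategy of cancelling every reward from non-target arms are all hypothetical. The attacker sees the pulled arm and its reward, then adds noise so that the learner observes zero reward on any arm other than the target.

```python
import math
import random


def ucb_under_attack(means, horizon, target, rng):
    """Run classical UCB while an attacker poisons rewards (illustrative).

    means   : true Bernoulli means of the arms (hypothetical environment)
    horizon : number of rounds T
    target  : arm the attacker wants the learner to favour
    rng     : random.Random instance
    Returns (pull_counts, total_contamination).
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    contamination = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Standard UCB index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        # Additive contamination: cancel the reward of every non-target arm.
        noise = -reward if arm != target else 0.0
        contamination += abs(noise)
        sums[arm] += reward + noise  # the learner only sees the poisoned value
        counts[arm] += 1

    return counts, contamination
```

In this sketch the attacker spends contamination only when a non-target arm is pulled. Since UCB pulls arms that look suboptimal only rarely, the total contamination stays small relative to the horizon while the learner still concentrates its pulls on the target arm.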
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
