Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy
Varsha Pendyala

TL;DR
This paper analyzes the sample complexity of an adversarial attack on a UCB-based best-arm identification policy in stochastic multi-armed bandits, providing theoretical bounds on the attack's effectiveness.
Contribution
It derives the sample complexity needed for an adversarially targeted arm to be identified as the best, based on the attack model and UCB stopping conditions.
Findings
The attack can cause the target arm to be identified as the best arm within a finite number of rounds T.
T depends only on the number of arms and the sub-Gaussian parameter of the rewards.
The stopping condition can be achieved by the target arm under the attack model.
Abstract
In this work I study adversarial perturbations to rewards in a multi-armed bandit (MAB) setting. Specifically, I focus on an adversarial attack on a UCB-type best-arm identification policy applied to a stochastic MAB. The UCB attack presented in [1] causes a target arm K to be pulled very often. I use the attack model of [1] to derive the sample complexity required for the target arm K to be selected as the best arm. I prove that the stopping condition of the UCB-based best-arm identification algorithm given in [2] can be achieved by the target arm K in T rounds, where T depends only on the total number of arms and the sub-Gaussian parameter of the arms' random rewards.
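The setting described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the attack model of [1] or the stopping condition of [2]: the attacker applies a hypothetical constant shift that pushes every non-target arm's observed mean below the target's by `margin`, the learner uses a standard UCB index, and it stops once one arm's pull count dominates the combined pull count of all other arms by a factor `dominance`. The function name `run_attacked_ucb` and all parameter names are illustrative.

```python
import math
import random


def run_attacked_ucb(means, target, sigma=1.0, margin=2.0, dominance=9.0,
                     max_rounds=10_000, seed=0):
    """Simulate a UCB learner whose rewards are perturbed by an attacker.

    Returns (identified_arm, rounds_used), or (None, max_rounds) if the
    illustrative stopping rule is never met.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    pulls = [0] * num_arms      # number of times each arm was pulled
    totals = [0.0] * num_arms   # sum of post-attack rewards per arm

    for t in range(1, max_rounds + 1):
        if t <= num_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Standard UCB index: empirical mean plus exploration bonus.
            arm = max(
                range(num_arms),
                key=lambda i: totals[i] / pulls[i]
                + sigma * math.sqrt(2.0 * math.log(t) / pulls[i]),
            )

        reward = rng.gauss(means[arm], sigma)

        if arm != target:
            # Assumed attack: constant shift so that the observed mean of a
            # non-target arm is at most means[target] - margin.
            reward -= max(0.0, means[arm] - means[target] + margin)

        pulls[arm] += 1
        totals[arm] += reward

        # Assumed stopping rule: the most-pulled arm has been pulled at least
        # 1 + dominance * (pulls of all other arms combined) times.
        leader = max(range(num_arms), key=lambda i: pulls[i])
        if t > num_arms and pulls[leader] >= 1 + dominance * (t - pulls[leader]):
            return leader, t

    return None, max_rounds


if __name__ == "__main__":
    # The target is the arm with the lowest true mean.
    arm, rounds = run_attacked_ucb([0.9, 0.5, 0.2], target=2)
    print(f"identified arm {arm} after {rounds} rounds")
```

In this sketch the number of rounds needed before stopping is governed by `sigma`, `margin`, and the number of arms rather than by the true means of the non-target arms, which loosely mirrors the kind of dependence stated in the abstract.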
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
