Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy

Varsha Pendyala

arXiv:2209.05692 · cs.LG · September 14, 2022

TL;DR

This paper analyzes the sample complexity of an adversarial attack on a UCB-based best-arm identification policy in stochastic multi-armed bandits, bounding the number of rounds T after which the attacker's target arm satisfies the policy's stopping condition.

Contribution

It derives the sample complexity needed for an adversarially targeted arm to be identified as the best, based on the attack model and UCB stopping conditions.

Findings

01

The attack can cause the target arm to be identified as best within T rounds.

02

The sample complexity T depends only on the total number of arms and the $\sigma$ parameter of the $\sigma^{2}$-sub-Gaussian rewards.

03

The stopping condition can be achieved by the target arm under the attack model.

Abstract

In this work I study the problem of adversarial perturbations to rewards in a multi-armed bandit (MAB) setting. Specifically, I focus on an adversarial attack on a UCB-type best-arm identification policy applied to a stochastic MAB. The UCB attack presented in [1] results in the target arm K being pulled very often. I use the attack model of [1] to derive the sample complexity required for selecting the target arm K as the best arm. I prove that the stopping condition of the UCB-based best-arm identification algorithm given in [2] can be achieved by the target arm K in T rounds, where T depends only on the total number of arms and the $\sigma$ parameter of the $\sigma^{2}$-sub-Gaussian random rewards of the arms.
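To make the setting in the abstract concrete, here is a minimal toy simulation of a reward-perturbation attack on a UCB-style policy. It is a sketch under stated assumptions, not the attack of [1] or the algorithm of [2]: the UCB1-style exploration bonus, the fixed `margin`, the Gaussian rewards, and the function name `simulate_attacked_ucb` are all illustrative choices that do not come from the paper. The attacker lowers the observed reward of any non-target arm just enough that its post-attack empirical mean stays at least `margin` below the target arm's current empirical mean, which is one simple way to steer a UCB policy toward pulling the target arm most of the time.

```python
import math
import random


def simulate_attacked_ucb(means, target, horizon, sigma=1.0, margin=1.0, seed=0):
    """Toy UCB1 loop with a reward-poisoning attacker; returns pull counts per arm.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # sum of post-attack rewards per arm

    def ucb_index(i, t):
        # Empirical mean plus a UCB1-style exploration bonus.
        return sums[i] / counts[i] + sigma * math.sqrt(2.0 * math.log(t) / counts[i])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(n_arms), key=lambda i: ucb_index(i, t))

        reward = rng.gauss(means[arm], sigma)

        # Attack: only non-target arms are perturbed, and only once the
        # target arm has at least one sample to compare against.
        if arm != target and counts[target] > 0:
            target_mean = sums[target] / counts[target]
            # Largest total reward that keeps this arm's post-attack mean
            # at or below (target_mean - margin) after this pull.
            cap = (counts[arm] + 1) * (target_mean - margin)
            alpha = max(0.0, sums[arm] + reward - cap)
            reward -= alpha

        counts[arm] += 1
        sums[arm] += reward

    return counts
```

For example, `simulate_attacked_ucb([1.0, 0.5, 0.0], target=2, horizon=2000)` uses the arm with the lowest true mean as the target. In this toy model the non-target arms are pulled only while their exploration bonus can offset the margin, so the target arm should end up with the large majority of the pulls.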

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research