Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks
Akshayaa Magesh, Venugopal V. Veeravalli

TL;DR
This paper studies a multi-player multi-armed bandit problem with heterogeneous rewards and adversarial attacks, proposing a communication-based policy that achieves near-optimal regret despite adversarial interference.
Contribution
It introduces a new policy allowing players to communicate briefly, effectively mitigating adversarial attacks in a heterogeneous multi-player bandit setting.
Findings
Achieves near order optimal regret of O(log^{1+δ} T + W)
Handles multiple adversarial attacks per time step
Supports heterogeneous reward distributions across players
Abstract
We consider a multi-player multi-armed bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system. The reward distributions for any given arm are heterogeneous across the players. In the event of a collision (more than one player choosing the same arm), all the colliding users receive zero reward. The adversaries use collisions to affect the rewards received by the players, i.e., if an adversary attacks an arm, any player choosing that arm will receive zero reward. At any time step, the adversaries may attack more than one arm. It is assumed that the players in the system do not deviate from a pre-determined policy used by all the players, and that the probability that none of the arms face adversarial attacks is strictly positive at every time step. In order to combat the adversarial attacks, the players are…
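To make the reward model in the abstract concrete, here is a minimal sketch of how rewards could be computed for a single time step. It assumes Bernoulli rewards and uses a hypothetical function `step_rewards` and a per-player mean matrix `means[player][arm]` (heterogeneous across players); none of these names come from the paper, and the sketch covers only the collision/attack rule, not the authors' policy. A player gets zero reward if another player picks the same arm or if the arm is attacked; otherwise the reward is drawn from that player's own distribution for the arm.

```python
import random
from collections import Counter


def step_rewards(choices, attacked, means, rng):
    """Hypothetical one-step reward model (illustration only).

    choices[i]  -- arm chosen by player i at this time step
    attacked    -- set of arms attacked by adversaries at this time step
    means[i][k] -- mean of player i's (assumed Bernoulli) reward on arm k;
                   rows may differ, modelling heterogeneity across players
    """
    counts = Counter(choices)  # how many players picked each arm
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1 or arm in attacked:
            # Collision with another player, or the arm is under attack.
            rewards.append(0)
        else:
            # Bernoulli draw from this player's own mean for this arm.
            rewards.append(1 if rng.random() < means[player][arm] else 0)
    return rewards


rng = random.Random(0)
means = [[0.9, 0.2, 0.5],
         [0.1, 0.8, 0.6],
         [0.4, 0.3, 0.7]]
print(step_rewards([0, 1, 2], attacked={1}, means=means, rng=rng))
```

In the usage line, player 1's reward is always 0 because arm 1 is attacked, while players 0 and 2 receive random draws from their own means.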
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
