Byzantine-Resilient Decentralized Multi-Armed Bandits
Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu

TL;DR
This paper introduces a decentralized, resilient UCB algorithm for multi-armed bandits that uses information fusion and truncation to maintain performance and improve collective regret when some agents are Byzantine.
Contribution
It develops a fully decentralized resilient UCB algorithm that handles Byzantine agents: each normal agent's regret is no worse than that of single-agent UCB1, and the collective regret of the normal agents improves.
Findings
Each normal agent's regret is no worse than that of single-agent UCB1.
The collective regret of the normal agents is strictly better than in the non-cooperative case when each agent has sufficiently many neighbors.
The algorithm performs well in experiments under adversarial conditions.
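For reference, the single-agent baseline named in the findings can be sketched as follows. This is a minimal illustration of classic UCB1 with Bernoulli rewards, not the paper's decentralized algorithm; the function names `ucb1_index` and `run_ucb1`, the reward model, and the seed are assumptions made for the example.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Classic UCB1 index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(true_means, horizon, seed=0):
    """Run single-agent UCB1 on Bernoulli arms (assumed reward model).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    means = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(k), key=lambda a: ucb1_index(means[a], pulls[a], t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the empirical mean for the chosen arm.
        means[arm] += (reward - means[arm]) / pulls[arm]
        regret += best - true_means[arm]
    return pulls, regret
```

Calling `run_ucb1([0.1, 0.9], 2000)` returns the pull counts and the pseudo-regret accumulated over 2000 rounds, which is the quantity the findings compare against.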
Abstract
In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient behavior when an unknown fraction of the agents can be Byzantine, that is, communicate arbitrarily wrong information in the form of reward mean-estimates or confidence sets. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets. Our key contribution is the development of a fully decentralized resilient upper confidence bound (UCB) algorithm that fuses an information mixing step among…
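To make the idea of truncation concrete, here is a minimal sketch of a generic trimmed-mean fusion rule for reward mean-estimates. It is a standard robust-aggregation pattern shown under stated assumptions, not the paper's exact update: the function name `trimmed_fusion`, the parameter `f` (an assumed bound on the number of Byzantine neighbors), and the fallback to the local estimate are all hypothetical choices for illustration.

```python
def trimmed_fusion(own_estimate, neighbor_estimates, f):
    """Fuse a local mean-estimate with neighbors' reports.

    Assumption: at most `f` of the neighbors are Byzantine. The f largest
    and f smallest reports are discarded before averaging, so a report
    that survives trimming lies within the range of honest reports.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    ordered = sorted(neighbor_estimates)
    if len(ordered) <= 2 * f:
        # Too few reports to trim safely: rely on local data only.
        return own_estimate
    kept = ordered[f:len(ordered) - f]
    return (own_estimate + sum(kept)) / (1 + len(kept))
```

With `f = 1`, the reports `[0.4, 0.6, 100.0, -100.0, 0.5]` lose their two extremes, so the fused value stays near the honest estimates instead of being dragged by the outliers that a plain average would include.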
Taxonomy
Topics: Advanced Bandit Algorithms Research · Blockchain Technology Applications and Security · Age of Information Optimization
