Multi-armed Bandit Algorithm against Strategic Replication
Suho Shin, Seungjoon Lee, Jungseul Ok

TL;DR
This paper introduces Hierarchical UCB algorithms designed to prevent strategic replication in multi-armed bandit problems, achieving low regret even with irrational agents and demonstrating effectiveness through theoretical analysis and experiments.
Contribution
The paper proposes replication-proof Hierarchical UCB algorithms that mitigate strategic arm replication and maintain low regret in multi-armed bandit settings.
Findings
H-UCB achieves $O(\log T)$ regret under equilibrium.
RH-UCB maintains sublinear regret with irrational agents.
Algorithms are validated through numerical experiments.
Abstract
We consider a multi-armed bandit problem in which a set of arms is registered by each agent, and the agent receives reward when its arm is selected. An agent might strategically submit more arms with replications, which can bring more reward by abusing the bandit algorithm's exploration-exploitation balance. Our analysis reveals that a standard algorithm indeed fails at preventing replication and suffers from linear regret in time $T$. We aim to design a bandit algorithm which demotivates replications and also achieves a small cumulative regret. We devise Hierarchical UCB (H-UCB), which is replication-proof and has $O(\log T)$-regret under any equilibrium. We further propose Robust Hierarchical UCB (RH-UCB), which has sublinear regret even in a realistic scenario with irrational agents replicating carelessly. We verify our theoretical findings through numerical experiments.
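To make the word "hierarchical" concrete, the following is a minimal Python sketch of one plausible two-level UCB scheme: a UCB index first chooses an agent from agent-level statistics, and a second UCB index then chooses an arm among that agent's arms. It is an illustration under stated assumptions, not the authors' exact H-UCB or RH-UCB. The function names `ucb_index` and `hierarchical_ucb`, the Bernoulli rewards, and the UCB1-style bonus $\sqrt{2 \ln t / n}$ are all assumptions that do not come from the abstract.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index: empirical mean plus an exploration bonus (assumed form)."""
    if pulls == 0:
        return float("inf")  # force every option to be tried at least once
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def hierarchical_ucb(agent_arm_means, horizon, seed=0):
    """Two-level UCB over Bernoulli arms grouped by agent (illustrative only).

    agent_arm_means[i][j] is the hypothetical success probability of arm j
    registered by agent i. Returns how many times each agent was selected.
    """
    rng = random.Random(seed)
    n_agents = len(agent_arm_means)
    agent_pulls = [0] * n_agents
    agent_sum = [0.0] * n_agents
    arm_pulls = [[0] * len(arms) for arms in agent_arm_means]
    arm_sum = [[0.0] * len(arms) for arms in agent_arm_means]

    def mean(total, count):
        return total / count if count else 0.0

    for t in range(1, horizon + 1):
        # Level 1: choose an agent from agent-level statistics only, so the
        # number of arms an agent registers does not add agent-level indices.
        i = max(
            range(n_agents),
            key=lambda a: ucb_index(mean(agent_sum[a], agent_pulls[a]), agent_pulls[a], t),
        )
        # Level 2: choose an arm inside agent i, using the agent's own
        # selection count as the clock for the exploration bonus.
        local_t = agent_pulls[i] + 1
        j = max(
            range(len(agent_arm_means[i])),
            key=lambda k: ucb_index(mean(arm_sum[i][k], arm_pulls[i][k]), arm_pulls[i][k], local_t),
        )
        reward = 1.0 if rng.random() < agent_arm_means[i][j] else 0.0
        agent_pulls[i] += 1
        agent_sum[i] += reward
        arm_pulls[i][j] += 1
        arm_sum[i][j] += reward

    return agent_pulls


# Hypothetical instance: agent 0 registers one good arm, agent 1 registers
# three copies of a poor arm.
print(hierarchical_ucb([[0.9], [0.2, 0.2, 0.2]], horizon=2000))
```

In this sketch, replicating an arm adds exploration only inside the replicating agent's own share of rounds, which is the intuition behind selecting agents before arms. Whether this particular index is replication-proof is not established here; the paper's analysis applies to its own algorithms.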
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
