Multi-Armed Bandits with Dependent Arms
Rahul Singh, Fang Liu, Yin Sun, Ness Shroff

TL;DR
This paper introduces a new variant of the multi-armed bandit problem where arms are grouped into clusters with known reward functions, and develops algorithms that leverage these dependencies to improve learning efficiency.
Contribution
The paper proposes UCB-based algorithms for dependent arms in clustered bandits, achieving regret bounds that scale with the number of clusters rather than total arms.
Findings
Regret grows as O(K log T), where K is the number of clusters.
The algorithms use the side observations that each pull provides about the other arms in the same cluster.
Regret bounds improve on those of classical UCB in dependent-arm settings.
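For orientation, the regret claim above can be written out as follows. This is a sketch in standard bandit notation rather than the paper's own: the symbols N (total number of arms), \mu^* (best mean reward), and a_t (arm pulled in round t) are assumptions introduced here, and the hidden constants depend on the reward gaps.

```latex
% Cumulative regret after T rounds (standard definition; notation assumed)
R(T) = T\mu^* - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right]

% Classical UCB over N independent arms vs. the clustered bound stated above
R_{\mathrm{UCB}}(T) = O(N \log T), \qquad
R_{\mathrm{clustered}}(T) = O(K \log T), \qquad K \le N
```

The gain is largest when many arms share few clusters, so that K is much smaller than N.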
Abstract
We study a variant of the classical multi-armed bandit problem (MABP) which we call Multi-Armed Bandits with dependent arms. More specifically, multiple arms are grouped together to form a cluster, and the reward distributions of arms belonging to the same cluster are known functions of an unknown parameter that is a characteristic of the cluster. Thus, pulling an arm i not only reveals information about its own reward distribution, but also about all those arms that share the same cluster with arm i. This "correlation" amongst the arms complicates the exploration-exploitation trade-off that is encountered in the MABP because the observation dependencies allow us to test simultaneously multiple hypotheses regarding the optimality of an arm. We develop learning algorithms based on the UCB principle which utilize these additional side observations appropriately while performing…
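The idea of sharing observations within a cluster can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: it assumes Bernoulli rewards whose mean is `weight[arm] * theta[cluster]` (one simple choice of "known function of an unknown parameter"), pools every observation in a cluster into one estimate of that parameter, and uses a heuristic UCB1-style confidence radius. The function name `cluster_ucb` and all parameter names are hypothetical.

```python
import math
import random


def cluster_ucb(cluster_of, weight, theta, horizon, seed=0):
    """Illustrative cluster-level UCB (assumed model, not the paper's method).

    cluster_of[a]: cluster index of arm a
    weight[a]:     known scale of arm a; mean reward is weight[a] * theta[c]
    theta[c]:      true cluster parameter, used only to simulate rewards
    """
    rng = random.Random(seed)
    n_clusters = len(theta)
    n_arms = len(weight)
    sum_reward = [0.0] * n_clusters  # total reward observed per cluster
    sum_weight = [0.0] * n_clusters  # total weight of pulled arms per cluster
    count = [0] * n_clusters         # number of pulls per cluster
    pulls = [0] * n_arms             # number of pulls per arm

    def index(arm, t):
        c = cluster_of[arm]
        if count[c] == 0:
            return math.inf  # force one observation per cluster
        # Pooled estimate: E[sum of rewards] = theta * (sum of weights).
        theta_hat = sum_reward[c] / sum_weight[c]
        # Heuristic radius based on the cluster's pull count, not the arm's.
        radius = math.sqrt(2.0 * math.log(t) / count[c])
        # The mean is increasing in theta, so plug in the optimistic value.
        return weight[arm] * (theta_hat + radius)

    for t in range(1, horizon + 1):
        arm = max(range(n_arms), key=lambda a: index(a, t))
        c = cluster_of[arm]
        reward = 1.0 if rng.random() < weight[arm] * theta[c] else 0.0
        sum_reward[c] += reward
        sum_weight[c] += weight[arm]
        count[c] += 1
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Two clusters of two arms each; arm 1 has the highest mean (1.0 * 0.9).
    print(cluster_ucb([0, 0, 1, 1], [0.5, 1.0, 1.0, 0.6], [0.9, 0.2], 2000))
```

In this toy model every arm in a cluster shares the same optimistic parameter value, so once a cluster has been observed only its highest-weight arm is ever selected. Exploration cost is therefore paid per cluster rather than per arm, which is the intuition behind a bound that scales with the number of clusters.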
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
