Combinatorial Bandits without Total Order for Arms
Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

TL;DR
This paper introduces a new combinatorial bandit model with set-dependent rewards and no total order among arms, and proposes a UCB algorithm with near-optimal regret bounds and broad empirical validation.
Contribution
The paper formulates a novel reward model for combinatorial bandits with no total order over arms and develops a near-optimal UCB algorithm with a rigorous regret analysis.
Findings
Achieves an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound
Achieves an $O\left(k^2 \sqrt{n T \log T}\right)$ gap-independent regret bound
Empirical results demonstrate broad applicability
Abstract
We consider the combinatorial bandits problem, where at each time step the online learner selects a size-$k$ subset $s$ from the arms set $\mathcal{A}$, where $|\mathcal{A}| = n$, and observes a stochastic reward of each arm in the selected set $s$. The goal of the online learner is to minimize the regret induced by not selecting $s^*$, the set that maximizes the expected total reward. Specifically, we focus on a challenging setting where 1) the reward distribution of an arm depends on the set $s$ it is part of, and crucially 2) there is no total order for the arms in $\mathcal{A}$. In this paper, we formally present a reward model that captures set-dependent reward distributions and assumes no total order for arms. Correspondingly, we propose an Upper Confidence Bound (UCB) algorithm that maintains a UCB for each individual arm and selects the arms with the top-$k$ UCBs. We…
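The selection rule described in the abstract (one UCB index per arm, then play the $k$ arms with the largest indices) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the pairwise-interaction reward model (`base`, `inter`), the Gaussian observation noise, the confidence radius $\sqrt{2 \ln t / N_i}$, and all function names are hypothetical choices made for illustration, and the paper's actual reward model, confidence widths, and constants may differ. Regret is measured against $s^*$ found by brute force, which is only practical for small $n$.

```python
import itertools
import math
import random
from collections import Counter


def expected_arm_reward(i, s, base, inter):
    """Hypothetical set-dependent mean of arm i when played inside set s:
    a base value plus pairwise interaction terms with the other arms in s."""
    return base[i] + sum(inter[i][j] for j in s if j != i)


def expected_set_reward(s, base, inter):
    """Expected total reward of set s (sum of its arms' set-dependent means)."""
    return sum(expected_arm_reward(i, s, base, inter) for i in s)


def best_set(n, k, base, inter):
    """Brute-force s* over all size-k subsets; only feasible for small n."""
    return max(itertools.combinations(range(n), k),
               key=lambda s: expected_set_reward(s, base, inter))


def top_k_ucb(n, k, T, base, inter, noise=0.1, seed=0):
    """Keep one UCB index per arm and play the k arms with the largest index.

    Feedback is semi-bandit: a noisy reward is observed for every arm in the
    played set. Returns the played sets and the cumulative expected regret.
    """
    rng = random.Random(seed)
    counts = [0] * n    # times each arm has been played
    sums = [0.0] * n    # observed rewards per arm, pooled over all sets (assumption)
    opt = expected_set_reward(best_set(n, k, base, inter), base, inter)
    chosen, regret, cum = [], [], 0.0
    for t in range(1, T + 1):
        # Unplayed arms get an infinite index, so every arm is tried early on.
        ucb = [math.inf if counts[i] == 0
               else sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(n)]
        s = tuple(sorted(sorted(range(n), key=lambda i: -ucb[i])[:k]))
        for i in s:
            r = expected_arm_reward(i, s, base, inter) + rng.gauss(0.0, noise)
            counts[i] += 1
            sums[i] += r
        cum += opt - expected_set_reward(s, base, inter)  # per-round regret >= 0
        chosen.append(s)
        regret.append(cum)
    return chosen, regret


if __name__ == "__main__":
    # Additive toy instance (no interactions): arms 0 and 1 form s*.
    base = [0.9, 0.8, 0.2, 0.1]
    inter = [[0.0] * 4 for _ in range(4)]
    chosen, regret = top_k_ucb(n=4, k=2, T=3000, base=base, inter=inter)
    print(best_set(4, 2, base, inter))  # (0, 1)
    print(Counter(chosen[-1000:]).most_common(1), regret[-1])

    # Toy instance with no total order: the ranking of arms 0 and 1 depends
    # on which arm they are paired with.
    base2 = [0.5, 0.5, 0.3, 0.3]
    inter2 = [[0.0] * 4 for _ in range(4)]
    inter2[0][2] = 0.2  # arm 0 does better next to arm 2
    inter2[1][3] = 0.2  # arm 1 does better next to arm 3
    ahead_with_2 = (expected_arm_reward(0, (0, 2), base2, inter2)
                    > expected_arm_reward(1, (1, 2), base2, inter2))
    ahead_with_3 = (expected_arm_reward(0, (0, 3), base2, inter2)
                    > expected_arm_reward(1, (1, 3), base2, inter2))
    print(ahead_with_2, ahead_with_3)  # True False
```

Pooling each arm's observations across every set it appears in is the simplest reading of "maintains a UCB for each individual arm"; whether the paper does exactly this, and under which conditions on the reward model top-$k$ selection stays consistent when arm rankings flip, is a matter for the paper's analysis rather than this sketch.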
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
