Combinatorial Bandits without Total Order for Arms

Shuo Yang; Tongzheng Ren; Inderjit S. Dhillon; Sujay Sanghavi

arXiv:2103.02741·cs.LG·March 5, 2021

TL;DR

This paper introduces a new combinatorial bandit model with set-dependent rewards and no total order among arms, and proposes a UCB algorithm with near-optimal regret bounds and broad empirical validation.

Contribution

The paper formulates a novel reward model for combinatorial bandits without total order and develops a near-optimal UCB algorithm with rigorous regret analysis.

Findings

01

Achieves an $O\left(k^2 n \frac{\log T}{\epsilon}\right)$ regret bound

02

Achieves an $O\left(k^2 \sqrt{n T \log T}\right)$ regret bound

03

Empirical results demonstrate broad applicability

Abstract

We consider the combinatorial bandits problem, where at each time step, the online learner selects a size-$k$ subset $s$ from the arms set $A$, where $|A| = n$, and observes a stochastic reward of each arm in the selected set $s$. The goal of the online learner is to minimize the regret, induced by not selecting $s^{*}$ which maximizes the expected total reward. Specifically, we focus on a challenging setting where 1) the reward distribution of an arm depends on the set $s$ it is part of, and crucially 2) there is no total order for the arms in $A$. In this paper, we formally present a reward model that captures set-dependent reward distribution and assumes no total order for arms. Correspondingly, we propose an Upper Confidence Bound (UCB) algorithm that maintains UCB for each individual arm and selects the arms with top-$k$ UCB. We…
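
The selection rule described in the abstract (keep one UCB per arm, play the $k$ arms with the largest UCBs) can be illustrated with a minimal Python sketch. This is not the paper's algorithm or reward model: the function name `top_k_ucb`, the Bernoulli rewards, and the standard UCB1-style bonus $\sqrt{2 \log t / N_i}$ are all assumptions made for illustration, and the simulated rewards here do not depend on the selected set, which simplifies away the set-dependent setting the paper studies.

```python
import math
import random


def top_k_ucb(means, k, horizon, seed=0):
    """Top-k UCB loop on hypothetical Bernoulli arms; returns pull counts.

    Assumption: rewards are independent of the chosen set, unlike the
    set-dependent model in the paper. The bonus term is the usual UCB1 one.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n      # number of times each arm has been selected
    totals = [0.0] * n    # sum of observed rewards for each arm
    for t in range(1, horizon + 1):
        # Arms never pulled get an infinite index so each is tried once.
        ucb = [
            float("inf") if counts[i] == 0
            else totals[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
            for i in range(n)
        ]
        # Select the k arms with the largest indices.
        chosen = sorted(range(n), key=lambda i: ucb[i], reverse=True)[:k]
        for i in chosen:
            # Observe one stochastic reward per selected arm.
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            totals[i] += reward
    return counts


counts = top_k_ucb([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
print(counts)
```

Each round pulls exactly $k$ arms, so the counts sum to $k$ times the horizon; over a long run the two arms with the highest assumed means should account for most of the pulls.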

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications