Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback

Yiliu Wang, Wei Chen, and Milan Vojnović

arXiv:2305.16074 · cs.LG · May 26, 2023 · 1 citation

TL;DR

This paper introduces a combinatorial bandit problem with a new feedback structure, max value-index feedback. It proposes an algorithm whose regret bounds are comparable to those known under the more informative semi-bandit feedback, and it reports experiments that support the algorithm's effectiveness.

Contribution

It presents a new feedback structure for combinatorial bandits, along with an algorithm and regret analysis for maximum value reward functions.

Findings

01 The algorithm achieves an $O((k/\Delta)\log(T))$ distribution-dependent regret bound.

02 The regret bounds are comparable to those under semi-bandit feedback.

03 Experimental results confirm the algorithm's effectiveness.

Abstract

We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and index feedback. This is a new feedback structure that lies in between commonly studied semi-bandit and full-bandit feedback structures. We propose an algorithm and provide a regret bound for problem instances with stochastic arm outcomes according to arbitrary distributions with finite supports. The regret analysis rests on considering an extended set of arms, associated with values and probabilities of arm outcomes, and applying a smoothness condition. Our algorithm achieves a $O((k/\Delta)\log(T))$ distribution-dependent and a $\tilde{O}(\sqrt{T})$ distribution-independent regret where $k$ is the number of arms selected in each round, $\Delta$ is a distribution-dependent reward gap and $T$ is the horizon time. Perhaps surprisingly, the regret bound is comparable to…
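
To make the feedback structure in the abstract concrete, the following is a minimal sketch of a single round, not the authors' algorithm. It assumes each arm has a finite-support outcome distribution, that the learner selects a set of arms, and that it observes only the maximum realized value and the index of an arm attaining it. The arm distributions, the function names, and the tie-breaking rule (smallest arm index) are hypothetical choices made for illustration.

```python
import random


def draw_outcome(values, probs, rng):
    """Sample one arm outcome from a finite-support distribution."""
    return rng.choices(values, weights=probs, k=1)[0]


def max_value_index_feedback(outcomes):
    """Return (max value, index of an arm attaining it).

    `outcomes` maps arm index -> realized value for the selected arms.
    Ties are broken by the smallest arm index (an assumption of this sketch).
    """
    winner = min(outcomes, key=lambda arm: (-outcomes[arm], arm))
    return outcomes[winner], winner


# Hypothetical instance: arm index -> (support values, probabilities).
arms = {
    0: ([0.0, 1.0], [0.7, 0.3]),
    1: ([0.0, 0.5], [0.2, 0.8]),
    2: ([0.0, 0.2, 0.9], [0.5, 0.3, 0.2]),
}

rng = random.Random(0)
selected = [0, 2]  # the k = 2 arms chosen in this round

# The full outcome vector is what semi-bandit feedback would reveal.
outcomes = {arm: draw_outcome(*arms[arm], rng) for arm in selected}

# Under max value-index feedback the learner sees only this pair.
reward, index = max_value_index_feedback(outcomes)
print(f"observed max value {reward} from arm {index}")
```

The learner never sees the outcomes of the non-winning arms in `selected`; it can only infer that each of them was at most `reward`. That is what places this feedback between semi-bandit feedback (all selected outcomes) and full-bandit feedback (the reward alone).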

Code & Models

Repositories

sketch-exp/kmax (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics