Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen; Wei Hu; Fu Li; Jian Li; Yu Liu; Pinyan Lu

arXiv:1610.06603 · cs.LG · July 23, 2018 · 73 cites

TL;DR

This paper introduces a new algorithm for stochastic combinatorial multi-armed bandits with general nonlinear reward functions, achieving near-optimal regret bounds and enabling solutions for complex problems like $K$-MAX.

Contribution

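The paper proposes the stochastically dominant confidence bound (SDCB) algorithm, which estimates the entire distributions of the underlying random variables rather than only their means, extending bandit techniques to reward functions that mean-based methods cannot handle.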
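To see why means alone can be insufficient for a reward such as the maximum, the following minimal sketch compares two hypothetical pairs of independent arms. Every arm has mean 0.5, yet the expected maximum differs. The arm definitions and the helper names `expected_max` and `mean` are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def expected_max(arms):
    """Exact E[max] for independent discrete arms.

    Each arm is a list of (value, probability) pairs (an illustrative format).
    """
    total = 0.0
    # Enumerate every joint outcome of the independent arms.
    for outcome in product(*arms):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total


def mean(arm):
    """Mean of a single discrete arm."""
    return sum(v * p for v, p in arm)


# Two hypothetical arm types with the same mean of 0.5.
deterministic = [(0.5, 1.0)]
bernoulli = [(0.0, 0.5), (1.0, 0.5)]

set_a = [deterministic, deterministic]
set_b = [bernoulli, bernoulli]

print("means, set A:", [mean(a) for a in set_a])
print("means, set B:", [mean(a) for a in set_b])
print("E[max], set A:", expected_max(set_a))
print("E[max], set B:", expected_max(set_b))
```

A learner that tracks only the means cannot distinguish the two sets, although the second set yields a higher expected maximum.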

Findings

01

Achieves $O(\log T)$ distribution-dependent regret.

02

Achieves $\tilde{O}(\sqrt{T})$ distribution-independent regret.

03

Provides a polynomial-time approximation scheme for $K$-MAX.

Abstract

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions such as the $\max()$ function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve $O(\log T)$ distribution-dependent regret and $\tilde{O}(\sqrt{T})$ …
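The idea of a stochastically dominant confidence bound can be sketched as follows: take the empirical CDF of an arm's observed outcomes and shift it downward by a confidence radius, which moves probability mass toward larger values and gives an optimistic distribution. This is only a rough illustration under stated assumptions: outcomes lie in $[0, 1]$, the radius $\sqrt{3 \ln t / (2n)}$ is an assumed form whose constants may differ from the paper's, and the function names are hypothetical.

```python
import bisect
import math


def empirical_cdf(samples, grid):
    """Fraction of samples <= x, for each x in grid."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / n for x in grid]


def dominant_cdf(samples, t, grid):
    """Optimistic CDF: the empirical CDF lowered by a confidence radius.

    Assumes outcomes in [0, 1]. The radius below is an assumed form;
    the exact constants in the paper may differ.
    """
    n = len(samples)
    radius = math.sqrt(3.0 * math.log(t) / (2.0 * n))
    lowered = []
    for x, f in zip(grid, empirical_cdf(samples, grid)):
        # Keep a valid CDF: clip at 0 and force the value 1 at the top of the support.
        lowered.append(1.0 if x >= 1.0 else max(f - radius, 0.0))
    return lowered


# Hypothetical observations for one arm after t = 50 rounds.
samples = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.9] * 5
grid = [i / 10 for i in range(11)]

emp = empirical_cdf(samples, grid)
opt = dominant_cdf(samples, t=50, grid=grid)
for x, e, o in zip(grid, emp, opt):
    print(f"x={x:.1f}  empirical={e:.2f}  optimistic={o:.2f}")
```

Because the lowered CDF lies pointwise at or below the empirical one, the distribution it describes stochastically dominates the empirical distribution. In a full bandit loop, such optimistic distributions would be passed to an offline optimization oracle to choose the next set of arms; that step is omitted here.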

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms