Adversarial Combinatorial Bandits with General Non-linear Reward Functions

Xi Chen; Yanjun Han; Yining Wang

arXiv:2101.01301 · stat.ML · January 6, 2021 · 6 citations

TL;DR

This paper investigates adversarial combinatorial bandits with general non-linear reward functions. It establishes minimax regret bounds that reveal a fundamental difference from the linear-reward case and applies the findings to online recommendation.

Contribution

It provides the first minimax regret bounds for non-linear reward functions in adversarial combinatorial bandits, revealing a significant gap from the linear-reward setting.

Findings

01 Regret bounds of $\tilde{\Theta}_d(\sqrt{N^d T})$ for rewards that are $d$-degree polynomials with $d < K$

02 Regret bounds of $\Theta_K(\sqrt{N^K T})$ for rewards that are not low-degree polynomials

03 Application to adversarial assortment optimization, showing the need to treat each assortment independently.
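The assortment finding indicates that, without low-degree polynomial structure, one can do essentially no better than treating each of the $\binom{N}{K}$ subsets as a separate arm. As a hedged illustration of that baseline (not necessarily the authors' algorithm), the Python sketch below runs the standard EXP3 algorithm of Auer et al. (2002) over all $K$-subsets. With $M$ arms, EXP3 has expected regret $O(\sqrt{M T \log M})$; since $M = \binom{N}{K} \le N^K$, this gives $O(\sqrt{N^K T K \log N})$, which matches $\sqrt{N^K T}$ up to logarithmic factors for fixed $K$. The function name `exp3_over_subsets`, its parameters, and the reward oracle `reward_fn` are hypothetical.

```python
import itertools
import math
import random


def exp3_over_subsets(n_arms, k, horizon, reward_fn, gamma=None, seed=0):
    """Run EXP3 with every size-k subset of range(n_arms) as one independent arm.

    reward_fn(t, subset) -> float in [0, 1] is a hypothetical, possibly
    adversarial and non-linear reward; only the played subset's reward is seen.
    Returns (total_reward, list_of_played_subsets).
    """
    rng = random.Random(seed)
    subsets = list(itertools.combinations(range(n_arms), k))
    m = len(subsets)  # C(N, K) "super-arms", which grows like N^K for fixed K
    if gamma is None:
        # Standard EXP3 tuning (Auer et al., 2002), using the horizon as the reward bound.
        gamma = min(1.0, math.sqrt(m * math.log(m) / ((math.e - 1) * horizon)))
    log_w = [0.0] * m  # log-weights avoid float overflow over long horizons
    total = 0.0
    played = []
    for t in range(horizon):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        w_sum = sum(w)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1.0 - gamma) * wi / w_sum + gamma / m for wi in w]
        i = rng.choices(range(m), weights=probs)[0]
        r = reward_fn(t, subsets[i])
        total += r
        played.append(subsets[i])
        # Importance-weighted estimate r / p_i for the played subset; others get 0.
        log_w[i] += gamma * (r / probs[i]) / m
    return total, played
```

Memory and per-round time scale with $\binom{N}{K}$, so this baseline is only practical for small $N$ and $K$. Any non-linear reward, such as a multinomial-logit assortment revenue rescaled to $[0, 1]$, could be plugged in as `reward_fn`.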

Abstract

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\tilde{\Theta}_{d}(\sqrt{N^{d} T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_{K}(\sqrt{N^{K} T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N, K)\, T})$ for the linear case, which suggests that there is a fundamental gap between…

Videos

Adversarial Combinatorial Bandits with General Non-linear Reward Functions (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications