Adversarial Combinatorial Bandits with General Non-linear Reward Functions
Xi Chen, Yanjun Han, Yining Wang

TL;DR
This paper studies adversarial combinatorial bandits with general non-linear reward functions, establishes minimax regret bounds that reveal a fundamental gap from the linear-reward case, and applies the findings to online recommendation.
Contribution
It provides the first minimax regret bounds for non-linear reward functions in adversarial combinatorial bandits, highlighting a significant gap from linear reward scenarios.
Findings
Regret bounds of \(\tilde{\Theta}_d(\sqrt{N^d T})\) for polynomial rewards
Regret bounds of \(\Theta_K(\sqrt{N^K T})\) for non-polynomial rewards
Application to adversarial assortment optimization showing the need to treat each assortment independently.
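To give a feel for how the rates in the findings above compare, here is a minimal numeric sketch. It drops all constants and logarithmic factors, so the values are only orders of magnitude, not actual regret. The function names are illustrative and do not come from the paper. The third function uses the standard EXP3-style rate \(\sqrt{M T \log M}\) for an \(M\)-armed adversarial bandit with \(M = \binom{N}{K}\), which is an assumption about what "treating each assortment independently" amounts to, not a description of the authors' algorithm.

```python
import math


def polynomial_rate(n_arms: int, degree: int, horizon: int) -> float:
    """sqrt(N^d * T): rate for degree-d polynomial rewards (constants and logs dropped)."""
    return math.sqrt(n_arms ** degree * horizon)


def general_rate(n_arms: int, subset_size: int, horizon: int) -> float:
    """sqrt(N^K * T): rate for rewards that are not low-degree polynomials."""
    return math.sqrt(n_arms ** subset_size * horizon)


def independent_arms_rate(n_arms: int, subset_size: int, horizon: int) -> float:
    """sqrt(M * T * log M) with M = C(N, K).

    Assumption: every size-K subset is treated as its own arm and a
    generic M-armed adversarial bandit bound (EXP3-style) is applied.
    """
    m = math.comb(n_arms, subset_size)
    return math.sqrt(m * horizon * math.log(m))
```

For example, with \(N = 20\), \(K = 3\), and \(T = 10^6\), calling `polynomial_rate` with degree 1 and 2 and then `general_rate` with subset size 3 shows the rate growing with the degree. Since \((N/K)^K \le \binom{N}{K} \le N^K\), the independent-arms rate stays within \(K\)-dependent and logarithmic factors of \(\sqrt{N^K T}\).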
Abstract
In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with \(N\) arms and subsets of \(K\) arms being chosen at each of \(T\) time periods, the minimax optimal regret is \(\tilde{\Theta}_d(\sqrt{N^d T})\) if the reward function is a \(d\)-degree polynomial with \(d < K\), and \(\Theta_K(\sqrt{N^K T})\) if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound for the linear case, which suggests that there is a fundamental gap between…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
