Factored Bandits

Julian Zimmert; Yevgeny Seldin

arXiv:1807.01488 · cs.LG · October 30, 2018

TL;DR

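This paper introduces the factored bandits model, a framework for learning with limited (bandit) feedback in which each action is a combination of atomic actions, one per factor. It gives an anytime algorithm for the stochastic setting with upper and lower regret bounds that match up to constants, and adapts the algorithm to utility-based dueling bandits.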

Contribution

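It proposes the factored bandits model, which includes rank-1 bandits as a special case while significantly relaxing the assumptions on the form of the reward function, together with an anytime algorithm whose regret upper bound matches the lower bound up to constants and which, with a slight modification, applies to utility-based dueling bandits.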

Findings

01 Matching upper and lower regret bounds, up to constants

02 Algorithm applicable to utility-based dueling bandits

03 Improved additive terms in the regret bound for utility-based dueling bandits, compared with state-of-the-art algorithms

Abstract

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constants. Furthermore, we show that, with a slight modification, the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms dominate up to time horizons that are exponential in the number of arms).
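To make the action structure in the abstract concrete, the following minimal sketch builds a joint action set as a Cartesian product of atomic action sets and evaluates the rank-1 special case, where the mean reward of a pair is the product of one weight per atomic action. The set sizes, action names, and weights are hypothetical values chosen for illustration; this is not the paper's algorithm, only a picture of the setting it operates in.

```python
from itertools import product

# Hypothetical atomic action sets, one per factor (names are illustrative).
atomic_sets = [
    ["a1", "a2", "a3"],  # factor 1
    ["b1", "b2"],        # factor 2
]

# Joint actions are the Cartesian product of the atomic action sets.
joint_actions = list(product(*atomic_sets))

# Rank-1 special case: the mean reward of (a, b) is u[a] * v[b].
# The weights below are made up for this example.
u = {"a1": 0.2, "a2": 0.9, "a3": 0.5}
v = {"b1": 0.4, "b2": 0.8}


def mean_reward(action):
    """Mean reward of a joint action under the rank-1 assumption."""
    a, b = action
    return u[a] * v[b]


# Best joint action found by searching the full product set.
best_joint = max(joint_actions, key=mean_reward)

# With positive weights, the best joint action is made of the best atomic
# action in each factor, so the factors can be compared separately.
best_per_factor = (max(u, key=u.get), max(v, key=v.get))

num_joint = len(joint_actions)                 # product of the set sizes
num_atomic = sum(len(s) for s in atomic_sets)  # sum of the set sizes
```

The number of joint actions grows as the product of the factor sizes, while the number of atomic actions grows only as their sum, which is why exploiting the factored structure can pay off. The factored bandits model in the abstract is more general than this rank-1 example, since it relaxes the assumptions on the form of the reward function.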

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques