Adapting multi-armed bandits policies to contextual bandits scenarios

David Cortes

arXiv:1811.04383 · cs.LG · November 26, 2019 · 25 cites

TL;DR

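This paper adapts multi-armed bandit policies to online contextual bandit scenarios with binary rewards. It uses binary classification algorithms such as logistic regression as black-box oracles, and bootstrapping or other forms of randomness, yielding approaches that scale better than previous work and accept any type of classification algorithm.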

Contribution

It introduces scalable adaptations of multi-armed bandit policies for contextual scenarios, using binary classification algorithms as black-box oracles together with bootstrapping, approximate bootstrapping, or other forms of randomness.

Findings

01

Adaptive-Greedy outperforms UCB and Thompson sampling in many cases

02

The methods scale better than previous approaches and work with any type of classification algorithm

03

Adaptive-Greedy's better performance comes at the expense of more hyperparameters to tune

Abstract

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles. Some of these adaptations are achieved through bootstrapping or approximate bootstrapping, while others rely on other forms of randomness, resulting in more scalable approaches than previous works, and the ability to work with any type of classification algorithm. In particular, the Adaptive-Greedy algorithm shows a lot of promise, in many cases achieving better performance than upper confidence bound and Thompson sampling strategies, at the expense of more hyperparameters to tune.
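
To make the oracle-based idea in the abstract more concrete, the following is a minimal sketch of one bootstrap-based adaptation. It is not the paper's reference implementation. It assumes a Thompson-sampling-style rule in which each arm keeps several classifiers fit on bootstrap resamples of that arm's observed (context, reward) pairs, and one resample per arm is drawn at random to score each new context. The names `LogisticOracle`, `BootstrappedThompson`, and `simulate`, the toy two-arm environment, and all hyperparameter values are hypothetical and chosen only for illustration. The tiny gradient-descent logistic regression stands in for any black-box binary classifier.

```python
import math
import random


class LogisticOracle:
    """Tiny logistic regression fit by batch gradient descent (stand-in oracle)."""

    def __init__(self, n_features, lr=0.5, epochs=20):
        self.w = [0.0] * (n_features + 1)  # last weight is the intercept
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x):
        z = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clip to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        self.w = [0.0] * len(self.w)  # refit from scratch on the given sample
        if not X:
            return self
        for _ in range(self.epochs):
            grad = [0.0] * len(self.w)
            for x, r in zip(X, y):
                err = self.predict_proba(x) - r
                for j, xj in enumerate(x):
                    grad[j] += err * xj
                grad[-1] += err
            self.w = [wi - self.lr * g / len(X) for wi, g in zip(self.w, grad)]
        return self


class BootstrappedThompson:
    """Assumed policy: one randomly chosen bootstrap oracle per arm scores the context."""

    def __init__(self, n_arms, n_features, n_resamples=3, seed=0):
        self.rng = random.Random(seed)
        self.history = [([], []) for _ in range(n_arms)]  # per-arm (contexts, rewards)
        self.oracles = [
            [LogisticOracle(n_features) for _ in range(n_resamples)]
            for _ in range(n_arms)
        ]

    def choose(self, x):
        scores = [self.rng.choice(arm).predict_proba(x) for arm in self.oracles]
        best = max(scores)
        # Break ties at random so unfitted arms (all scores 0.5) still get tried.
        return self.rng.choice([a for a, s in enumerate(scores) if s == best])

    def update(self, arm, x, reward):
        X, y = self.history[arm]
        X.append(x)
        y.append(reward)
        n = len(X)
        for oracle in self.oracles[arm]:
            # Bootstrap: resample the arm's history with replacement, then refit.
            idx = [self.rng.randrange(n) for _ in range(n)]
            oracle.fit([X[i] for i in idx], [y[i] for i in idx])


def simulate(rounds=200, seed=0):
    """Run the policy on a hypothetical two-arm environment; return (total_reward, policy)."""
    env = random.Random(seed + 1)
    policy = BootstrappedThompson(n_arms=2, n_features=1, seed=seed)
    total = 0
    for _ in range(rounds):
        x = [env.uniform(-1.0, 1.0)]
        arm = policy.choose(x)
        # Toy reward model (assumption): arm 0 pays off for positive contexts,
        # arm 1 for negative ones.
        p = 0.9 if (x[0] > 0) == (arm == 0) else 0.1
        reward = 1 if env.random() < p else 0
        policy.update(arm, x, reward)
        total += reward
    return total, policy
```

Calling `simulate()` returns the cumulative binary reward and the fitted policy. Refitting every resample from scratch on each update keeps the sketch short but is expensive; the approximate bootstrapping mentioned in the abstract is one way to avoid storing and refitting on the full history.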

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management

Methods: Logistic Regression