Contextual Bandits with Similarity Information
Aleksandrs Slivkins

TL;DR
This paper introduces adaptive algorithms for contextual bandits that leverage similarity information to improve decision-making efficiency, especially in large or infinite strategy spaces, by focusing on relevant contexts and high-payoff arms.
Contribution
It proposes novel adaptive partitioning algorithms that utilize similarity information more efficiently than uniform partitioning in contextual bandit problems.
Findings
Algorithms outperform uniform partitioning methods.
Adaptive partitions focus on popular contexts and high-payoff arms.
Improved efficiency in large or infinite strategy spaces.
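The adaptive partitioning idea can be illustrated with a small Python sketch. This is not the author's algorithm. It is a simplified, hypothetical tree-based variant built on several assumptions:

- contexts and arms both lie in [0, 1];
- the expected payoff is the toy function 1 - |x - y|, which is Lipschitz with respect to the L1 distance on context-arm pairs;
- payoffs are Bernoulli;
- the horizon is known in advance.

Each square cell of context-arm space keeps an optimistic index: average payoff, plus a confidence radius, plus the cell's diameter. Once a cell's confidence radius drops below its diameter, the cell splits into four. Refinement therefore happens only where contexts arrive often and where the chosen arms keep paying well. The names Cell, AdaptivePartition and run_adaptive are illustrative only.

```python
import math
import random


def expected_payoff(x, y):
    # Hypothetical toy model (assumption, not from the paper): best arm is y = x,
    # and |mu(x, y) - mu(x', y')| <= |x - x'| + |y - y'|.
    return 1.0 - abs(x - y)


class Cell:
    """Square [x0, x0 + w) x [y0, y0 + w) of context-arm space."""

    def __init__(self, x0, y0, w):
        self.x0, self.y0, self.w = x0, y0, w
        self.n, self.total = 0, 0.0  # pulls and summed payoff in this cell

    def index(self, log_t):
        if self.n == 0:
            return math.inf  # unexplored cells are tried first
        conf = math.sqrt(2.0 * log_t / self.n)
        # Payoffs inside the cell differ by at most its L1 diameter, 2 * w.
        return self.total / self.n + conf + 2.0 * self.w

    def split(self):
        h = self.w / 2.0
        return [Cell(self.x0 + dx, self.y0 + dy, h) for dx in (0.0, h) for dy in (0.0, h)]


class AdaptivePartition:
    """Hypothetical adaptive partitioning sketch; leaves always tile [0, 1)^2."""

    def __init__(self, horizon, seed=0):
        self.log_t = math.log(horizon)
        self.leaves = [Cell(0.0, 0.0, 1.0)]  # one cell covering everything
        self.rng = random.Random(seed)

    def choose(self, x):
        # Relevant cells: leaves whose context interval contains x.
        relevant = [c for c in self.leaves if c.x0 <= x < c.x0 + c.w]
        cell = max(relevant, key=lambda c: c.index(self.log_t))
        y = cell.y0 + self.rng.random() * cell.w  # any arm inside the chosen cell
        return cell, y

    def update(self, cell, reward):
        cell.n += 1
        cell.total += reward
        # Refine once the estimate is as precise as the cell is wide.
        if math.sqrt(2.0 * self.log_t / cell.n) <= 2.0 * cell.w:
            self.leaves.remove(cell)
            self.leaves.extend(cell.split())


def run_adaptive(policy, horizon, seed=0):
    """Contextual bandit loop; returns cumulative expected regret."""
    rng = random.Random(seed)
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()  # context revealed (assumed uniform)
        cell, y = policy.choose(x)
        mu = expected_payoff(x, y)
        policy.update(cell, 1.0 if rng.random() < mu else 0.0)
        regret += 1.0 - mu  # the best arm y = x has expected payoff 1
    return regret
```

For example, run_adaptive(AdaptivePartition(5000), 5000) returns the cumulative expected regret after 5000 rounds. Rarely visited or clearly poor regions stay coarse, which matches the qualitative behavior listed in the findings.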
Abstract
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of "contextual bandits", a natural extension of the basic MAB problem where before each round an algorithm is given the "context" -- a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored…
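The protocol described in the abstract runs in three steps each round: a context is revealed, an arm is chosen, and the payoff is observed. The sketch below pairs that protocol with a uniform-partitioning baseline of the kind the findings compare against. It is illustrative code, not code from the paper, and it rests on several assumptions:

- contexts are drawn uniformly from [0, 1];
- the hypothetical payoff model 1 - |x - y| encodes similarity as a Lipschitz condition;
- the context space is cut into equal bins and the arm space into equally spaced grid points;
- UCB1 runs independently in each bin.

The names UniformPartitionUCB and run_uniform are illustrative only.

```python
import math
import random


def expected_payoff(x, y):
    # Hypothetical toy model (assumption, not from the paper): best arm is y = x,
    # and |mu(x, y) - mu(x', y')| <= |x - x'| + |y - y'|.
    return 1.0 - abs(x - y)


class UniformPartitionUCB:
    """Baseline: fixed grid over contexts and arms, independent UCB1 per context bin."""

    def __init__(self, n_context_bins, n_arms):
        self.n_bins = n_context_bins
        self.arms = [(j + 0.5) / n_arms for j in range(n_arms)]  # grid midpoints
        self.counts = [[0] * n_arms for _ in range(n_context_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_context_bins)]

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def choose(self, x):
        b = self._bin(x)
        counts, sums = self.counts[b], self.sums[b]
        for j, n in enumerate(counts):
            if n == 0:  # play every arm once in this bin first
                return j
        t = sum(counts)

        def ucb(j):
            return sums[j] / counts[j] + math.sqrt(2.0 * math.log(t) / counts[j])

        return max(range(len(counts)), key=ucb)

    def update(self, x, j, reward):
        b = self._bin(x)
        self.counts[b][j] += 1
        self.sums[b][j] += reward


def run_uniform(policy, horizon, seed=0):
    """Contextual bandit protocol; returns cumulative expected regret."""
    rng = random.Random(seed)
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()                                # 1. context revealed
        j = policy.choose(x)                            # 2. algorithm picks an arm
        mu = expected_payoff(x, policy.arms[j])
        reward = 1.0 if rng.random() < mu else 0.0      # 3. Bernoulli payoff observed
        policy.update(x, j, reward)
        regret += 1.0 - mu                              # best arm y = x pays 1
    return regret
```

For example, run_uniform(UniformPartitionUCB(4, 4), 20000) returns the cumulative expected regret over 20000 rounds. Because the grid resolution is fixed in advance, the baseline spends the same effort on rarely seen contexts and obviously poor arms as on the important ones.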
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
