Greedy Algorithm almost Dominates in Smoothed Contextual Bandits
Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

TL;DR
This paper demonstrates that in smoothed linear contextual bandits, a greedy algorithm nearly matches the best possible regret rates under certain diversity conditions, reducing the need for explicit exploration.
Contribution
The paper improves on existing smoothed-analysis results by showing that the greedy algorithm performs nearly optimally in smoothed linear contextual bandits whenever specific diversity conditions hold.
Findings
Greedy algorithm achieves near-optimal Bayesian regret in smoothed linear bandits.
Regret is at most \tilde{O}(T^{1/3}), that is, O(T^{1/3}) up to logarithmic factors, under the diversity conditions.
Explicit exploration may be unnecessary in diverse data environments.
Abstract
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that always "exploits" by choosing an action that currently looks optimal. We ask under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model. We improve on prior results to show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold, and…
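To make the setting in the abstract concrete, the following is a minimal simulation sketch of a greedy policy in a smoothed linear contextual bandit. It is not the authors' code. Everything here is an assumption made for illustration: the function name `run_greedy`, the shared parameter vector `theta` drawn from a Gaussian prior, the fixed `base` contexts standing in for an adversary's choices, the Gaussian perturbation scale `sigma`, and the small ridge term `reg` that keeps the least-squares estimate well defined in early rounds.

```python
import numpy as np

def run_greedy(T=2000, K=5, d=4, sigma=0.5, noise=0.1, reg=1.0, seed=0):
    """Simulate a greedy policy; return cumulative (pseudo-)regret per round.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)               # unknown parameter, drawn from a prior
    base = rng.uniform(-1, 1, size=(K, d))   # stand-in for adversarial base contexts
    A = reg * np.eye(d)                      # regularized Gram matrix of chosen contexts
    b = np.zeros(d)                          # running sum of reward * context
    cum_regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        # Smoothing: each arm's context is its base context plus Gaussian noise.
        X = base + sigma * rng.normal(size=(K, d))
        # Greedy step: estimate theta from past data, then exploit that estimate.
        theta_hat = np.linalg.solve(A, b)
        a = int(np.argmax(X @ theta_hat))
        # Observe a noisy linear reward for the chosen arm only.
        means = X @ theta
        r = means[a] + noise * rng.normal()
        A += np.outer(X[a], X[a])
        b += r * X[a]
        # Regret is measured against the best arm for this round's contexts.
        total += means.max() - means[a]
        cum_regret[t] = total
    return cum_regret

if __name__ == "__main__":
    regret = run_greedy()
    print("cumulative regret after", len(regret), "rounds:", regret[-1])
```

The policy never explores on purpose: the only randomness in which arm it samples comes from the perturbed contexts themselves, which is the source of "inherent diversity" the abstract refers to. Varying `sigma` in this sketch is one way to see how the amount of perturbation affects the regret curve.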
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
