A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Sampath Kannan; Jamie Morgenstern; Aaron Roth; Bo Waggoner; and Zhiwei Steven Wu

arXiv:1801.03423·cs.LG·January 11, 2018·6 cites

Open Access

TL;DR

Using smoothed analysis, this paper shows that the greedy algorithm for linear contextual bandits can achieve no regret when the contexts are slightly perturbed, which eases the exploration-exploitation conflict in settings where each decision affects an individual person.

Contribution

The paper gives a smoothed analysis of the greedy algorithm for linear contextual bandits, covering contexts that may be chosen adversarially and are then subject to small perturbations.

Findings

01 The greedy algorithm achieves no regret when the contexts are slightly perturbed.

02 Small perturbations of the environment suffice for the greedy algorithm to learn effectively.

03 Under such perturbations, exploration and exploitation need not conflict in linear contextual bandits.

Abstract

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. This raises a fairness concern. In such settings, one might like to run a "greedy" algorithm, which always makes the (myopically) optimal decision for the individuals at hand - but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen…
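As a rough illustration of the setting the abstract describes, the following Python sketch runs a greedy learner on a linear contextual bandit whose contexts are perturbed each round. It is a minimal sketch under stated assumptions, not the paper's exact procedure: the function name `run_greedy`, the Gaussian form of the perturbation, the single shared parameter vector `theta`, the fixed `base` contexts standing in for an adversary, and the ridge term `lam` (used only so the estimate is defined from the first round) are all choices made here for illustration.

```python
import numpy as np


def run_greedy(T=2000, k=5, d=3, sigma=0.1, noise=0.1, lam=1.0, seed=0):
    """Greedy play on a perturbed linear contextual bandit (illustrative sketch).

    All parameter names and defaults are assumptions of this sketch.
    Returns the cumulative regret per round and the final estimate of theta.
    """
    rng = np.random.default_rng(seed)

    # Hidden reward parameter, normalised to unit length.
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)

    # Fixed base contexts stand in for the adversary's choices (assumption).
    base = rng.uniform(-1.0, 1.0, size=(k, d))

    # Ridge-regression statistics: A = lam*I + sum x x^T, b = sum r x.
    A = lam * np.eye(d)
    b = np.zeros(d)

    regret = np.zeros(T)
    for t in range(T):
        # Small Gaussian perturbation of each arm's context (assumption).
        contexts = base + sigma * rng.normal(size=(k, d))

        # Greedy step: act on the current estimate, with no exploration bonus.
        theta_hat = np.linalg.solve(A, b)
        arm = int(np.argmax(contexts @ theta_hat))

        # Observe a noisy linear reward for the chosen arm only.
        means = contexts @ theta
        reward = means[arm] + noise * rng.normal()

        # Update the least-squares statistics with the observed pair.
        x = contexts[arm]
        A += np.outer(x, x)
        b += reward * x

        # Per-round regret against the best arm for this round's contexts.
        regret[t] = means.max() - means[arm]

    return np.cumsum(regret), np.linalg.solve(A, b)


if __name__ == "__main__":
    cumulative, theta_hat = run_greedy()
    print("average regret per round:", cumulative[-1] / len(cumulative))
```

Setting `sigma=0.0` removes the perturbation, which gives a simple way to compare the greedy learner's behaviour with and without perturbed contexts in this toy setup.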

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems