Nonparametric Bandits with Covariates

Philippe Rigollet; Assaf Zeevi

arXiv:1003.1630 · math.ST · March 9, 2010 · COLT · 76 cites

Open Access

TL;DR

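This paper studies a two-armed bandit problem in which each arm's noisy reward depends on an observable random covariate. It derives general lower bounds on the performance of any admissible policy and proposes an algorithm that matches the order of those bounds up to logarithmic terms by decomposing the global problem into localized bandit problems. The proofs blend nonparametric statistics with traditional bandit methods.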

Contribution

It decomposes a two-armed bandit problem with covariates into suitably localized bandit problems, and pairs general lower bounds on the performance of any admissible policy with an algorithm that attains their order up to logarithmic terms.

Findings

01 Derived general lower bounds on the performance of any admissible policy for the bandit problem with covariates.

02 Developed an algorithm whose performance matches the order of the lower bounds up to logarithmic terms.

03 Combined ideas from nonparametric statistics with traditional methods from the bandit literature.

Abstract

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably "localized" bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit literature.
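The "localized" decomposition described in the abstract can be illustrated with a small simulation. The sketch below is only one plausible reading of that idea, not the authors' stated procedure: it assumes a scalar covariate in [0, 1], partitions that interval into equal-width bins, and runs a standard UCB1 index independently inside each bin. The class name `BinnedUCB`, the function `simulate`, the number of bins, and the two mean reward functions are all hypothetical choices made for illustration.

```python
import math
import random


class BinnedUCB:
    """Runs an independent UCB1 index policy in each covariate bin.

    Assumption: the covariate is a scalar in [0, 1]. The binning rule and
    the UCB1 index are illustrative choices, not taken from the paper.
    """

    def __init__(self, n_bins, n_arms=2):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin pull counts and reward sums for each arm.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def bin_of(self, x):
        # Map x in [0, 1] to a bin index; x == 1.0 falls in the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self.bin_of(x)
        # Pull every arm once in this bin before trusting the index.
        for arm in range(self.n_arms):
            if self.counts[b][arm] == 0:
                return arm
        t_local = sum(self.counts[b])

        def index(arm):
            n = self.counts[b][arm]
            bonus = math.sqrt(2.0 * math.log(t_local) / n)
            return self.sums[b][arm] / n + bonus

        return max(range(self.n_arms), key=index)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward


def simulate(horizon=20000, n_bins=8, seed=0):
    """Simulate the policy and return it with the fraction of optimal pulls."""
    rng = random.Random(seed)
    # Hypothetical mean reward functions; they cross at x = 0.5.
    means = (lambda x: 0.2 + 0.6 * x, lambda x: 0.8 - 0.6 * x)
    policy = BinnedUCB(n_bins)
    optimal_pulls = 0
    for _ in range(horizon):
        x = rng.random()                      # observe the covariate
        arm = policy.select(x)                # act within the local problem
        reward = 1.0 if rng.random() < means[arm](x) else 0.0
        policy.update(x, arm, reward)
        best = 0 if means[0](x) >= means[1](x) else 1
        optimal_pulls += int(arm == best)
    return policy, optimal_pulls / horizon


if __name__ == "__main__":
    _, frac = simulate()
    print(f"fraction of optimal pulls: {frac:.3f}")
```

The number of bins governs a trade-off in this sketch: more bins track the mean reward functions more closely but leave fewer observations per bin, so each local problem is learned more slowly. How the bin width should scale with the horizon is the kind of question the paper's bounds address; the value used here is arbitrary.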

Taxonomy

Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Reinforcement Learning in Robotics