Nonparametric Bandits with Covariates
Philippe Rigollet, Assaf Zeevi

TL;DR
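This paper studies a two-armed bandit problem in which each arm's reward depends on an observable random covariate. It derives general lower bounds on the performance of any admissible policy and develops an algorithm that matches the order of those bounds up to logarithmic terms. The algorithm works by decomposing the global problem into localized bandit problems, and the proofs combine nonparametric statistics with traditional bandit methods.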
Contribution
It formulates a nonparametric two-armed bandit problem with covariates, proves lower bounds on the performance of any admissible policy, and gives an algorithm whose performance attains the order of those bounds up to logarithmic terms.
Findings
Derived general lower bounds on bandit performance with covariates.
Developed an algorithm whose performance matches the order of the lower bounds up to logarithmic terms.
Unified ideas from nonparametric statistics and bandit theory.
Abstract
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably "localized" bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit literature.
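The "localized" decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a scalar covariate in [0, 1], a fixed partition into equal-width bins, Bernoulli rewards, and a standard UCB1 index run independently inside each bin. The names `BinnedUCB` and `simulate`, the mean-reward functions, and the bin count are all hypothetical choices made for illustration.

```python
import math
import random


class BinnedUCB:
    """Illustrative policy: one independent UCB1 instance per covariate bin."""

    def __init__(self, n_bins, n_arms=2):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin, per-arm pull counts and reward sums.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def bin_of(self, x):
        # Map a covariate in [0, 1] to its bin; x == 1.0 goes to the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self.bin_of(x)
        # Pull each arm once in this bin before using the index.
        for arm in range(self.n_arms):
            if self.counts[b][arm] == 0:
                return arm
        t = sum(self.counts[b])

        def index(arm):
            n = self.counts[b][arm]
            return self.sums[b][arm] / n + math.sqrt(2.0 * math.log(t) / n)

        return max(range(self.n_arms), key=index)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward


def simulate(horizon=5000, n_bins=8, seed=0):
    """Run the binned policy on a toy problem (assumed, not from the paper)."""
    rng = random.Random(seed)
    # Hypothetical mean-reward functions: arm 0 is better when x > 0.5.
    means = (lambda x: x, lambda x: 1.0 - x)
    policy = BinnedUCB(n_bins)
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()  # covariate drawn uniformly on [0, 1]
        arm = policy.select(x)
        reward = 1.0 if rng.random() < means[arm](x) else 0.0
        policy.update(x, arm, reward)
        # Regret is measured against the best arm for this covariate value.
        regret += max(f(x) for f in means) - means[arm](x)
    return policy, regret


if __name__ == "__main__":
    _, total_regret = simulate()
    print(f"cumulative regret: {total_regret:.1f}")
```

In this sketch the number of bins is a free parameter. More bins track the covariate dependence more finely but leave fewer samples per bin, which is the kind of trade-off a nonparametric analysis has to balance.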
Taxonomy
Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Reinforcement Learning in Robotics
