Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Haoyu Chen; Wenbin Lu; Rui Song

arXiv:2010.07283 · stat.ML · October 15, 2020

TL;DR

This paper develops statistical inference methods for online decision-making in a contextual bandit setting, establishing asymptotic normality of estimators under both correctly specified and misspecified models, with applications to real data.

Contribution

It establishes asymptotic normality results for online estimators in contextual bandits, including under model misspecification, using martingale central limit theorem (CLT) techniques.

Findings

01. The online ordinary least squares (OLS) estimator is asymptotically normal.

02. The online weighted least squares estimator remains asymptotically normal under model misspecification.

03. The in-sample inverse propensity weighted value estimator is asymptotically normal.
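To make the third finding concrete: the textbook inverse propensity weighted (IPW) value estimate averages 1{A_t = pi(X_t)} * Y_t / pi_t(A_t | X_t) over the logged rounds, where pi_t is the known probability with which the logging policy chose A_t. The Python sketch below assumes that form and a deterministic target policy; "in-sample" would mean the target actions come from a policy fitted on the same log. The function name `ipw_value` is hypothetical, and the naive standard error it returns treats the terms as exchangeable, which is an assumption and not the paper's variance formula.

```python
import numpy as np


def ipw_value(actions, rewards, propensities, target_actions):
    """Hypothetical IPW estimate of a deterministic policy's value from logged data.

    actions, rewards, propensities: logged A_t, Y_t and pi_t(A_t | X_t).
    target_actions: the action the evaluated policy would take at each X_t.
    Returns (value estimate, naive standard error).
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    # Keep only rounds where the logged action agrees with the target policy,
    # and reweight them by the inverse of the logging probability.
    match = (actions == np.asarray(target_actions)).astype(float)
    terms = match * rewards / propensities
    value = float(np.mean(terms))
    # Naive plug-in standard error (assumption: ignores the adaptive design).
    se = float(np.std(terms, ddof=1) / np.sqrt(len(terms)))
    return value, se


# Usage: uniform logging over two arms; the target policy always picks arm 1.
value, se = ipw_value([0, 1, 1, 0], [0.0, 1.0, 1.0, 0.0], [0.5] * 4, [1, 1, 1, 1])
print(value, se)
```

Dividing by the propensity compensates for rounds the target policy would have played but the logging policy did not, so the estimate is unbiased whenever every propensity is known and positive, as it is under epsilon-greedy exploration.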

Abstract

The online decision-making problem requires us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model for the different actions given the contextual information and then maximize the long-term reward. It is therefore important to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the contextual bandit framework with a linear reward model. The $\varepsilon$-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose the online weighted least squares estimator using inverse propensity score weighting and also establish its asymptotic…
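The setup in the abstract can be sketched as a small simulation. This is a minimal Python sketch under assumptions of our own, not the authors' code: K arms with a linear reward plus Gaussian noise, a short uniform burn-in so every arm has data, an epsilon-greedy rule that puts probability 1 - eps + eps/K on the arm with the highest estimated reward and eps/K on each other arm, and inverse propensity weights equal to one over that logged probability. The names `simulate_eps_greedy`, `burn_in`, and the simulated design are hypothetical, and the paper's exact weighting scheme may differ.

```python
import numpy as np


def simulate_eps_greedy(beta_true, T=5000, eps=0.1, burn_in=50, noise_sd=1.0, seed=0):
    """Hypothetical epsilon-greedy linear contextual bandit simulation.

    beta_true: array of shape (K, d), one true coefficient vector per arm.
    Returns per-arm online OLS and IPW-weighted LS estimates (each K x d)
    and the log of (context, action, reward, propensity) tuples.
    """
    rng = np.random.default_rng(seed)
    K, d = beta_true.shape
    # Running sufficient statistics per arm: X'X and X'y, plain and IPW-weighted.
    xtx = np.zeros((K, d, d))
    xty = np.zeros((K, d))
    wxtx = np.zeros((K, d, d))
    wxty = np.zeros((K, d))
    log = []
    for t in range(T):
        # Context: intercept plus standard normal features (assumed design).
        x = np.concatenate(([1.0], rng.normal(size=d - 1)))
        if t < burn_in:
            # Uniform exploration until each arm has some observations.
            probs = np.full(K, 1.0 / K)
        else:
            # Current online OLS fit for every arm (pinv guards against singularity).
            beta_hat = np.array([np.linalg.pinv(xtx[a]) @ xty[a] for a in range(K)])
            greedy = int(np.argmax(beta_hat @ x))
            # Epsilon-greedy propensities: eps/K on every arm, extra 1 - eps on the greedy arm.
            probs = np.full(K, eps / K)
            probs[greedy] += 1.0 - eps
        a = int(rng.choice(K, p=probs))
        y = x @ beta_true[a] + noise_sd * rng.normal()
        w = 1.0 / probs[a]  # inverse propensity weight of the logged action
        xtx[a] += np.outer(x, x)
        xty[a] += y * x
        wxtx[a] += w * np.outer(x, x)
        wxty[a] += w * y * x
        log.append((x, a, y, probs[a]))
    ols = np.array([np.linalg.pinv(xtx[a]) @ xty[a] for a in range(K)])
    wls = np.array([np.linalg.pinv(wxtx[a]) @ wxty[a] for a in range(K)])
    return ols, wls, log


# Usage: two arms whose best choice depends on the sign of a linear score.
beta_true = np.array([[0.0, 1.0, -0.5], [0.2, -1.0, 0.5]])
ols, wls, _ = simulate_eps_greedy(beta_true)
print(np.round(ols, 2))
print(np.round(wls, 2))
```

Reweighting each arm's observations by 1/pi_t(A_t | X_t) makes that arm's weighted sample resemble one whose contexts follow their marginal distribution regardless of the policy, which is the usual intuition for why IPW weighting helps when the linear model is misspecified. In this correctly specified simulation both fits should land near `beta_true`, with the weighted fit noisier because the rare exploration rounds carry large weights.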
