Statistical Inference on Multi-armed Bandits with Delayed Feedback

Lei Shi; Jingshen Wang; Tianhao Wu

arXiv:2307.00752 · stat.ME · July 4, 2023 · ICML · 1 citation

TL;DR

This paper develops a statistical inference framework for evaluating multi-armed bandit policies when feedback is delayed, so that the uncertainty of a learned policy can be quantified validly.

Contribution

The paper introduces an adaptively weighted estimator that accounts for arm-dependent delays without estimating the delay mechanism, and proves that the estimator is asymptotically normal.
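To make the idea of weighting around arm-dependent delays concrete, here is a minimal sketch in Python. It is not the paper's estimator: it is a generic self-normalized inverse-propensity estimate computed per arm from the rewards that have arrived, under the added assumption that, given the arm, whether a reward is delayed does not depend on its value. The function names, the epsilon-greedy logging policy, and the simplification that a reward has either arrived by analysis time or not are all hypothetical choices made for illustration.

```python
import random


def simulate_log(n, means, delay_prob, eps=0.2, seed=0):
    """Hypothetical logger: epsilon-greedy with arm-dependent delay.

    Returns a list of (arm, propensity, reward), where reward is None
    if the feedback has not arrived by analysis time.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    log = []
    for _ in range(n):
        # The logging policy only sees rewards that have already arrived.
        est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        best = max(range(k), key=lambda a: est[a])
        probs = [eps / k + (1 - eps) * (a == best) for a in range(k)]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = means[arm] + rng.gauss(0, 1)
        # Assumption: delay depends on the arm only, not on the reward.
        arrived = rng.random() > delay_prob[arm]
        if arrived:
            counts[arm] += 1
            sums[arm] += reward
        log.append((arm, probs[arm], reward if arrived else None))
    return log


def weighted_policy_value(log, target_policy):
    """Self-normalized IPW value estimate using only arrived rewards.

    Normalizing within each arm cancels the arm-specific arrival rate,
    so the delay mechanism itself is never estimated.
    """
    k = len(target_policy)
    num = [0.0] * k
    den = [0.0] * k
    for arm, propensity, reward in log:
        if reward is None:
            continue  # feedback still delayed
        weight = 1.0 / propensity
        num[arm] += weight * reward
        den[arm] += weight
    value = 0.0
    for a in range(k):
        if target_policy[a] > 0:
            if den[a] == 0:
                raise ValueError(f"no arrived rewards for arm {a}")
            value += target_policy[a] * num[a] / den[a]
    return value


if __name__ == "__main__":
    log = simulate_log(20000, means=[0.0, 1.0], delay_prob=[0.1, 0.6])
    print(weighted_policy_value(log, target_policy=[0.5, 0.5]))
```

In this simulated setup the target policy differs from the logging policy, and the second arm loses far more feedback to delay than the first. The true value of the uniform target policy is 0.5, and averaging all arrived rewards without weights would be pulled away from it by both the adaptive sampling and the unequal delays.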

Findings

01 The estimator is consistent when feedback delays depend on the arm.

02 Monte Carlo simulations validate its finite-sample performance.

03 The estimator is asymptotically normal in large samples.

Abstract

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback. While existing MAB literature often focuses on maximizing the expected cumulative reward outcomes (or, equivalently, regret minimization), few efforts have been devoted to establishing valid statistical inference approaches to quantify the uncertainty of learned policies. We attempt to fill this gap by providing a unified statistical inference framework for policy evaluation where a target policy is allowed to differ from the data collecting policy, and our framework allows delay to be associated with the treatment arms. We present an adaptively weighted estimator that on the one hand incorporates the arm-dependent delaying mechanism to achieve consistency,…

Code & Models

Repositories

leishi-rocks/delaybandits (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Healthcare Operations and Scheduling Optimization · Advanced Causal Inference Techniques