Sequential Batch Learning in Finite-Action Linear Contextual Bandits

Yanjun Han; Zhengqing Zhou; Zhengyuan Zhou; Jose Blanchet; Peter W. Glynn; Yinyu Ye

arXiv:2004.06321 · cs.LG · April 15, 2020 · 31 cites

Open Access

TL;DR

This paper studies sequential batch learning in linear contextual bandits with finite action sets. It establishes regret lower bounds and gives algorithms whose regret nearly matches them, clarifying how batch constraints limit sequential decision making.

Contribution

The paper establishes regret bounds and algorithms for batch-constrained linear contextual bandits, characterizing the trade-off between the number of batches and regret in both the arbitrary-context and the iid-context settings.

Findings

01

Regret lower bounds established for both settings.

02

Algorithms whose regret nearly matches the lower bounds are proposed.

03

Fully online performance is achievable with a number of batches polynomial in the time horizon (arbitrary contexts) or fewer than logarithmic in the time horizon (iid contexts).

Abstract

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end. Compared with both standard online contextual bandit learning and offline policy learning in contextual bandits, this sequential batch learning problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications, including medical treatment in clinical trials, product recommendation in e-commerce and adaptive experiment design in crowdsourcing. We study two settings of the problem: one where the contexts are arbitrarily generated and the other where the contexts are iid drawn from some distribution. In each setting, we…
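The batch constraint described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm or batch schedule: it assumes a single unknown parameter theta_star shared across actions (the expected reward of the chosen action is its feature vector dotted with theta_star), iid standard-Gaussian contexts, Gaussian reward noise, and a greedy ridge-regression learner swapped in purely for illustration. The function names (`run_batched_greedy`, `solve`, `dot`) and the grid `[50, 400, T]` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small, invertible A)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def run_batched_greedy(theta_star, grid, K, noise=0.1, lam=1.0, seed=0):
    """Simulate T = grid[-1] rounds of a K-action linear contextual bandit in which
    rewards are revealed only at the batch end times listed in `grid`.

    Hypothetical learner: act greedily on a ridge estimate that is refit only
    when a batch closes. Returns (cumulative regret, final estimate).
    """
    if grid[0] <= 0 or any(e2 <= e1 for e1, e2 in zip(grid, grid[1:])):
        raise ValueError("grid must be strictly increasing positive batch end times")
    rng = random.Random(seed)
    d = len(theta_star)
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # lam*I + sum x x^T
    b = [0.0] * d  # sum of reward * x
    theta_hat = [0.0] * d
    regret, t = 0.0, 0
    for end in grid:
        pending = []  # (chosen features, reward), hidden until this batch ends
        while t < end:
            # Assumed iid standard-Gaussian feature vector for each of the K actions.
            xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
            a = max(range(K), key=lambda k: dot(xs[k], theta_hat))
            means = [dot(x, theta_star) for x in xs]
            regret += max(means) - means[a]
            pending.append((xs[a], means[a] + rng.gauss(0.0, noise)))
            t += 1
        # Batch ends: all buffered rewards become visible at once; refit the ridge estimate.
        for x, r in pending:
            for i in range(d):
                b[i] += r * x[i]
                for j in range(d):
                    A[i][j] += x[i] * x[j]
        theta_hat = solve(A, b)
    return regret, theta_hat


if __name__ == "__main__":
    theta_star = [1.0, -0.5, 0.25]
    T = 2000
    batched, est = run_batched_greedy(theta_star, [50, 400, T], K=5)
    one_batch, _ = run_batched_greedy(theta_star, [T], K=5)
    print(f"3 batches: regret {batched:.1f}, estimate {[round(v, 2) for v in est]}")
    print(f"1 batch:   regret {one_batch:.1f}")
```

Rewards are buffered in `pending` and the estimate is refit only when a batch closes, so the placement of batch end times decides how soon feedback can be used; with the single-batch grid `[T]` the learner never updates before the horizon ends. Because every round draws the same number of random values regardless of the action taken, runs with the same seed see identical contexts, which makes comparisons across grids fair.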

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms