Learning from Logged Implicit Exploration Data

Alex Strehl; John Langford; Sham Kakade; Lihong Li

arXiv:1003.0120·cs.LG·June 15, 2010·107 cites

TL;DR

This paper develops a theoretical foundation for learning from logged implicit exploration data in contextual bandit problems, removing the need for explicit randomization or control during data collection.

Contribution

It introduces methods for learning a policy from nonrandom logged data when the exploration policy that produced the data is not explicitly known and no randomization was recorded, which widens the range of real-world logs that can be used.

Findings

1. Empirically verified on two reasonably sized real-world data sets obtained from Yahoo!.
2. Provides a sound and consistent theoretical foundation for using nonrandom exploration data.
3. Extends offline policy learning to logged data where no randomization occurred or was recorded.

Abstract

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which "offline" data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
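The abstract does not spell out an estimator, so the following is only a minimal sketch of one standard way to approach this setting: estimate the unknown logging policy from the log itself, then score a candidate policy by inverse estimated propensity with a floor. The function names `estimate_propensities` and `clipped_ips_value`, the threshold `tau`, the discrete contexts, and the toy log are all hypothetical illustrations, not taken from the paper or from the linked repository.

```python
from collections import Counter, defaultdict


def estimate_propensities(logs):
    """Estimate pi_hat(a | x) as the empirical frequency of action a in context x.

    Assumption: contexts are discrete and hashable. A real system would
    more likely fit a regressor over features.
    """
    counts = defaultdict(Counter)
    for x, a, _ in logs:
        counts[x][a] += 1

    def pi_hat(a, x):
        total = sum(counts[x].values())
        return counts[x][a] / total if total else 0.0

    return pi_hat


def clipped_ips_value(logs, policy, pi_hat, tau=0.05):
    """Average of r * 1[policy(x) == a] / max(pi_hat(a | x), tau) over the log.

    The floor tau (a hypothetical tuning parameter) keeps the weights
    bounded when an action was rarely logged, at the cost of some bias.
    """
    if not logs:
        raise ValueError("logs must be non-empty")
    total = 0.0
    for x, a, r in logs:
        if policy(x) == a:
            total += r / max(pi_hat(a, x), tau)
    return total / len(logs)


# Toy log of (context, logged action, observed reward); no propensities recorded.
logs = [
    ("sports", "A", 1.0),
    ("sports", "A", 0.0),
    ("sports", "B", 1.0),
    ("sports", "B", 1.0),
    ("news", "A", 0.0),
    ("news", "B", 1.0),
    ("news", "B", 0.0),
    ("news", "B", 1.0),
]

pi_hat = estimate_propensities(logs)


def always_a(x):
    return "A"


def always_b(x):
    return "B"


value_a = clipped_ips_value(logs, always_a, pi_hat)
value_b = clipped_ips_value(logs, always_b, pi_hat)
print(f"always A: {value_a:.3f}, always B: {value_b:.3f}")
```

On this toy log the estimate favors always choosing B (about 0.83) over always choosing A (0.25). Raising `tau` shrinks the weights on rarely logged actions, which lowers variance but biases the estimate downward.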

Code & Models

Repositories

PlaytikaOSS/pybandits

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques