Off-policy Learning for Multiple Loggers

Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, and Dawei Yin

arXiv:1907.09652 · stat.ML · August 6, 2019 · 1 citation

TL;DR

This paper develops off-policy learning methods for scenarios with multiple historical data sources, providing theoretical analysis and algorithms that outperform existing approaches in benchmark tests.

Contribution

It introduces a novel off-policy learning framework for multiple loggers, including generalization error bounds and a constrained optimization algorithm.

Findings

01. Outperforms state-of-the-art methods on benchmark datasets

02. Provides theoretical generalization error bounds for multi-logger off-policy learning

03. Develops a minimax-based algorithm for the constrained optimization problem

Abstract

It is well known that historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Because direct online policy learning usually harms the user experience, applying off-policy learning is more important in real-world applications. Although some prior work exists, most of it focuses on learning from a single historical policy. In practice, however, a number of parallel experiments, e.g. multiple A/B tests, are usually run simultaneously. To make full use of such historical data, learning policies from multiple loggers becomes necessary. Motivated by this, in this paper we investigate off-policy learning when the training data come from multiple historical policies. Specifically, policies, e.g. neural networks, can be learned directly from multi-logger data, with counterfactual…
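To make the multi-logger setting concrete, the sketch below pools logs from several loggers and scores a target policy with a plain inverse propensity scoring (IPS) estimate. IPS is a standard counterfactual technique and is used here only for illustration; it is not claimed to be the estimator from the paper. The record layout, the function name `pooled_ips_value`, and the toy data are all assumptions.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction: (context, action, reward, logging propensity).
# The propensity is the probability that the logger which produced the record
# gave to the logged action. This layout is an illustrative assumption.
Record = Tuple[int, int, float, float]


def pooled_ips_value(
    logs: Sequence[Sequence[Record]],
    target_prob: Callable[[int, int], float],
) -> float:
    """Naive IPS estimate of a target policy's value, pooled over all loggers."""
    total = 0.0
    count = 0
    for logger_records in logs:
        for context, action, reward, propensity in logger_records:
            if propensity <= 0.0:
                raise ValueError("logging propensity must be positive")
            # Reweight each reward by how much more (or less) likely the
            # target policy is to take the logged action than the logger was.
            total += reward * target_prob(context, action) / propensity
            count += 1
    if count == 0:
        raise ValueError("no logged records")
    return total / count


# Two hypothetical loggers over a single context with two actions.
logs = [
    [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5)],  # logger A: uniform over actions
    [(0, 0, 1.0, 0.8), (0, 1, 0.0, 0.2)],  # logger B: favors action 0
]


def always_action_zero(context: int, action: int) -> float:
    """Hypothetical deterministic target policy that always picks action 0."""
    return 1.0 if action == 0 else 0.0


estimate = pooled_ips_value(logs, always_action_zero)
```

Each record is reweighted with the propensity of the logger that generated it, so loggers with different behavior can share one data pool. This simple pooling treats every logger's samples equally, which is a simplifying assumption of the sketch.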

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms