Combining Experimental and Historical Data for Policy Evaluation

Ting Li; Chengchun Shi; Qianglin Wen; Yang Sui; Yongli Qin; Chunbo Lai; Hongtu Zhu

arXiv:2406.00317 · stat.ML · June 4, 2024

TL;DR

This paper introduces policy evaluation methods that combine experimental and historical data, weighting the two sources to minimize mean squared error and adding robustness to reward shift, with theoretical guarantees and empirical validation.

Contribution

The paper proposes data integration techniques that linearly combine policy value estimators built from experimental and historical data, with weights chosen to minimize the MSE of the combined estimator, pessimistic variants for added robustness, and an extension to sequential decision making.
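To make the idea of MSE-optimal weights concrete, here is a simplified worked derivation. It is a sketch under stated assumptions, not the paper's own construction: it assumes two independent scalar estimators of the control-arm value, an unbiased experimental one with variance \sigma_e^2 and a historical one with variance \sigma_h^2 and bias b (the reward shift). The symbols w, \sigma_e, \sigma_h, and b are introduced here for illustration only.

```latex
% Combined estimator (illustrative notation, not taken from the paper)
\hat{\mu}_w = (1 - w)\,\hat{\mu}_e + w\,\hat{\mu}_h, \qquad w \in [0, 1]

% MSE under independence, with \hat{\mu}_e unbiased and \hat{\mu}_h biased by b
\mathrm{MSE}(w) = (1 - w)^2 \sigma_e^2 + w^2 \left( \sigma_h^2 + b^2 \right)

% First-order condition
\frac{d\,\mathrm{MSE}}{dw} = -2 (1 - w)\,\sigma_e^2 + 2 w \left( \sigma_h^2 + b^2 \right) = 0
\;\Longrightarrow\;
w^\ast = \frac{\sigma_e^2}{\sigma_e^2 + \sigma_h^2 + b^2}

% Resulting MSE never exceeds that of the experimental-only estimator
\mathrm{MSE}(w^\ast) = \frac{\sigma_e^2 \left( \sigma_h^2 + b^2 \right)}{\sigma_e^2 + \sigma_h^2 + b^2} \le \sigma_e^2
```

In this simplified setting the weight on the historical estimator shrinks toward zero as the shift b grows, so a badly shifted historical dataset is effectively ignored.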

Findings

01. Proposed estimators outperform existing methods in simulations.

02. Theoretical error bounds and robustness properties are established.

03. Real-data experiments demonstrate superior performance.

Abstract

This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company…
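The weighting idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the quantity of interest is the control-arm mean, that the two samples are independent, and that the reward shift is estimated by a simple plug-in difference of means. The function name `combine_control_means` and the `pessimism_z` parameter are hypothetical; the pessimistic option simply inflates the estimated shift by a multiple of its standard error, which is one plausible reading of the pessimistic principle rather than the paper's exact rule.

```python
import numpy as np


def combine_control_means(y_exp_ctrl, y_hist, pessimism_z=0.0):
    """Linearly combine two control-arm mean estimators (illustrative sketch).

    Returns (combined_estimate, weight_on_historical).
    """
    y_exp_ctrl = np.asarray(y_exp_ctrl, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)

    # Base estimators: sample means from each data source.
    mu_e = y_exp_ctrl.mean()
    mu_h = y_hist.mean()

    # Estimated variances of the two sample means.
    var_e = y_exp_ctrl.var(ddof=1) / y_exp_ctrl.size
    var_h = y_hist.var(ddof=1) / y_hist.size

    # Plug-in estimate of the reward shift (bias of the historical mean).
    shift_hat = mu_h - mu_e

    # Assumed pessimistic adjustment: use an upper bound on |shift|
    # instead of the point estimate when pessimism_z > 0.
    shift_bound = abs(shift_hat) + pessimism_z * np.sqrt(var_e + var_h)

    # Weight minimizing (1 - w)^2 * var_e + w^2 * (var_h + shift^2).
    w = var_e / (var_e + var_h + shift_bound ** 2)

    return (1.0 - w) * mu_e + w * mu_h, w


# Example: identical samples (no shift) versus a heavily shifted history.
est_same, w_same = combine_control_means([1, 2, 3], [1, 2, 3])
est_shift, w_shift = combine_control_means([1, 2, 3], [101, 102, 103])
est_pess, w_pess = combine_control_means([1, 2, 3], [1, 2, 3], pessimism_z=1.0)
```

With no estimated shift the two sources are weighted by their variances; with a large shift the weight on the historical data collapses toward zero, and a positive `pessimism_z` lowers that weight further. The plug-in shift estimate is itself noisy, which is part of the motivation for a more conservative choice.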

Code & Models

Repositories

tingstat/Data_Combination (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Transportation and Mobility Innovations

Methods: Balanced Selection