Counterfactual Learning with General Data-generating Policies

Yusuke Narita; Kyohei Okumura; Akihiro Shimizu; Kohei Yata

arXiv:2212.01925·cs.LG·December 6, 2022

TL;DR

This paper introduces a new off-policy evaluation method capable of handling both full support and deficient support logging policies, including deterministic policies, with proven convergence and practical validation.

Contribution

It extends off-policy evaluation to a broader class of logging policies, including deterministic ones, with theoretical guarantees and real-world application.

Findings

01. Method converges to true policy performance as data increases
02. Validated on deterministic and partly deterministic logging policies
03. Applied to online platform coupon targeting to improve policies

Abstract

Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log data from a different policy. We extend its applicability by developing an OPE method for a class of both full support and deficient support logging policies in contextual-bandit settings. This class includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate coupon targeting policies used by a major online platform and show how to improve the existing policy.
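
To make the distinction between full support and deficient support concrete, here is a minimal sketch of the standard inverse-propensity-weighting (IPW) estimator, a common OPE baseline. It is not the method proposed in the paper. The toy reward model, the function names `ipw_value` and `simulate`, and the two logging policies are all assumptions made up for illustration.

```python
import random

def ipw_value(logs, target_prob):
    """Standard IPW estimate of a target policy's mean reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the logging policy's probability of the logged action.
    target_prob: function (context, action) -> probability under the target policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += target_prob(context, action) / logging_prob * reward
    return total / len(logs)

def simulate(logging_prob_of_1, n, seed=0):
    """Generate toy logs with two actions (hypothetical environment).

    Contexts are uniform on [0, 1). Action 1 pays a reward equal to the
    context; action 0 pays nothing.
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.random()
        p1 = logging_prob_of_1(context)
        action = 1 if rng.random() < p1 else 0
        reward = context if action == 1 else 0.0
        prob = p1 if action == 1 else 1.0 - p1
        logs.append((context, action, reward, prob))
    return logs

def always_one(context, action):
    """Target policy to evaluate: always take action 1 (true value is 0.5)."""
    return 1.0 if action == 1 else 0.0

# Full support: every action has positive probability in every context.
full_logs = simulate(lambda c: 0.5, n=20000)
full_estimate = ipw_value(full_logs, always_one)

# Deficient support: a deterministic rule that never tries action 1 when c <= 0.5.
det_logs = simulate(lambda c: 1.0 if c > 0.5 else 0.0, n=20000)
det_estimate = ipw_value(det_logs, always_one)

print(full_estimate, det_estimate)
```

In this toy setup the first estimate lands near the true value of 0.5, while the second is systematically too low (about 0.375 in expectation), because the deterministic logs contain no information about action 1 for contexts at or below 0.5. This gap is the kind of problem that motivates OPE methods designed for deficient support logging policies.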

Taxonomy

Topics: Smart Grid Energy Management · Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing