Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings
Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

TL;DR
This paper introduces a novel nonparametric framework, CPME, for estimating and testing the entire distribution of outcomes under counterfactual policies using kernel methods, with improved robustness and efficiency.
Contribution
It proposes the CPME framework with both plug-in and doubly robust estimators, and develops a kernel test statistic for hypothesis testing, advancing off-policy evaluation techniques.
Findings
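The doubly robust estimator converges faster than the plug-in estimator because it corrects for bias in both the outcome embedding and propensity models.
The doubly robust kernel test statistic is asymptotically normal, which enables computationally efficient testing and straightforward construction of confidence intervals.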
Numerical simulations show CPME outperforms existing methods.
Abstract
Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible and nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved convergence rates by correcting for bias in both the outcome embedding and propensity models. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing, which achieves asymptotic normality and thus enables computationally efficient testing and straightforward construction of confidence intervals. Our framework also supports sampling…
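To make the plug-in versus doubly robust distinction concrete, here is a minimal sketch and not the authors' implementation. It assumes a discrete action set, known logging propensities, a kernel ridge regression estimate of the conditional outcome embedding, and a product kernel (Kronecker delta on actions times a Gaussian kernel on contexts). The function names `rbf`, `cpme_weights`, and `evaluate_embedding`, and the regularization parameter `lam`, are hypothetical. Both estimators are returned as coefficient vectors `c`, so that the estimated embedding is the weighted sum of feature maps `sum_j c[j] * phi(y_j)`.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def cpme_weights(X, A, pi0_logged, pi_target, n_actions, lam=1e-2):
    """Return (plug-in, doubly robust) coefficients over phi(y_1..y_n).

    X: (n, d) contexts; A: (n,) logged integer actions;
    pi0_logged: (n,) logging propensities pi0(a_i | x_i);
    pi_target: (n, n_actions) target probabilities pi(a | x_i).
    """
    n = X.shape[0]
    Kx = rbf(X, X)
    # Assumed product kernel on (action, context) pairs.
    K = Kx * (A[:, None] == A[None, :])
    M = np.linalg.solve(K + n * lam * np.eye(n), np.eye(n))

    # Plug-in term: average the regression weights beta(a, x_i)
    # over a ~ pi(. | x_i) and over the logged contexts.
    direct = np.zeros(n)
    for a in range(n_actions):
        k_a = Kx * (A[:, None] == a)  # k_a[j, i] = k((a_j, x_j), (a, x_i))
        direct += (M @ k_a) @ pi_target[:, a]
    direct /= n

    # Doubly robust correction: importance-weighted residual
    # w_i * (phi(y_i) - mu_hat(a_i, x_i)), written in coefficient form.
    w = pi_target[np.arange(n), A] / pi0_logged
    correction = (w - (M @ K) @ w) / n
    return direct, direct + correction

def evaluate_embedding(c, Y, y_grid, gamma=1.0):
    # Evaluate the estimated embedding at each point of y_grid.
    return rbf(y_grid, Y, gamma) @ c

# Synthetic usage example (all data below is simulated for illustration).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)               # uniform logging policy
Y = (X[:, :1] + A[:, None] + 0.1 * rng.normal(size=(n, 1)))
pi0 = np.full(n, 0.5)
pi_t = np.tile([0.2, 0.8], (n, 1))           # target policy favours action 1
c_plugin, c_dr = cpme_weights(X, A, pi0, pi_t, n_actions=2)
grid = np.linspace(-3, 4, 5)[:, None]
print(evaluate_embedding(c_dr, Y, grid))
```

In this sketch the doubly robust coefficients collapse to pure importance weights `w / n` when the regression is regularized to zero (very large `lam`), and the correction term vanishes when the regression interpolates the logged pairs, which illustrates how the two nuisance models back each other up.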
Taxonomy
Topics: Probability and Risk Models · Economic Policies and Impacts · Risk and Portfolio Optimization
