Counterfactual Density Estimation using Kernel Stein Discrepancies
Diego Martinez-Taboada, Edward H. Kennedy

TL;DR
This paper introduces a novel method for estimating counterfactual distributions using kernel Stein discrepancies, enabling robust and flexible causal inference beyond mean effects.
Contribution
It proposes a doubly robust approach for modeling counterfactual densities within a known class, with theoretical guarantees and empirical validation.
Findings
The paper gives sufficient conditions under which the estimator is consistent.
Under the paper's sufficient conditions, the estimator is also asymptotically normal.
Empirical results demonstrate effective counterfactual density estimation.
Abstract
Causal effects are usually studied in terms of the means of counterfactual distributions, which may be insufficient in many scenarios. Given a class of densities known up to normalizing constants, we propose to model counterfactual distributions by minimizing kernel Stein discrepancies in a doubly robust manner. This enables the estimation of counterfactuals over large classes of distributions while exploiting the desired double robustness. We present a theoretical analysis of the proposed estimator, providing sufficient conditions for consistency and asymptotic normality, as well as an examination of its empirical performance.
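As a rough illustration of the idea in the abstract, the sketch below fits a density known only up to its normalizing constant by minimizing a squared kernel Stein discrepancy (KSD). It is a simplified sketch, not the paper's estimator: it uses a one-dimensional Gaussian location model, an RBF base kernel, a V-statistic, and a grid search, all of which are assumptions made here. The optional `weights` argument is a hypothetical hook for reweighting observations (for example by inverse propensities); the paper's doubly robust, cross-fitted construction is not reproduced. The function names are illustrative only.

```python
import numpy as np

def stein_kernel_matrix(y, score, lengthscale=1.0):
    """Langevin Stein kernel h(y_i, y_j) for a 1-D RBF base kernel.

    Only the score d/dy log p(y) is needed, so the normalizing
    constant of the model density never appears.
    """
    d = y[:, None] - y[None, :]                  # d_ij = y_i - y_j
    k = np.exp(-d**2 / (2 * lengthscale**2))     # RBF kernel k(y_i, y_j)
    s = score(y)
    dk_dx = -d / lengthscale**2 * k              # derivative in first argument
    dk_dy = d / lengthscale**2 * k               # derivative in second argument
    d2k = (1 / lengthscale**2 - d**2 / lengthscale**4) * k
    return (s[:, None] * s[None, :] * k
            + s[:, None] * dk_dy
            + s[None, :] * dk_dx
            + d2k)

def ksd_squared(y, score, weights=None, lengthscale=1.0):
    """Weighted V-statistic estimate of the squared KSD (weights sum to one)."""
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights / np.sum(weights)
    return float(w @ stein_kernel_matrix(y, score, lengthscale) @ w)

def fit_location(y, grid, weights=None):
    """Grid search over the mean of a unit-variance Gaussian model."""
    losses = [ksd_squared(y, lambda t, th=th: -(t - th), weights) for th in grid]
    return float(grid[int(np.argmin(losses))])

# Toy data standing in for outcomes; the true mean 2.0 is an assumption of this demo.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=300)
theta_hat = fit_location(y, np.linspace(0.0, 4.0, 81))
print(theta_hat)
```

The score function is the only model-specific input, which is why the approach extends to model classes whose normalizing constants are intractable.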
Peer Reviews
Decision: ICLR 2024 poster
Originality: It appears that the estimator is a new combination of known tools, namely doubly robust estimation, cross fitting, and kernel Stein discrepancies, for a different task than previous work that combined these tools (Lam and Zhang 2023). The closest pieces of work are Fawkes et al. (2022) and Martinez-Taboada et al. (2023), which use kernel mean embeddings instead of kernel Stein discrepancies. All of these papers are well cited. Quality: The results are generally high quality, thoug
1. In my evaluation, the analysis is a direct extension of Martinez-Taboada et al. (2023), replacing features of Y with the xi object evaluated at Y.
2. I would like to see more discussion of how good the optimization of the parameter must be for these results to be applicable.
3. The paragraph beginning with “we underscore that…” was confusing.
- MKSD estimators have primarily been employed for conducting goodness-of-fit tests and sample quality analysis. Nevertheless, in the counterfactual context of this study, the MKSD estimator had not been previously proposed.
- Conversely, under certain assumptions, the distribution of either counterfactual can be expressed in terms of observational data. This opens the possibility of using MKSD as the primary tool to address the counterfactual distribution estimation problem.
- The paper is
Numerous other studies have explored semiparametric estimators within the debiased machine learning framework for counterfactual density estimation, such as:
[1] Mou, Wenlong, Martin J. Wainwright, and Peter L. Bartlett. "Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency." _arXiv preprint arXiv:2209.13075_ (2022).
[2] Mou, Wenlong, et al. "Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency." _
1. This paper is technically strong. It presents a complicated theory on semiparametric inference in a comprehensive manner, which is well-written.
2. Related works provide a comprehensive summary of the relevant literature.
1. This paper could be improved by adding a discussion on the comparison between the KSD-based method and the projecting-based method (e.g., Kennedy et al., 2021). Readers may be interested in understanding the reasons or practical guidelines for choosing the KSD-based method over other methods.
2. I am concerned about the sample complexity of the KSD-based method, as it appears to have a time complexity of $O(n^2)$. Could you please discuss how the time complexity of this method compares to oth
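The $O(n^2)$ cost raised in the review comes from evaluating the Stein kernel on every pair of samples. The sketch below contrasts that all-pairs U-statistic with a linear-time variant that averages the Stein kernel over disjoint consecutive pairs, a standard device from the KSD and MMD testing literature. Neither the paper nor the review proposes this variant; it is shown only to make the cost trade-off concrete. The RBF kernel, the Gaussian toy data, and all function names are assumptions of this sketch.

```python
import numpy as np

def stein_kernel_pairs(x, y, score, lengthscale=1.0):
    """Langevin Stein kernel h(x, y) for a 1-D RBF base kernel (broadcasts)."""
    d = x - y
    k = np.exp(-d**2 / (2 * lengthscale**2))
    sx, sy = score(x), score(y)
    return (sx * sy
            + (sx - sy) * d / lengthscale**2
            + 1 / lengthscale**2
            - d**2 / lengthscale**4) * k

def ksd2_quadratic(x, score, lengthscale=1.0):
    """U-statistic over all i != j: O(n^2) kernel evaluations."""
    n = len(x)
    h = stein_kernel_pairs(x[:, None], x[None, :], score, lengthscale)
    return float((h.sum() - np.trace(h)) / (n * (n - 1)))

def ksd2_linear(x, score, lengthscale=1.0):
    """Average over disjoint pairs (x_1, x_2), (x_3, x_4), ...: O(n) evaluations."""
    m = len(x) // 2
    return float(stein_kernel_pairs(x[:2 * m:2], x[1:2 * m:2], score, lengthscale).mean())

# Toy check: data from N(0, 1), scored under a correct and a shifted model.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
good = lambda t: -t            # score of N(0, 1)
bad = lambda t: -(t - 1.5)     # score of N(1.5, 1)
print(ksd2_quadratic(x, bad), ksd2_linear(x, bad))
```

The linear-time estimate is noisier than the all-pairs estimate at the same sample size, so the saving in computation is paid for in statistical efficiency.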
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Distribution Estimation and Applications
