High Dimensional Causal Inference with Variational Backdoor Adjustment
Daniel Israel, Aditya Grover, Guy Van den Broeck

TL;DR
This paper introduces a generative modeling approach using variational inference for high-dimensional causal inference via backdoor adjustment, effectively handling complex confounders without proxies.
Contribution
It presents the first method applying backdoor adjustment to high-dimensional variables, overcoming tractability and identifiability challenges in causal inference.
Findings
Accurately estimates interventional likelihood in high-dimensional settings
Demonstrates effectiveness on semi-synthetic X-ray medical data
Handles high-dimensional confounders without proxy variables
Abstract
Backdoor adjustment is a technique in causal inference for estimating interventional quantities from purely observational data. For example, in medical settings, backdoor adjustment can be used to control for confounding and estimate the effectiveness of a treatment. However, high dimensional treatments and confounders pose a series of potential pitfalls: tractability, identifiability, optimization. In this work, we take a generative modeling approach to backdoor adjustment for high dimensional treatments and confounders. We cast backdoor adjustment as an optimization problem in variational inference without reliance on proxy variables and hidden confounders. Empirically, our method is able to estimate interventional likelihood in a variety of high dimensional settings, including semi-synthetic X-ray medical data. To the best of our knowledge, this is the first application of backdoor…
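For reference, the backdoor adjustment mentioned in the abstract, and the standard variational lower bound that follows from it, can be sketched as below. This assumes $Z$ is an observed adjustment set satisfying the backdoor criterion and that $q(z \mid x, y)$ is a proposal distribution over the confounder. The notation is illustrative and may not match the paper's own equation numbering.

```latex
\begin{align}
p(y \mid do(x)) &= \int p(y \mid x, z)\, p(z)\, dz \\
\log p(y \mid do(x))
  &= \log \mathbb{E}_{q(z \mid x, y)}\!\left[ \frac{p(y \mid x, z)\, p(z)}{q(z \mid x, y)} \right] \\
  &\ge \mathbb{E}_{q(z \mid x, y)}\!\left[ \log p(y \mid x, z) \right]
     - \mathrm{KL}\!\left( q(z \mid x, y) \,\|\, p(z) \right)
\end{align}
```

The inequality is Jensen's inequality applied to the logarithm. It becomes an equality when $q(z \mid x, y) \propto p(y \mid x, z)\, p(z)$.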
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The introduction of Variational Backdoor Adjustment (VBA) presents an innovative solution for handling high-dimensional datasets, offering a fresh perspective on causal inference.
- The paper provides empirical evidence of the effectiveness of VBA in computing backdoor adjustments, including synthetic and real-world data scenarios. This empirical support strengthens the paper's credibility.
- The paper appears well-structured and clear in presenting the methodology and its empirical validati
1/ There is no comparison with any baseline, making it hard to justify the performance of the proposed method. I understand that the proposed method is for high dimensional data, but it should be comparable with some baselines. Is it possible to compare with some popular methods such as BART, X-learner, R-learner, CFRnet, TARnet, etc.?
2/ It is unclear how you calculate the causal effect after training the model. Do you use Eq. (4) as an approximation for log p(y | do(x))?
The authors show an ELBO for the interventional log-likelihood $\log p(y|do(x))$ under a measured confounder set $Z$. The authors demonstrate a way to optimize the ELBO using a two-stage optimization procedure: in the first stage, the three models are trained separately by maximum likelihood estimation (MLE). In the second stage, the proposal distribution's parameters are optimized to increase the ELBO while holding the other two models fixed. The experiments show that sampling from $q(z|x, y)$ leads to lower variance than
There are weaknesses in both the theoretical contributions as well as the experimental setup.
## Weaknesses in the theoretical results.
### Re ELBO in Eq. 4:
The main concern with the lower bound in Eq. 4 is when equality will hold, i.e., under what conditions is the ELBO equal to the interventional distribution in Eq. 2? In the subsequent paragraph, the authors say that "penalty incurred by encoder will be its KL divergence with ... $p(z)$". This is incorrect: the (Jensen) gap between Eq. 2
The derivations of this paper seem technically correct and intuitive. The authors discuss the issues of directly applying the backdoor adjustment formula and joint training. This should help the reader to understand the importance of this problem. The authors provided almost every detail about the experiments which will definitely help reproducing the results easily. The experiment section of this paper is quite rigorous. It contains one synthetic and two semi-synthetic experiments with high dim
Here, I provide the weaknesses of this paper and some concerns.
[Section 2]
* The authors should discuss existing neural causal models [1][2][3] that can sample from identifiable interventional distributions. These methods are suitable for training on high-dimensional data and can sample from interventional distributions.
[Section 3.2]
* The authors mentioned, “sampled high dimensional Z will almost never give high probability to p(y | x, z) for a chosen Y and X.” I would request the authors t
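Several of the review comments above concern how the ELBO relates to $\log p(y \mid do(x))$ and when the bound is tight. The following is a minimal sketch on a hypothetical toy model with binary $Z$, $X$, $Y$. All probability tables and function names are invented for illustration and are not taken from the paper. It computes the backdoor adjustment exactly, compares it with the observational conditional, and evaluates the variational lower bound for two choices of proposal $q$.

```python
import numpy as np

# Hypothetical toy model: binary confounder Z, treatment X, outcome Y.
# All numbers below are made up for illustration.
p_z = np.array([0.7, 0.3])                       # p(z)
p_x_given_z = np.array([[0.8, 0.2],              # p(x | z=0)
                        [0.3, 0.7]])             # p(x | z=1)
# p_y_given_xz[x, z, y] = p(y | x, z)
p_y_given_xz = np.array([[[0.9, 0.1], [0.6, 0.4]],
                         [[0.5, 0.5], [0.2, 0.8]]])


def interventional(y, x):
    """Backdoor adjustment: p(y | do(x)) = sum_z p(y | x, z) p(z)."""
    return float(np.sum(p_y_given_xz[x, :, y] * p_z))


def observational(y, x):
    """Plain conditioning: p(y | x) = sum_z p(y | x, z) p(z | x)."""
    p_z_given_x = p_x_given_z[:, x] * p_z
    p_z_given_x = p_z_given_x / p_z_given_x.sum()
    return float(np.sum(p_y_given_xz[x, :, y] * p_z_given_x))


def elbo(q, y, x):
    """E_q[log p(y | x, z)] - KL(q || p(z)) for a discrete proposal q(z)."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(p_y_given_xz[x, :, y])
                             + np.log(p_z) - np.log(q))))


def tight_q(y, x):
    """Proposal proportional to p(y | x, z) p(z), which closes the Jensen gap."""
    w = p_y_given_xz[x, :, y] * p_z
    return w / w.sum()


if __name__ == "__main__":
    y, x = 1, 1
    print("p(y | do(x))       =", interventional(y, x))
    print("p(y | x)           =", observational(y, x))
    print("log p(y | do(x))   =", np.log(interventional(y, x)))
    print("ELBO with q = p(z) =", elbo(p_z, y, x))
    print("ELBO with tight q  =", elbo(tight_q(y, x), y, x))
```

In this toy setting the interventional and observational quantities differ because $Z$ influences both $X$ and $Y$. Using the prior $p(z)$ as the proposal gives a strictly looser bound, while the proposal proportional to $p(y \mid x, z)\, p(z)$ recovers $\log p(y \mid do(x))$ exactly. In the high-dimensional setting the paper targets, the sums would be replaced by sampling from learned models, which this sketch does not attempt.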
Taxonomy
Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life · Statistical Methods and Inference
Methods: Variational Inference · Causal inference
