TL;DR
This paper introduces ProCI, a novel framework using large language models to iteratively identify and impute hidden confounders, significantly enhancing causal effect estimation from observational data.
Contribution
ProCI is the first method leveraging LLMs for mitigating hidden confounding by iteratively generating and validating confounders, improving causal inference accuracy.
Findings
ProCI uncovers meaningful hidden confounders in diverse datasets.
ProCI significantly improves treatment effect estimation accuracy.
ProCI demonstrates robustness through distributional reasoning strategies.
Abstract
Hidden confounding remains a central challenge in estimating treatment effects from observational data, as unobserved variables can lead to biased causal estimates. While recent work has explored the use of large language models (LLMs) for causal inference, most approaches still rely on the unconfoundedness assumption. In this paper, we make the first attempt to mitigate hidden confounding using LLMs. We propose ProCI (Progressive Confounder Imputation), a framework that elicits the semantic and world knowledge of LLMs to iteratively generate, impute, and validate hidden confounders. ProCI leverages two key capabilities of LLMs: their strong semantic reasoning ability, which enables the discovery of plausible confounders from both structured and unstructured inputs, and their embedded world knowledge, which supports counterfactual reasoning under latent confounding. To improve…
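The generate, impute, and validate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `progressive_imputation`, `propose`, `impute`, and `is_unconfounded` are hypothetical, the LLM calls are replaced by toy stubs, and the validation step is a placeholder rather than the conditional independence test that the reviews mention.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def progressive_imputation(
    rows: List[Row],
    propose: Callable[[Sequence[str]], str],
    impute: Callable[[Row, str], float],
    is_unconfounded: Callable[[List[Row], Sequence[str]], bool],
    max_rounds: int = 5,
) -> List[str]:
    """Add proposed confounders one at a time until validation passes.

    All three callables are hypothetical stand-ins: `propose` would query an
    LLM for a new confounder name, `impute` would query it for a per-row
    value, and `is_unconfounded` would run a statistical validation check.
    """
    # Treat every column except treatment "T" and outcome "Y" as a covariate.
    covariates = sorted(k for k in rows[0] if k not in ("T", "Y"))
    added: List[str] = []
    for _ in range(max_rounds):
        if is_unconfounded(rows, covariates):
            break  # stopping criterion: validation accepts the current set
        name = propose(covariates)
        for row in rows:
            row[name] = impute(row, name)
        covariates.append(name)
        added.append(name)
    return added


# Toy stubs (assumptions for illustration only, no LLM involved).
def toy_propose(covariates: Sequence[str]) -> str:
    return f"U{len(covariates)}"


def toy_impute(row: Row, name: str) -> float:
    return 0.0


def toy_check(rows: List[Row], covariates: Sequence[str]) -> bool:
    # Pretend validation passes once three covariates are available.
    return len(covariates) >= 3


rows = [{"X1": 1.0, "T": 1.0, "Y": 2.0}, {"X1": 0.0, "T": 0.0, "Y": 1.0}]
added = progressive_imputation(rows, toy_propose, toy_impute, toy_check)
```

With these stubs the loop adds two placeholder variables and then stops. In a real setting the stopping rule would depend on the chosen test and its significance level, which is one of the points the reviewers raise.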
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The paper addresses an important pain point in observational causal inference and articulates a clear pipeline. Progressive confounder generation and distributional reasoning are interesting design choices that seem to empirically mitigate collapse in direct value imputation and appear to yield semantically coherent variables. The validation procedure is easy to integrate with existing estimators and the method is model-agnostic in practice. The experimental section is comprehensive for the chos
- The abstract and positioning somewhat overstate the novelty relative to the broader literature on handling unobserved confounding. Sensitivity analysis, instrumental-variable and front-door strategies, and designs that exploit temporal or quasi-experimental variation represent long-standing alternatives; I would consider tightening the claims accordingly.
- More consequentially, the empirical “unconfoundedness validation” is not a test of unconfoundedness in the data-generating process but a
1. The paper is well-written and easy to follow despite the technical depth, with helpful figures and consistent terminology.
2. The idea of using the world knowledge of LLMs to impute unobserved confounders is novel and interesting. This represents a creative and timely direction bridging causal inference and foundation models.
3. The proposed Unconfoundedness Validation is new, reasonable, and well aligned with the overall framework.
4. The experimental results are comprehensiv
1. The framework’s performance and reliability may vary substantially with different LLMs. It would be helpful to discuss potential limitations or safeguards when weaker or domain-mismatched models are used.
2. While Theorem 1 provides some justification, a more detailed discussion of theoretical limitations or assumptions on the LLM’s implicit confounder knowledge would strengthen the rigor.
3. How is the significance level $\alpha$ chosen in the KCIT test? Does it affect the accuracy of CATE e
- The paper clearly defines the problem, so the motivation is fair and the proposed workflow can be easily understood. The procedure of progressive imputation is conceptually consistent, and the stopping criterion with conditional independence tests provides an interpretable mechanism for validating the confounders. The use of distributional reasoning mitigates mode collapse, and this is ablated experimentally. The method is model-agnostic, making it adaptable to multiple models.
- The paper propos
I have several important concerns:
- In Theorem 1 and the associated Lemmas 1 and 2, the authors provide asymptotic guarantees for KCIT (the kernel conditional independence test). However, this implies that the distributions of both the hidden confounders ($\hat{U}$) and the unobserved potential outcomes ($\hat{Y}^{1-T}$) are well captured by the LLM. How can this be guaranteed in practice? The authors should at least discuss these assumptions, and recall that this _guarantee_ depends on
