Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets

Yuxin Wang; Maresa Schröder; Dennis Frauen; Jonas Schweisthal; Konstantin Hess; Stefan Feuerriegel

arXiv:2412.11511·cs.LG·October 16, 2025


TL;DR

This paper introduces a new, assumption-light method for constructing valid confidence intervals for average treatment effects by combining multiple observational datasets, improving precision through prediction-powered inference.

Contribution

The paper presents a novel method for estimating ATE and constructing valid CIs from multiple datasets, including an extension to combine experimental and observational data.

Findings

01

The method provides unbiased ATE estimates with valid confidence intervals.

02

Numerical experiments confirm the theoretical properties of the proposed approach.

03

Extension allows combining experimental and observational datasets for CI construction.

Abstract

Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, thus raising the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is that we leverage prediction-powered inference and thereby essentially 'shrink' the CIs so that we offer more precise uncertainty quantification as compared to naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs. We confirm our theoretical…
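To make the 'shrinking' idea concrete, the following is a minimal sketch of a generic prediction-powered confidence interval for a mean, applied to a toy ATE problem. It is not the authors' implementation (see the linked repository for that). Everything specific here is an assumption for illustration: the function names `ppi_mean_ci`, `naive_mean_ci`, and `demo`, the simulated data, the fixed predictor `tau_hat` standing in for a treatment-effect model fitted on the large dataset, and the simplification that the small dataset is randomized with known propensity 0.5 so that inverse-probability-weighted pseudo-outcomes are unbiased.

```python
import numpy as np
from statistics import NormalDist


def ppi_mean_ci(pseudo_small, pred_small, pred_large, alpha=0.05):
    """Generic prediction-powered CI for a mean (illustrative sketch).

    pseudo_small: unbiased pseudo-outcomes on the small dataset
    pred_small:   model predictions on the small dataset
    pred_large:   model predictions on the large dataset
    """
    pseudo_small = np.asarray(pseudo_small, dtype=float)
    pred_small = np.asarray(pred_small, dtype=float)
    pred_large = np.asarray(pred_large, dtype=float)
    n, N = len(pseudo_small), len(pred_large)
    # The rectifier measures the prediction error on the small dataset.
    rectifier = pseudo_small - pred_small
    estimate = pred_large.mean() + rectifier.mean()
    se = np.sqrt(pred_large.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


def naive_mean_ci(pseudo_small, alpha=0.05):
    """Baseline CI that uses only the small dataset."""
    pseudo_small = np.asarray(pseudo_small, dtype=float)
    estimate = pseudo_small.mean()
    se = pseudo_small.std(ddof=1) / np.sqrt(len(pseudo_small))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


def demo(seed=0, n=500, N=50_000):
    """Toy simulation (all choices hypothetical); the true ATE is 1."""
    rng = np.random.default_rng(seed)

    def tau_hat(x):
        # Hypothetical, slightly biased effect predictor.
        return 0.9 + 2.0 * x

    # Small dataset: randomized treatment with known propensity 0.5.
    x_s = rng.standard_normal(n)
    a_s = rng.integers(0, 2, size=n)
    y_s = a_s * (1.0 + 2.0 * x_s) + rng.normal(0.0, 0.5, size=n)
    # IPW pseudo-outcome: unbiased for the treatment effect given x.
    pseudo = y_s * a_s / 0.5 - y_s * (1 - a_s) / 0.5

    # Large dataset: only covariates are needed to apply the predictor.
    x_l = rng.standard_normal(N)
    return ppi_mean_ci(pseudo, tau_hat(x_s), tau_hat(x_l)), naive_mean_ci(pseudo)


print(demo())
```

In this sketch the interval narrows because the rectifier (pseudo-outcome minus prediction) has lower variance than the raw pseudo-outcome whenever the predictor tracks the effect reasonably well, while any bias in the predictor is corrected by the rectifier's mean. The interval is only asymptotically valid, and a poor predictor can make it wider than the baseline.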

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 3

Strengths

The paper addresses an important issue that has not been covered in the literature before. It also places itself well within the literature and precisely backs up its claims with theorems.

Weaknesses

On the theoretical side, the paper only shows the asymptotic validity of the proposed method. There are no results characterizing the width of the resulting confidence intervals. One would like to know under what circumstances the given method will produce a confidence interval whose coverage converges to exactly $1-\alpha$. Moreover, the paper does not give any theoretical insight into when it is beneficial to use both datasets as opposed to just the unconfounded dataset. There should be some result sho

Reviewer 02·Rating 6·Confidence 3

Strengths

The article is well written and technically sound. The proposed approach is interesting, with great potential for practical application. The previous literature is well integrated, and the paper's novelty is made explicit.

Weaknesses

1. The novelty of this work appears incremental: it takes a previously proposed estimator for causal effects and uses it within a PPI framework to construct confidence intervals.
2. Lemma 6.1 is a theoretical property of the inverse probability weighting (IPW) estimator and should not be considered a contribution of this paper. Furthermore, the IPW estimator can also be interpreted as an estimation method based on the influence function, while the augmented inverse probability weighting (AI

Reviewer 03·Rating 6·Confidence 5

Strengths

The paper focuses on a very important topic: uncertainty quantification for the point estimate of the ATE based on observational datasets. Obtaining accurate confidence intervals for the ATE helps to understand the effect of a treatment (e.g., drugs). Observational data are becoming more and more common in causal inference. The main contribution of the paper is to propose a new way of computing confidence intervals based on two observational datasets. The formula is rather simple and eas

Weaknesses

The confidence interval and the point estimate developed in this paper are of potential practical interest. I have two major concerns with the paper:
- The experimental results on the toy dataset are inconclusive: the input variable is one-dimensional, which does not correspond to a real-world problem. For the conclusion to hold, I would expect more experiments in more complex settings (more input variables, with or without dependence, how relaxing the assumption of all observed con

Code & Models

Repositories

yuxin217/causalppi·Official


Taxonomy

Topics

Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods in Clinical Trials