Causal machine learning methods and use of cross-fitting in settings with high-dimensional confounding

Susan Ellul; Stijn Vansteelandt; John B. Carlin; Margarita Moreno-Betancur

arXiv:2405.15242 · stat.ME · August 29, 2025 · 3 citations

Open Access

TL;DR

This study compares two doubly robust causal machine learning methods, AIPW and TMLE, in settings with high-dimensional confounding, highlighting the benefits of cross-fitting and comprehensive Super Learner ensemble libraries for accurate effect estimation.

Contribution

The study provides simulation-based guidance on implementing AIPW and TMLE with cross-fitting and Super Learner libraries in settings with high-dimensional confounding.
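
To make "AIPW with cross-fitting" concrete, here is a minimal sketch of a cross-fitted AIPW estimator of the average treatment effect for a binary exposure. It is not the authors' implementation and makes several assumptions: only NumPy is used; a hand-written logistic regression (propensity score) and per-arm linear regressions (outcome model) stand in for the Super Learner libraries the study evaluates; propensity scores are clipped at 0.01; and the function names `fit_logistic`, `fit_ols` and `aipw_crossfit` are hypothetical. Each observation's nuisance predictions come from models fitted without its own fold, and the standard error is taken from the empirical variance of the estimated influence function.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton-Raphson; returns a function giving P(y=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        # Hessian of the negative log-likelihood, plus a tiny ridge for numerical stability
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + ridge * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, Xd.T @ (y - p))
    return lambda Z: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Z)), Z]) @ beta))

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns a prediction function."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

def aipw_crossfit(X, A, Y, n_folds=5, seed=0, clip=0.01):
    """Cross-fitted AIPW estimate of E[Y(1)] - E[Y(0)] with an influence-function SE."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty(n)  # estimated (uncentred) influence-function values
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are trained on the other folds only ...
        g = fit_logistic(X[train], A[train])
        q1 = fit_ols(X[train & (A == 1)], Y[train & (A == 1)])
        q0 = fit_ols(X[train & (A == 0)], Y[train & (A == 0)])
        # ... and evaluated on the held-out fold k
        gk = np.clip(g(X[test]), clip, 1.0 - clip)
        m1, m0 = q1(X[test]), q0(X[test])
        a, y = A[test], Y[test]
        phi[test] = m1 - m0 + a * (y - m1) / gk - (1 - a) * (y - m0) / (1.0 - gk)
    ate = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n)
    return ate, se

# Usage on simulated data in which the true average treatment effect is 1.0
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))
Y = 1.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
ate, se = aipw_crossfit(X, A, Y)
print(f"ATE {ate:.3f}, 95% CI ({ate - 1.96 * se:.3f}, {ate + 1.96 * se:.3f})")
```

In a realistic high-dimensional setting, the parametric stand-ins would be replaced by flexible data-adaptive learners (for example a Super Learner ensemble); the cross-fitting loop and the influence-function formula stay the same.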

Findings

01. TMLE showed greater stability than AIPW.

02. Cross-fitting improved variance estimation and confidence interval coverage.

03. Full Super Learner libraries reduced bias and variance.
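
Finding 02 concerns variance estimation. One common formulation, assumed here rather than taken from the paper, computes a cross-fitted AIPW estimate and its standard error from estimated influence-function values. Notation (all assumed for illustration): binary exposure $A$, outcome $Y$, confounders $X$; $Q_a(x) = E(Y \mid A = a, X = x)$ and $g(x) = P(A = 1 \mid X = x)$; the data are split into $K$ folds, $k(i)$ is the fold containing observation $i$, and a superscript $(-k)$ marks a model fitted without fold $k$. A Wald interval is then $\hat\psi \pm 1.96\,\widehat{\mathrm{SE}}(\hat\psi)$.

```latex
\hat\phi_i = \hat Q_1^{(-k(i))}(X_i) - \hat Q_0^{(-k(i))}(X_i)
  + \frac{A_i\,\{Y_i - \hat Q_1^{(-k(i))}(X_i)\}}{\hat g^{(-k(i))}(X_i)}
  - \frac{(1 - A_i)\,\{Y_i - \hat Q_0^{(-k(i))}(X_i)\}}{1 - \hat g^{(-k(i))}(X_i)},
\qquad
\hat\psi = \frac{1}{n}\sum_{i=1}^{n} \hat\phi_i,
\qquad
\widehat{\mathrm{SE}}(\hat\psi) = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(\hat\phi_i - \hat\psi\bigr)^2}.
```

One commonly cited reason cross-fitting helps here: without it, flexible nuisance models are evaluated on the same observations they were trained on, so their residuals $Y_i - \hat Q_{A_i}(X_i)$ tend to be too small and the variance estimate above can understate the true uncertainty.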

Abstract

Observational epidemiological studies commonly seek to estimate the causal effect of an exposure on an outcome. Adjusting for potential confounding bias in modern studies is challenging because of high-dimensional confounding, which arises when there are many confounders relative to the sample size, or when there are complex relationships between continuous confounders and the exposure and outcome. Doubly robust methods such as Augmented Inverse Probability Weighting (AIPW) and Targeted Maximum Likelihood Estimation (TMLE) have the potential to address these challenges through data-adaptive approaches and cross-fitting, but despite recent advances, limited evaluation of and guidance on their implementation are available for realistic settings where high-dimensional confounding is present. Motivated by an early-life cohort study, we conducted an extensive simulation study to compare the relative…
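
The abstract names TMLE alongside AIPW. The sketch below shows the basic TMLE steps for the average treatment effect (risk difference) with a binary outcome, under assumptions that are not from the paper: NumPy only, no cross-fitting, parametric logistic regressions in place of Super Learner, propensity scores truncated at 0.01, and hypothetical function names (`fit_logistic`, `tmle_ate_binary`). Where AIPW adds a weighted residual correction to a plug-in estimate, TMLE instead updates the initial outcome predictions through a one-parameter logistic fluctuation along the "clever covariate" $H$ and then averages the updated predictions, so the estimate always stays within $[-1, 1]$.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton-Raphson; returns a function giving P(y=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = expit(Xd @ beta)
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None]) + ridge * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, Xd.T @ (y - p))
    return lambda Z: expit(np.column_stack([np.ones(len(Z)), Z]) @ beta)

def tmle_ate_binary(X, A, Y, clip=0.01, bound=1e-6):
    n = len(Y)
    # Step 1: initial outcome model Q(A, X) = P(Y = 1 | A, X), predicted under A = 1 and A = 0
    q = fit_logistic(np.column_stack([A, X]), Y)
    Q1 = np.clip(q(np.column_stack([np.ones(n), X])), bound, 1 - bound)
    Q0 = np.clip(q(np.column_stack([np.zeros(n), X])), bound, 1 - bound)
    QA = np.where(A == 1, Q1, Q0)
    # Step 2: propensity score g(X) = P(A = 1 | X), truncated away from 0 and 1
    g = np.clip(fit_logistic(X, A)(X), clip, 1 - clip)
    # Step 3: clever covariate
    H = A / g - (1 - A) / (1 - g)
    # Step 4: fluctuation parameter eps from a logistic regression of Y on H
    # with offset logit(QA) and no intercept, solved by 1-D Newton-Raphson
    eps = 0.0
    for _ in range(50):
        p = expit(logit(QA) + eps * H)
        eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))
    # Step 5: targeted (updated) counterfactual predictions and plug-in estimate
    Q1s = expit(logit(Q1) + eps / g)
    Q0s = expit(logit(Q0) - eps / (1 - g))
    QAs = np.where(A == 1, Q1s, Q0s)
    psi = np.mean(Q1s - Q0s)
    # Step 6: standard error from the estimated efficient influence function
    D = H * (Y - QAs) + Q1s - Q0s - psi
    se = np.std(D, ddof=1) / np.sqrt(n)
    return psi, se

# Usage on simulated data with a known outcome model
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, expit(0.5 * X[:, 0] - 0.5 * X[:, 1]))
lin = -0.5 + X[:, 0] + 0.5 * X[:, 1]
Y = rng.binomial(1, expit(lin + 1.0 * A))
true_ate = np.mean(expit(lin + 1.0) - expit(lin))  # risk difference averaged over this sample's X
psi, se = tmle_ate_binary(X, A, Y)
print(f"TMLE {psi:.3f} (SE {se:.3f}); sample-average true effect {true_ate:.3f}")
```

A cross-fitted version would fit the models in Steps 1 and 2 on the other folds, as in cross-fitted AIPW, before running the targeting step.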

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models