Causal machine learning methods and use of cross-fitting in settings with high-dimensional confounding
Susan Ellul, Stijn Vansteelandt, John B. Carlin, Margarita Moreno-Betancur

TL;DR
This simulation study compares two doubly robust causal inference methods, AIPW and TMLE, under high-dimensional confounding, and finds that cross-fitting and a fuller Super Learner library both help produce accurate effect estimates.
Contribution
It provides empirical guidance on implementing AIPW and TMLE with cross-fitting and Super Learner libraries in high-dimensional confounding scenarios.
Findings
TMLE shows more stability than AIPW.
Cross-fitting improves variance estimation and coverage.
Full Super Learner libraries reduce bias and variance.
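To make the cross-fitting idea in these findings concrete, here is a minimal sketch of a cross-fitted AIPW estimate of an average treatment effect. It is not the authors' implementation: the paper fits nuisance models with Super Learner, whereas this sketch swaps in simple stratum means so that it runs on the Python standard library alone. The simulated data, the function names (simulate, fit_nuisances, cross_fit_aipw), and the propensity bounds of 0.01 and 0.99 are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def simulate(n, seed=0):
    """Hypothetical toy data: two binary confounders W, binary exposure A,
    continuous outcome Y. The true average treatment effect is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = (rng.random() < 0.5, rng.random() < 0.3)
        p = 0.2 + 0.3 * w[0] + 0.2 * w[1]          # exposure depends on W
        a = int(rng.random() < p)
        y = 1.0 * a + 2.0 * w[0] - 1.0 * w[1] + rng.gauss(0, 1)
        rows.append((w, a, y))
    return rows

def fit_nuisances(train):
    """Fit stratum-mean nuisance models (a stand-in for Super Learner)."""
    a_by_w, y_by_aw = {}, {}
    for w, a, y in train:
        a_by_w.setdefault(w, []).append(a)
        y_by_aw.setdefault((a, w), []).append(y)
    overall_y = mean(y for _, _, y in train)

    def propensity(w):
        vals = a_by_w.get(w)
        p = mean(vals) if vals else 0.5
        return min(max(p, 0.01), 0.99)             # assumed bounds, avoids division by ~0

    def outcome(a, w):
        vals = y_by_aw.get((a, w))
        return mean(vals) if vals else overall_y

    return propensity, outcome

def cross_fit_aipw(rows, k=5, seed=0):
    """Cross-fitted AIPW: nuisances are fitted on k-1 folds and evaluated
    only on the held-out fold, so no unit is scored by a model that saw it."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    phi = [0.0] * len(rows)
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        g, q = fit_nuisances(train)
        for i in fold:
            w, a, y = rows[i]
            q1, q0, gw = q(1, w), q(0, w), g(w)
            # Plug-in contrast plus inverse-probability-weighted residual correction
            phi[i] = q1 - q0 + a * (y - q1) / gw - (1 - a) * (y - q0) / (1 - gw)
    estimate = mean(phi)
    std_error = stdev(phi) / len(phi) ** 0.5       # influence-function-based SE
    return estimate, std_error

if __name__ == "__main__":
    est, se = cross_fit_aipw(simulate(4000, seed=1), k=5, seed=1)
    print(f"ATE estimate {est:.3f}, 95% CI ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

The standard error here comes from the sample standard deviation of the per-unit influence-function values, which is the quantity whose behaviour cross-fitting is meant to improve. With flexible learners in place of stratum means, the same fold structure applies unchanged.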
Abstract
Observational epidemiological studies commonly seek to estimate the causal effect of an exposure on an outcome. Adjustment for potential confounding bias in modern studies is challenging due to the presence of high-dimensional confounding, which occurs when there are many confounders relative to sample size or complex relationships between continuous confounders and exposure and outcome. Doubly robust methods such as Augmented Inverse Probability Weighting (AIPW) and Targeted Maximum Likelihood Estimation (TMLE) have the potential to address these challenges, using data-adaptive approaches and cross-fitting, but despite recent advances limited evaluation and guidance are available on their implementation in realistic settings where high-dimensional confounding is present. Motivated by an early-life cohort study, we conducted an extensive simulation study to compare the relative…
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models
