# Handling missing data when estimating causal effects with targeted maximum likelihood estimation

**Authors:** S Ghazaleh Dashti, Katherine J Lee, Julie A Simpson, Ian R White, John B Carlin, Margarita Moreno-Betancur

PMC · DOI: 10.1093/aje/kwae012 · American Journal of Epidemiology · 2024-02-22

## TL;DR

This paper reports a simulation study comparing 8 methods for handling missing data when estimating causal effects with targeted maximum likelihood estimation (TMLE). Parametric multiple imputation that included interactions performed best in every setting except those where the missingness models contained a nonlinear term.

## Contribution

The study provides empirical guidance on handling missing data in TMLE-based causal inference, comparing 8 methods across 6 simulation scenarios that vary the exposure/outcome generation models and the missingness mechanisms.

## Key findings

- Parametric multiple imputation (MI) including interactions performed best in bias and variance reduction across all settings, except when the missingness models included a nonlinear term.
- Complete-case analysis and extended TMLE had small biases when the outcome did not influence missingness in other variables.
- Parametric MI without interactions had large bias when exposure/outcome models included interactions.
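
Each MI approach compared here yields several completed datasets, and the per-dataset effect estimates have to be combined into one. This summary does not state how the pooling was done, so the sketch below assumes Rubin's rules, the standard choice for MI. The function name `pool_rubin` and the input numbers are hypothetical and are not results from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    Assumption: `estimates` and `variances` come from running the same
    analysis on each of m imputed datasets.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    pooled = mean(estimates)           # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical effect estimates and variances from m = 5 imputed datasets.
est = [0.10, 0.12, 0.08, 0.11, 0.09]
var = [0.0004] * 5
pooled, total_var = pool_rubin(est, var)
print(f"pooled estimate: {pooled:.3f}, pooled SE: {total_var ** 0.5:.4f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data. Pooling does not repair an imputation model that omits interactions present in the exposure/outcome models, which is the source of bias noted in the last finding above.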

## Abstract

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
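
For readers new to TMLE, the following is a minimal sketch of its targeting step on complete data, for a binary exposure `A`, a binary outcome `Y`, and a single binary confounder `C`. Everything in it is an illustrative assumption: the cell counts are made up, the initial outcome model deliberately ignores `C`, and the propensity score is estimated by stratum proportions. The study itself used data-adaptive approaches, and this toy does not implement any of the 8 missing-data methods.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def make_toy_data():
    """Deterministic toy rows (C, A, Y) built from made-up cell counts."""
    cells = {  # (C, A): (number of rows, number with Y = 1)
        (0, 0): (40, 8), (0, 1): (10, 4),
        (1, 0): (20, 10), (1, 1): (30, 21),
    }
    rows = []
    for (c, a), (n, events) in cells.items():
        rows += [(c, a, 1)] * events + [(c, a, 0)] * (n - events)
    return rows


def tmle_ate(rows, n_iter=50):
    n = len(rows)
    # Initial outcome model Q0(a) = P(Y = 1 | A = a): ignores C on purpose.
    q0 = {}
    for a in (0, 1):
        ys = [y for _, aa, y in rows if aa == a]
        q0[a] = sum(ys) / len(ys)
    # Propensity score g(c) = P(A = 1 | C = c) from stratum proportions.
    g = {}
    for c in (0, 1):
        exposures = [a for cc, a, _ in rows if cc == c]
        g[c] = sum(exposures) / len(exposures)

    def clever(a, c):
        # "Clever covariate" for the average treatment effect.
        return a / g[c] - (1 - a) / (1.0 - g[c])

    # Targeting step: one-parameter logistic regression of Y on the clever
    # covariate with offset logit(Q0), solved by Newton's method.
    eps = 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for c, a, y in rows:
            h = clever(a, c)
            p = expit(logit(q0[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break

    def q_star(a, c):
        return expit(logit(q0[a]) + eps * clever(a, c))

    # Plug-in estimate: average the updated predictions under A = 1 and A = 0.
    ate = sum(q_star(1, c) - q_star(0, c) for c, _, _ in rows) / n
    naive = q0[1] - q0[0]
    return naive, ate


naive, ate = tmle_ate(make_toy_data())
print(f"unadjusted difference: {naive:.3f}, TMLE estimate: {ate:.3f}")
```

In this toy the unadjusted risk difference is confounded by `C`, and the targeting step pulls the estimate to the confounder-standardized risk difference even though the initial outcome model is wrong. This illustrates the double robustness mentioned in the abstract: a correctly specified propensity model compensates for a misspecified outcome model.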

## Full-text entities

- **Diseases:** anxiety (MESH:D001007), depression (MESH:D003866), antisocial behavior (MESH:D000987), alcohol (MESH:D000437)
- **Chemicals:** alcohol (MESH:D000438)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11228874/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11228874/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC11228874/full.md

---
Source: https://tomesphere.com/paper/PMC11228874