# How to manage missing covariates in randomized controlled trials: a comparison of strategies

**Authors:** Shiyu Zhang, Yajuan Si, John J. Dziak

PMC · DOI: 10.1186/s12874-025-02708-w · 2025-11-25

## TL;DR

This paper compares strategies for handling missing covariate data in RCTs, showing which strategies produce unbiased estimates of the average and subgroup treatment effects, and under which missing data mechanisms.

## Contribution

The paper clarifies the assumptions under which different strategies for missing covariates perform well in RCTs and reconciles seemingly conflicting recommendations in the literature.

## Key findings

- “MI by arm” can produce unbiased estimates of both the average and subgroup treatment effects under the missing at random (MAR) assumption.
- Baseline-only MI, grand mean imputation, and the missing indicator method yield unbiased average treatment effects, but biased subgroup treatment effects, regardless of the missing data mechanism.
- Simple strategies can therefore suffice for the primary analysis (average treatment effect) but can bias secondary analyses (subgroup treatment effects).

## Abstract

When analyzing data from randomized controlled trials (RCTs), covariate adjustment is often employed to increase the precision of estimated treatment effects. Missing data in covariates, if not handled properly, can result in biased and inefficient estimates. However, the existing literature on handling missing covariate data is limited, and recommendations vary regarding a valid and efficient approach.

To help reconcile the seemingly inconsistent recommendations, we address two questions through methodological descriptions and simulated demonstrations. First, how should a multiple imputation (MI) model be specified for RCTs to best preserve the benefit of the randomization design? We consider three different approaches: MI with only baseline variables, “MI overall”, and “MI by arm”. Second, when and why will simple general strategies, such as grand mean imputation and the missing indicator method, perform as well as or better than MI in estimating treatment effects, and when and why do they fail?
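To make the “MI by arm” idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that “MI by arm” means fitting the imputation model separately within each treatment arm, and it uses a simplified stochastic regression imputation of one covariate on the outcome. The function names `impute_by_arm` and `_fit_line` are hypothetical, and a full MI procedure would also draw the regression parameters from their posterior rather than fixing them at the least-squares estimates.

```python
import math
import random


def _fit_line(ys, xs):
    """Least-squares fit of xs on ys; returns (intercept, slope, residual_sd).

    Needs at least three observed pairs and some variation in ys.
    """
    n = len(xs)
    y_bar = sum(ys) / n
    x_bar = sum(xs) / n
    s_yy = sum((v - y_bar) ** 2 for v in ys)
    s_yx = sum((v - y_bar) * (w - x_bar) for v, w in zip(ys, xs))
    slope = s_yx / s_yy
    intercept = x_bar - slope * y_bar
    rss = sum((w - intercept - slope * v) ** 2 for v, w in zip(ys, xs))
    return intercept, slope, math.sqrt(rss / (n - 2))


def impute_by_arm(x, y, arm, n_imputations=5, seed=0):
    """Return n_imputations completed copies of covariate x.

    Missing entries of x are NaN. The imputation model (x regressed on the
    outcome y) is fitted separately in each arm, which is the assumed meaning
    of "MI by arm". Simplification: parameters are not redrawn per imputation.
    """
    rng = random.Random(seed)
    completed = []
    for _ in range(n_imputations):
        x_new = list(x)
        for a in sorted(set(arm)):
            idx = [i for i in range(len(x)) if arm[i] == a]
            obs = [i for i in idx if not math.isnan(x[i])]
            mis = [i for i in idx if math.isnan(x[i])]
            if not mis:
                continue
            b0, b1, sd = _fit_line([y[i] for i in obs], [x[i] for i in obs])
            for i in mis:
                # Prediction plus random noise, so imputations differ across copies.
                x_new[i] = b0 + b1 * y[i] + rng.gauss(0.0, sd)
        completed.append(x_new)
    return completed
```

Each completed data set would then be analyzed with the covariate-adjusted model, and the resulting estimates combined across imputations.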

“MI by arm” has the potential to produce unbiased estimates for both the average and subgroup treatment effect (primary and secondary analyses) under the missing at random assumption. Strategies that capitalize on the randomization design, including MI with baseline variables, grand mean imputation, and the missing indicator method, may generate unbiased estimates for the average treatment effect (primary analysis) regardless of the missing data mechanism.
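The two simple strategies named above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not code from the paper: the function names are hypothetical, missing values are encoded as NaN, and the fill value used with the missing indicator is an arbitrary constant (0.0 here). Note that neither function looks at the treatment arm or the outcome, so the covariate is handled identically in both arms.

```python
import math


def grand_mean_impute(x):
    """Replace each missing value with the mean of the observed values,
    pooled over all participants regardless of treatment arm."""
    observed = [v for v in x if not math.isnan(v)]
    grand_mean = sum(observed) / len(observed)
    return [grand_mean if math.isnan(v) else v for v in x]


def missing_indicator(x, fill=0.0):
    """Return (filled, indicator) for the missing indicator method.

    filled:    x with missing values replaced by a constant (assumed 0.0).
    indicator: 1 where x was originally missing, else 0.
    Both columns would enter the outcome model as covariates.
    """
    filled = [fill if math.isnan(v) else v for v in x]
    indicator = [1 if math.isnan(v) else 0 for v in x]
    return filled, indicator
```

For example, `grand_mean_impute([1.0, float("nan"), 3.0])` returns `[1.0, 2.0, 3.0]`, and `missing_indicator([1.0, float("nan"), 3.0])` returns `([1.0, 0.0, 3.0], [0, 1, 0])`.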

This article clarifies the assumptions and mechanisms by which different missing data strategies accommodate missingness in covariates and reconciles recommendations that sometimes appear contradictory in the literature. Under the missing at random (MAR) assumption, “MI by arm” produces unbiased estimates for both the average treatment effect and subgroup treatment effects. Leveraging the randomization design, “baseline-only MI”, grand mean imputation, and the missing indicator method produce unbiased estimates for the average treatment effect, but biased subgroup treatment effects, regardless of the missing data mechanism.

---
Source: https://tomesphere.com/paper/PMC12649034