Multiple imputation for logistic regression models: incorporating an interaction
Matthew J. Smith, Matteo Quartagno, Edmund Njeru Njagi

TL;DR
This study compares multiple imputation methods for logistic regression models with interaction terms involving partially observed binary variables, highlighting the strengths and limitations of each approach through simulations and an application to real data.
Contribution
It evaluates and compares four imputation methods specifically for logistic regression models with interactions, providing guidance on their performance under various conditions.
Findings
SMCFCS (substantive model compatible fully conditional specification) and SIA (stratify-impute-append) showed the least bias and good coverage.
SMCFCS performed better with continuous underlying variables.
SIA performed poorly with low prevalence of the fully observed variable.
Abstract
Background: Multiple imputation is often used to reduce bias and gain efficiency when there is missing data. The most appropriate imputation method depends on the model the analyst is interested in fitting. Several imputation approaches have been proposed for when this model is a logistic regression model with an interaction term that contains a binary partially observed variable; however, it is not clear which performs best under certain parameter settings. Methods: Using 1000 simulations, each with 10,000 observations, under six data-generating mechanisms (DGM), we investigate the performance of four methods: (i) 'passive imputation', (ii) 'just another variable' (JAV), (iii) 'stratify-impute-append' (SIA), and (iv) 'substantive model compatible fully conditional specification' (SMCFCS). The application of each method is shown in an empirical example using England-based cancer…
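To make the setting concrete, the following is a minimal sketch of one simulated dataset and a stratified imputation step, not the authors' code. It assumes a binary outcome y, a fully observed binary z, and a partially observed binary x that interacts with z. The coefficients, prevalences, missingness rate, and the missing-completely-at-random mechanism are all illustrative assumptions. Only the sample size of 10,000 comes from the abstract. The imputation step draws each missing x from the observed proportion within strata of z and y, then computes the interaction from the completed x (the 'passive' step). It is a single stochastic imputation that ignores parameter uncertainty, so it only approximates the idea behind stratifying before imputing.

```python
import numpy as np

def simulate(n=10_000, seed=0, beta=(-1.0, 0.5, 0.5, 0.7),
             p_x=0.5, p_z=0.3, p_miss=0.3):
    """Simulate one dataset; all default parameter values are assumptions."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, p_z, n)            # fully observed binary covariate
    x = rng.binomial(1, p_x, n)            # binary covariate, later made partially missing
    # Logistic model with main effects and an x*z interaction
    lin = beta[0] + beta[1] * x + beta[2] * z + beta[3] * x * z
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
    # Assumed mechanism: missing completely at random
    miss = rng.random(n) < p_miss
    x_obs = np.where(miss, np.nan, x.astype(float))
    return y, z, x_obs

def impute_stratified(y, z, x_obs, rng):
    """Draw each missing x from the observed proportion within its (z, y) stratum."""
    x_imp = x_obs.copy()
    for zv in (0, 1):
        for yv in (0, 1):
            cell = (z == zv) & (y == yv)
            obs = cell & ~np.isnan(x_obs)
            mis = cell & np.isnan(x_obs)
            p = x_obs[obs].mean()          # empirical P(x = 1 | z, y) among observed rows
            x_imp[mis] = rng.binomial(1, p, mis.sum())
    return x_imp

y, z, x_obs = simulate()
x_imp = impute_stratified(y, z, x_obs, np.random.default_rng(1))
xz = x_imp * z                             # interaction derived after imputation
```

Because the interaction is derived from the completed x, it always equals x times z. A JAV-style approach would instead impute the product as its own column, which need not satisfy that identity.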
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · demographic modeling and climate adaptation
