Evaluation of Missing Data Analytical Techniques in Longitudinal Research: Traditional and Machine Learning Approaches
Dandan Tang, Xin Tong

TL;DR
This study compares traditional and machine learning techniques for handling missing data in longitudinal research, finding that full information maximum likelihood (FIML) is the most effective for data that are missing not at random (MNAR) and identifying conditions under which machine learning methods excel.
Contribution
It provides a comprehensive Monte Carlo simulation analysis of six missing data techniques in growth curve modeling, focusing on MNAR and MAR scenarios with nonnormal data.
Findings
FIML is most effective for MNAR data.
TSRE performs well with MAR data.
missForest is advantageous with large samples and skewed distributions.
Abstract
Missing Not at Random (MNAR) and nonnormal data are challenging to handle. Traditional missing data analytical techniques such as full information maximum likelihood estimation (FIML) may fail with nonnormal data because they are built on normal distribution assumptions. Two-Stage Robust Estimation (TSRE) can handle nonnormal data, but both FIML and TSRE remain underexplored in longitudinal studies under MNAR conditions with nonnormal distributions. Unlike traditional statistical approaches, machine learning approaches do not require distributional assumptions about the data. More importantly, they have shown promise for MNAR data; however, their application in longitudinal studies, under both Missing at Random (MAR) and MNAR scenarios, is also underexplored. This study uses Monte Carlo simulations to assess and compare the effectiveness of six analytical techniques for missing data…
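To make the MAR/MNAR distinction in the abstract concrete, here is a minimal Python sketch of how the two mechanisms could be imposed on simulated linear growth data. It is not the authors' simulation design: the function names, population values (intercept mean 5, slope mean 1), number of waves, sample size, and the logistic missingness rule are all assumptions chosen for illustration. Under MAR, missingness at a later wave depends only on the always-observed first wave; under MNAR, it depends on the value that goes missing.

```python
import math
import random


def simulate_growth(n, waves=4, seed=0):
    """Linear growth curves: y_it = intercept_i + slope_i * t + noise (hypothetical values)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        intercept = rng.gauss(5.0, 1.0)  # assumed population intercept
        slope = rng.gauss(1.0, 0.5)      # assumed population slope
        data.append([intercept + slope * t + rng.gauss(0.0, 1.0) for t in range(waves)])
    return data


def impose_missing(data, mechanism, seed=1):
    """Return a copy with None marking missing values; wave 0 is always observed."""
    rng = random.Random(seed)
    out = [row[:] for row in data]
    for t in range(1, len(data[0])):
        # MAR: driven by the observed first wave. MNAR: driven by the value itself.
        src = 0 if mechanism == "MAR" else t
        col = [row[src] for row in data]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        for i, v in enumerate(col):
            # Assumed logistic rule: larger driver values are more likely to be missing.
            p = 1.0 / (1.0 + math.exp(-((v - mean) / sd - 1.0)))
            if rng.random() < p:
                out[i][t] = None
    return out


def observed_mean(data, t):
    """Mean of wave t over the cases that are observed at that wave."""
    vals = [row[t] for row in data if row[t] is not None]
    return sum(vals) / len(vals)


full = simulate_growth(2000)
mar = impose_missing(full, "MAR")
mnar = impose_missing(full, "MNAR")
```

Comparing `observed_mean(mnar, 3)` with `observed_mean(full, 3)` shows how dropping high values pulls the observed last-wave mean downward. A Monte Carlo study would repeat this generation many times and fit a growth curve model to each replicate with each missing data technique.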
Taxonomy
Topics: Artificial Intelligence in Healthcare
