Leveraging variational autoencoders for multiple data imputation
Breeshey Roskams-Hieter, Jude Wells, Sara Wade

TL;DR
This paper explores using variational autoencoders for multiple data imputation, highlighting their limitations and proposing $\beta$-VAEs with cross-validation to improve uncertainty calibration and reduce false discoveries.
Contribution
It introduces the use of $\beta$-VAEs for data imputation, addressing the calibration issues of standard VAEs and demonstrating improved downstream task reliability.
Findings
Standard VAEs show poor empirical coverage of missing data: their imputations are overconfident, particularly for more extreme missing values.
$\beta$-VAEs improve uncertainty calibration.
Proper $\beta$ selection via cross-validation enhances imputation quality.
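To make "empirical coverage" concrete, here is a minimal sketch of how it could be measured for multiple imputations. It assumes a set of imputation draws is available for entries whose true values were deliberately held out. The function name `empirical_coverage`, the array shapes, and the synthetic Gaussian data are illustrative assumptions, not the paper's code or data.

```python
import numpy as np

def empirical_coverage(draws, truth, level=0.95):
    """Fraction of held-out values inside the central interval of their draws.

    draws: array of shape (n_imputations, n_missing), one row per imputation
    truth: array of shape (n_missing,), the masked ground-truth values
    level: nominal coverage of the interval (0.95 gives a 95% interval)
    """
    draws = np.asarray(draws, dtype=float)
    truth = np.asarray(truth, dtype=float)
    alpha = 1.0 - level
    # Per-entry interval bounds taken across the imputation draws.
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    inside = (truth >= lower) & (truth <= upper)
    return float(inside.mean())

# Synthetic illustration (assumed data, not results from the paper).
rng = np.random.default_rng(0)
truth = rng.normal(size=2000)

# Draws whose spread matches the truth versus draws that are too narrow.
calibrated_draws = rng.normal(size=(200, 2000))
overconfident_draws = 0.3 * rng.normal(size=(200, 2000))

cov_calibrated = empirical_coverage(calibrated_draws, truth)
cov_overconfident = empirical_coverage(overconfident_draws, truth)
print(f"calibrated: {cov_calibrated:.3f}, overconfident: {cov_overconfident:.3f}")
```

In this toy setting, well-calibrated draws should give coverage close to the nominal level, while draws that are too narrow fall well below it. That shortfall is the kind of overconfidence the findings attribute to standard VAEs.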
Abstract
Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $\beta$ is critical for uncertainty calibration and we demonstrate how this can be achieved using cross-validation. In…
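For orientation, the $\beta$-VAE objective is commonly written as below. This is the widely used textbook form, given here as an assumption; the excerpt does not show the paper's exact parameterization or how it restricts the likelihood to observed entries. The symbols $\theta$ (decoder parameters), $\phi$ (encoder parameters), and $z$ (latent variable) are notation introduced for this sketch.

```latex
\mathcal{L}_{\beta}(\theta, \phi; x)
  = \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[ \log p_{\theta}(x \mid z) \right]
  - \beta \, \mathrm{KL}\!\left( q_{\phi}(z \mid x) \,\|\, p(z) \right)
```

Setting $\beta = 1$ recovers the standard VAE evidence lower bound. Other values change the balance between the reconstruction term and the KL term, which is why the choice of $\beta$ affects how wide or narrow the resulting imputation uncertainty is.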
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · AI in cancer detection
