Leveraging variational autoencoders for multiple data imputation

Breeshey Roskams-Hieter; Jude Wells; Sara Wade

arXiv:2209.15321 · stat.ML · October 3, 2022 · 1 cites

TL;DR

This paper explores using variational autoencoders for multiple data imputation, highlighting their limitations and proposing $\beta$-VAEs with cross-validation to improve uncertainty calibration and reduce false discoveries.

Contribution

It introduces the use of $\beta$-VAEs for data imputation, addressing the calibration issues of standard VAEs and demonstrating improved downstream task reliability.

Findings

01

VAEs show poor empirical coverage for missing data.

02

$\beta$-VAEs improve uncertainty calibration.

03

Proper $\beta$ selection via cross-validation enhances imputation quality.
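
To make "empirical coverage" concrete, here is a minimal sketch in Python. It assumes that some entries are held out, each is imputed M times, and coverage is the fraction of held-out true values that fall inside the central interval of their M draws. The function names `empirical_coverage` and `select_beta`, and the rule of choosing the candidate whose coverage is closest to the nominal level, are illustrative assumptions and not the paper's stated procedure.

```python
import numpy as np


def empirical_coverage(imputations, truth, level=0.95):
    """Fraction of held-out values inside the central `level` interval.

    imputations: array of shape (M, n), M imputed draws for n masked entries.
    truth:       array of shape (n,), the held-out true values.
    """
    imputations = np.asarray(imputations, dtype=float)
    truth = np.asarray(truth, dtype=float)
    alpha = 1.0 - level
    # Per-entry interval from the empirical quantiles of the M draws.
    lower = np.quantile(imputations, alpha / 2, axis=0)
    upper = np.quantile(imputations, 1 - alpha / 2, axis=0)
    return float(np.mean((truth >= lower) & (truth <= upper)))


def select_beta(coverage_by_beta, level=0.95):
    """Pick the candidate whose held-out coverage is closest to nominal.

    Assumption: this closeness criterion is illustrative only; the source
    does not spell out the exact cross-validation criterion.
    """
    return min(coverage_by_beta, key=lambda b: abs(coverage_by_beta[b] - level))
```

Coverage well below the nominal level indicates overconfident imputations. In a cross-validation loop, one would compute `empirical_coverage` on each held-out fold for every candidate value and pass the averaged results to `select_beta`.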

Abstract

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $β$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $β$ is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. In…
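
For orientation, the $\beta$-VAE objective in its commonly used form reweights the KL term of the evidence lower bound. The symbols $\theta$, $\phi$, $z$, and $x$ below (decoder parameters, encoder parameters, latent variable, and observation) are assumed notation, and the abstract does not confirm that the paper uses exactly this parameterization:

$$
\mathcal{L}_{\beta}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \beta \, \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

Setting $\beta = 1$ recovers the standard VAE objective. Other values change how strongly the approximate posterior is pulled toward the prior relative to the reconstruction term, which is why the choice of $\beta$ affects the spread of the resulting imputations.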

Code & Models

Repositories

roskamsh/betavaemimputation
Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · AI in cancer detection