Identifiable Deep Latent Variable Models for MNAR Data
Huiming Xie, Fei Xue, Xiao Wang

TL;DR
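This paper introduces a deep latent variable model framework for missing-not-at-random (MNAR) data that guarantees identifiability of the data distribution and recovers it accurately. It addresses the limitations of existing methods, which either assume missing-at-random (MAR) data or overlook identifiability under MNAR.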
Contribution
It establishes the theoretical identifiability of MNAR data distributions under a conditional no-self-censoring assumption given the latent variables, and it develops an efficient importance-weighted autoencoder algorithm for estimation.
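The estimation step relies on an importance-weighted autoencoder, whose objective is the importance-weighted lower bound log((1/K) Σ_k p(x, z_k) / q(z_k | x)) with z_k drawn from a proposal q. The sketch below is a minimal illustration on a hypothetical 1-D Gaussian toy model, not the paper's model: the names `log_normal` and `iwae_bound` and the choice of prior, likelihood and proposal are assumptions made for illustration. In the MNAR setting the weight would presumably also include the missingness model and the missing entries of x; this toy omits both.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def iwae_bound(x, num_samples, rng):
    """Importance-weighted lower bound on log p(x) for a hypothetical toy model.

    Toy model (illustration only, not from the paper):
      prior      z ~ N(0, 1)
      likelihood x | z ~ N(z, 1)
      proposal   q(z | x) = N(x / 2, 1), deliberately wider than the
                 true posterior N(x / 2, 1 / 2)
    """
    z = rng.normal(loc=x / 2.0, scale=1.0, size=num_samples)  # z_k ~ q(z | x)
    log_w = (log_normal(z, 0.0, 1.0)           # log p(z_k)
             + log_normal(x, z, 1.0)           # log p(x | z_k)
             - log_normal(z, x / 2.0, 1.0))    # - log q(z_k | x)
    # log((1/K) * sum_k w_k), computed stably with the log-sum-exp trick
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))


rng = np.random.default_rng(0)
x_obs = 1.5
exact = log_normal(x_obs, 0.0, 2.0)  # marginal p(x) = N(0, 2) in this toy model
elbo = np.mean([iwae_bound(x_obs, 1, rng) for _ in range(5000)])  # K = 1
iwae = iwae_bound(x_obs, 10_000, rng)                             # K = 10000
print(f"exact={exact:.4f}  ELBO (K=1)={elbo:.4f}  IWAE (K=10000)={iwae:.4f}")
```

With K = 1 the bound reduces to the standard ELBO, which here sits on average KL(q || posterior) ≈ 0.15 nats below log p(x). Increasing K tightens the bound toward the exact value, which is the reason for using importance weighting over a single-sample objective.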
Findings
Theoretically guarantees identifiability of the data distribution under conditions that include conditional no self-censoring given the latent variables.
Accurately recovers ground-truth distributions in simulations.
Outperforms classical and state-of-the-art imputation methods in experiments.
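The identifiability finding rests on the conditional no-self-censoring assumption. As a hedged sketch in our own notation, which may differ from the paper's, let x = (x_1, …, x_p) be the data, r = (r_1, …, r_p) the missingness indicators, z the latent variables and x_{-j} all coordinates except x_j. A deep latent variable model with this assumption could then be written as follows. The product over j, i.e. conditional independence of the indicators given (x, z), is an extra assumption of this sketch.

```latex
p(x, r) = \int p(z)\, p_\theta(x \mid z) \prod_{j=1}^{p} p_\phi\!\left(r_j \mid x_{-j}, z\right) dz,
\qquad
p(x_{\mathrm{obs}}, r) = \int p(x_{\mathrm{obs}}, x_{\mathrm{mis}}, r)\, dx_{\mathrm{mis}}.
```

Each indicator r_j may depend on the other variables and on z, but not on x_j itself once those are given. The second expression is the observed-data likelihood that an estimator such as an importance-weighted autoencoder would target.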
Abstract
Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where, given the observed values, the missingness is independent of the missing values themselves. This assumption is frequently violated in real-world scenarios, prompting recent advances in imputation methods that use deep learning to address this challenge. However, these methods neglect the crucial issue of nonparametric identifiability in missing-not-at-random (MNAR) data, which can lead to biased and unreliable results. This paper seeks to bridge this gap by proposing a novel framework based on deep latent variable models for MNAR data. Building on the assumption of conditional no self-censoring given latent variables, we establish the identifiability of the data distribution. This crucial theoretical…
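To see why MNAR missingness biases naive analyses, consider the minimal simulation below. It uses a hypothetical data-generating process that is not taken from the paper. A latent z drives both a value x and whether x is observed. Given z, observation does not depend on x itself, which matches the conditional no-self-censoring idea. Because z is unobserved and correlated with x, however, the mechanism is MNAR. An MCAR mechanism is included for comparison. The coefficients and the sigmoid link are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical generative process (illustration only):
# a latent z drives both the value x and whether x is observed.
z = rng.normal(0.0, 1.0, size=n)
x = z + rng.normal(0.0, 0.5, size=n)  # E[x] = 0

# MCAR: each x is observed with probability 0.5, independent of everything.
obs_mcar = rng.random(n) < 0.5

# MNAR: P(observed | z) = sigmoid(2 z). Given z, observing x does not depend
# on x itself, but it depends on the unobserved z, which is correlated with x.
p_obs = 1.0 / (1.0 + np.exp(-2.0 * z))
obs_mnar = rng.random(n) < p_obs

mean_true = x.mean()
mean_mcar = x[obs_mcar].mean()  # complete-case mean under MCAR
mean_mnar = x[obs_mnar].mean()  # complete-case mean under MNAR

print(f"true mean          : {mean_true:+.3f}")
print(f"complete-case MCAR : {mean_mcar:+.3f}")
print(f"complete-case MNAR : {mean_mnar:+.3f}")
```

The complete-case mean stays close to zero under MCAR but lands well above zero under the MNAR mechanism. Large values are over-represented among the observed entries because they tend to come with large z.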
