Non-standard conditionally specified models for non-ignorable missing data
Alexander M Franks, Edoardo M Airoldi, Donald B Rubin

TL;DR
This paper introduces a flexible modeling approach for non-ignorable missing data using non-standard conditional distributions, enabling more realistic data analysis when missingness depends on unobserved factors.
Contribution
The paper applies Tukey's conditional representation to exponential family models and describes a tractable inference method for non-ignorable missing data.
Findings
The approach models non-ignorable missing data in biological datasets effectively.
The conditional representation allows substantive knowledge to be incorporated into the model.
The authors demonstrate improved inference on high-throughput biological data.
Abstract
Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions about the distributions of observed and missing data, are typically untestable. We explore an approach in which the joint distribution of observed data and missing data is specified through non-standard conditional distributions. In this formulation, which traces back to a factorization of the joint distribution, apparently proposed by J.W. Tukey, the modeling assumptions about the conditional factors are either testable or are designed to allow the incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both missing and observed. We apply Tukey's conditional representation to…
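The factorization mentioned in the abstract follows from Bayes' rule: the density of the missing values, f(y | R=0), is proportional to f(y | R=1) × P(R=0 | y) / P(R=1 | y), where R=1 marks an observed value. The sketch below is a minimal numerical illustration of that identity, not the paper's implementation. It assumes, purely for illustration, a normal observed-data density and a logistic selection model P(R=1 | y) = 1 / (1 + exp(-(a + b·y))). The function names and the parameter values mu, sigma, a, and b are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def missing_mean_by_tilting(mu, sigma, a, b, lo=-30.0, hi=30.0, n=60001):
    """Mean of f(y | R=0), computed on a grid from the identity
    f(y | R=0) proportional to f(y | R=1) * P(R=0 | y) / P(R=1 | y).

    Assumptions (illustrative only): f(y | R=1) is N(mu, sigma^2) and
    P(R=1 | y) is logistic in y with intercept a and slope b.
    """
    step = (hi - lo) / (n - 1)
    total = 0.0
    weighted = 0.0
    for i in range(n):
        y = lo + i * step
        # Under logistic selection, the odds of being missing are exp(-(a + b*y)).
        odds_missing = math.exp(-(a + b * y))
        w = normal_pdf(y, mu, sigma) * odds_missing
        total += w
        weighted += y * w
    # The grid step cancels in the ratio, so no explicit normalization is needed.
    return weighted / total


# Hypothetical parameter values chosen only for the demonstration.
mu, sigma, a, b = 1.0, 2.0, 0.5, 0.3
numeric = missing_mean_by_tilting(mu, sigma, a, b)

# Completing the square shows the tilted density is N(mu - b*sigma^2, sigma^2).
closed_form = mu - b * sigma ** 2

print(numeric, closed_form)
```

Under these assumptions, the missing-data density is the observed-data density shifted by b·sigma², so a positive slope b (larger values are more likely to be observed) implies that the missing values are, on average, smaller than the observed ones. The observed-data factor can be checked against the data, while the selection factor is where substantive knowledge enters.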
