Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies
Dan L. Nicolae, Xiao-Li Meng, Augustine Kong

TL;DR
This paper develops likelihood-based measures to quantify the impact of missing data on hypothesis testing in genetic and statistical studies, facilitating better experimental design and data interpretation.
Contribution
It introduces two new measures based on Kullback-Leibler information for assessing missing data impact, applicable to large and small samples, with practical examples.
Findings
Likelihood-based measures are computationally inexpensive.
Measures are demonstrated on genetic mapping data.
Bayesian approaches offer robustness for small samples.
Abstract
Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the relative (to complete data) amount of available information. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In these genetic studies, the relative information measures are needed for experimental design, technology comparison, interpretation of the data, and understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we…
