Simultaneous SNP identification in association studies with missing data
Zhen Li, Vikneswaran Gopal, Xiaobo Li, John M. Davis, George Casella

TL;DR
This paper introduces BAMD, a Bayesian hierarchical model and Gibbs sampler for association testing with missing SNP data, providing unbiased estimates and enabling detection of SNP interactions.
Contribution
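The paper presents BAMD, a Bayesian method that multiply imputes missing SNP values within a Gibbs sampler, using all available information in the data set. It speeds up computation by updating one SNP per iteration, a scheme the authors prove preserves the ergodic property of the Markov chain.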
Findings
Unbiased SNP effect estimates with missing data
Validated known SNP-phenotype associations
Discovered an additional SNP linked to the trait
Abstract
Association testing aims to discover the underlying relationship between genotypes (usually Single Nucleotide Polymorphisms, or SNPs) and phenotypes (attributes, or traits). The typically large data sets used in association testing often contain missing values. Standard statistical methods either impute the missing values using relatively simple assumptions, or delete them, or both, which can generate biased results. Here we describe the Bayesian hierarchical model BAMD (Bayesian Association with Missing Data). BAMD is a Gibbs sampler, in which missing values are multiply imputed based upon all of the available information in the data set. We estimate the parameters and prove that updating one SNP at each iteration preserves the ergodic property of the Markov chain, and at the same time improves computational speed. We also implement a model selection option in BAMD, which enables…
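To make the sampling idea in the abstract concrete, here is a minimal sketch of a Gibbs sampler that imputes missing genotypes while estimating SNP effects, refreshing the missing entries of only one SNP per iteration. It is a toy illustration, not the BAMD implementation. The plain linear model, the -1/0/1 genotype coding, the uniform prior over genotypes, the normal prior on effects, and every name below (`gibbs_sketch`, `prior_var`, and so on) are assumptions that do not come from the paper.

```python
import numpy as np

# Assumed additive genotype coding; the paper's coding may differ.
GENOTYPES = np.array([-1.0, 0.0, 1.0])


def gibbs_sketch(y, X, n_iter=500, prior_var=10.0, seed=0):
    """Toy Gibbs sampler for y = X @ beta + noise; NaN in X marks a missing genotype.

    Returns an (n_iter, p) array of draws of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    missing = np.isnan(X)
    X = X.copy()
    # Start every missing entry at a random genotype.
    X[missing] = rng.choice(GENOTYPES, size=int(missing.sum()))
    sigma2 = 1.0
    draws = np.empty((n_iter, p))

    for t in range(n_iter):
        # 1. beta | X, sigma2, y: conjugate normal draw under a N(0, prior_var * I) prior.
        precision = X.T @ X / sigma2 + np.eye(p) / prior_var
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0  # guard against round-off asymmetry
        mean = cov @ (X.T @ y) / sigma2
        beta = rng.multivariate_normal(mean, cov)

        # 2. sigma2 | beta, X, y: inverse-gamma draw (assumed prior proportional to 1/sigma2).
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))

        # 3. Impute missing genotypes for ONE randomly chosen SNP this iteration.
        j = rng.integers(p)
        for i in np.flatnonzero(missing[:, j]):
            # Residual for individual i with SNP j's contribution removed.
            partial = resid[i] + X[i, j] * beta[j]
            log_w = -0.5 * (partial - GENOTYPES * beta[j]) ** 2 / sigma2
            w = np.exp(log_w - log_w.max())
            X[i, j] = rng.choice(GENOTYPES, p=w / w.sum())

        draws[t] = beta
    return draws
```

Averaging the rows of the returned array after a burn-in period gives posterior mean effect estimates. Updating a single SNP column in step 3 keeps the per-iteration cost low, which is the trade-off the abstract describes; this sketch does not attempt to reproduce the paper's ergodicity proof or its model selection option.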
