Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies
Yan Xu, Li Xing, Jessica Su, Xuekui Zhang, Weiliang Qiu

TL;DR
This paper introduces a model-based clustering method for GWAS data that improves detection of disease-associated SNPs by leveraging information across SNPs and controlling false discovery rates more effectively.
Contribution
The authors propose a model-based clustering approach that recasts the high-dimension-small-sample-size GWAS problem as a low-dimension-large-sample-size one by grouping SNPs into three clusters, outperforming the traditional SNP-wise approach in simulations and real data analysis.
Findings
Better control of false discovery rate (FDR) in simulations
Detection of known and novel SNPs in real GWAS data
Outperforms traditional SNP-wise approach in sensitivity
Abstract
Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach, combined with multiple testing adjustment, is over-conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension-small-sample-size problem into a low-dimension-large-sample-size problem and borrows information across SNPs by grouping SNPs into three clusters. We pre-specify the patterns of the clusters by the minor allele frequencies of SNPs between cases and controls, and enforce the patterns with prior distributions. In the simulation studies, our proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed…
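For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a SNP-wise analysis with multiple testing adjustment. It is not the authors' clustering model. The abstract does not say which per-SNP test or adjustment the paper uses, so the two-proportion z-test on minor allele counts and the Benjamini-Hochberg adjustment are assumptions made here for illustration, and the function names and toy counts are hypothetical.

```python
import math


def maf_z_test(case_minor, case_total, ctrl_minor, ctrl_total):
    """Two-sided two-proportion z-test comparing minor allele frequencies.

    Arguments are allele counts (minor alleles, total alleles) in cases
    and in controls. Returns the p-value under a normal approximation.
    """
    p_case = case_minor / case_total
    p_ctrl = ctrl_minor / ctrl_total
    pooled = (case_minor + ctrl_minor) / (case_total + ctrl_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / case_total + 1 / ctrl_total))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_case - p_ctrl) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: (minor in cases, alleles in cases,
#                         minor in controls, alleles in controls)
toy_snps = [
    (120, 400, 80, 400),   # MAF higher in cases
    (100, 400, 100, 400),  # MAF equal in both groups
    (60, 400, 95, 400),    # MAF lower in cases
]

pvals = [maf_z_test(*snp) for snp in toy_snps]
adj = benjamini_hochberg(pvals)
for snp, p, q in zip(toy_snps, pvals, adj):
    print(snp, f"p={p:.4g}", f"BH-adjusted={q:.4g}")
```

Each SNP is tested on its own here, so the adjustment has to pay for every one of the tests. That is the cost the abstract describes as over-conservative, and the one the clustering method aims to reduce by borrowing information across SNPs.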
