Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data
Subhabrata Majumdar, Saonli Basu, Matt McGue, Snigdhansu Chatterjee

TL;DR
This paper introduces a fast, resampling-based variable selection method for identifying relevant SNPs in family-based genome-wide association studies, improving detection power over traditional single-marker tests.
Contribution
It presents a novel, computationally efficient model selection approach using the e-values framework that accounts for familial dependencies and detects multiple SNPs simultaneously.
Findings
Detects trait-associated SNPs more effectively than traditional single-marker tests
Successfully identified SNPs linked to alcohol consumption in real data
Scalable bootstrap procedure enhances computational efficiency
Abstract
We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially the ones with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains a single model and uses a fast and scalable bootstrap procedure. We illustrate through…
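The abstract is cut off before it describes the procedure in detail, so the following is only a toy sketch of the general idea it names: fit one full model, resample its estimate cheaply, and judge each predictor from those resamples without refitting any submodel. Everything specific in it is an assumption made for illustration and should not be read as the authors' algorithm: ordinary least squares stands in for the multi-marker mixed effect model (so familial dependence is ignored), a Gaussian multiplier bootstrap stands in for the paper's bootstrap, a Mahalanobis-type depth is used as the scoring function, and `select_predictors`, `mahalanobis_depth`, and the tuning constant `tau` are hypothetical names.

```python
import numpy as np


def mahalanobis_depth(points, center, cov_inv):
    """Depth in (0, 1]; larger means closer to the center of the reference cloud."""
    diff = points - center
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def select_predictors(X, y, n_boot=500, tau=None, seed=0):
    """Toy selection rule: keep predictor j if zeroing it lowers the mean depth.

    Assumptions (not from the source): OLS model, Gaussian multiplier
    bootstrap, Mahalanobis-type depth, and tau = log(n) as a default.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau = np.log(n) if tau is None else tau  # hypothetical spread inflation

    # Fit the full model exactly once.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat

    def draw():
        # Multiplier bootstrap: perturb the estimate through the residuals,
        # so no model is refitted for any bootstrap replicate.
        W = tau * rng.standard_normal((n_boot, n))
        return beta_hat + (W * resid) @ X @ XtX_inv

    reference, evaluation = draw(), draw()
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    full_score = mahalanobis_depth(evaluation, beta_hat, cov_inv).mean()

    selected = []
    for j in range(p):
        dropped = evaluation.copy()
        dropped[:, j] = 0.0  # same draws with coefficient j forced to zero
        score = mahalanobis_depth(dropped, beta_hat, cov_inv).mean()
        if score < full_score:  # removing j pushes the draws away from center
            selected.append(j)
    return selected


# Synthetic example: predictors 0 and 3 carry signal, the rest are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 6))
beta_true = np.array([1.0, 0.0, 0.0, -0.8, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(400)
selected = select_predictors(X, y)
print(selected)
```

In this sketch the cost is dominated by one least-squares fit plus a few matrix products, which is the kind of saving over refitting one model per candidate subset that the abstract alludes to. A family-based analysis would need the mixed-model fit and a resampling scheme that respects within-family correlation, neither of which is shown here.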
