Tradeoffs of Linear Mixed Models in Genome-wide Association Studies
Haohan Wang, Bryon Aragam, Eric Xing

TL;DR
This paper analyzes the statistical properties of linear mixed models in GWAS, focusing on their sensitivity to candidate SNP inclusion and their effectiveness in correcting confounders like population stratification and environmental factors.
Contribution
It provides theoretical insights into the impact of including candidate SNPs in kinship matrices and compares how different methods handle confounders in GWAS.
Findings
Including candidate SNPs in kinship matrices introduces quantifiable errors.
Mixed models effectively correct for population stratification.
Different methods trade off confounder correction and computational efficiency.
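As a rough illustration of the first finding, the sketch below builds a kinship matrix with and without one candidate SNP and measures the gap between the two. It is a toy under stated assumptions, not the paper's analysis: the normalization K = X X^T / p over standardized genotypes is one common convention assumed here, the genotypes are simulated, and the function names (standardize, kinship, kinship_without) are hypothetical.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column to mean 0 and variance 1."""
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    return (G - mu) / sd

def kinship(X):
    """Assumed convention: K = X X^T / p for an n-by-p standardized matrix."""
    return X @ X.T / X.shape[1]

def kinship_without(K, X, j):
    """Remove SNP j's contribution from K without recomputing X X^T."""
    p = X.shape[1]
    xj = X[:, j]
    return (p * K - np.outer(xj, xj)) / (p - 1)

# Simulated genotypes (0/1/2 minor-allele counts); purely illustrative data.
rng = np.random.default_rng(0)
n, p = 50, 200
freqs = rng.uniform(0.1, 0.9, size=p)
G = rng.binomial(2, freqs, size=(n, p))
X = standardize(G)

j = 7  # index of the candidate SNP (arbitrary choice)
K_all = kinship(X)                            # candidate SNP included
K_loo = kinship_without(K_all, X, j)          # candidate SNP excluded
K_direct = kinship(np.delete(X, j, axis=1))   # same thing, recomputed from scratch

# Size of the discrepancy caused by keeping the candidate SNP in the kinship.
gap = np.linalg.norm(K_all - K_loo, ord="fro")
print(f"Frobenius gap between kinship matrices: {gap:.4f}")
```

Under this convention the difference works out to (x_j x_j^T - K_loo) / p, so it shrinks on the order of 1/p as more SNPs enter the kinship matrix. That gives some intuition for why a single shared kinship matrix can be an acceptable shortcut when p is large.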
Abstract
Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique as a way to trade off velocity against veracity. Second, we investigate how mixed models can correct confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding factors, population stratification and environmental confounding factors, and study how different methods that are commonly used in practice trade off these two confounding…
Taxonomy
Topics: Genetic Associations and Epidemiology · Genetic and phenotypic traits in livestock · Genetic Mapping and Diversity in Plants and Animals
