Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies
Matti Pirinen, Peter Donnelly, Chris C. A. Spencer

TL;DR
This paper introduces fast, accurate methods for fitting a linear mixed model with one random effect to large-scale genetic data, speeding up genome-wide association studies and making Bayesian model comparison practical.
Contribution
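It presents a transformation between the linear and log-odds scales that is accurate for small effect sizes, a likelihood-maximization algorithm that is an order of magnitude faster than previously published approaches, and efficient methods for computing marginal likelihoods.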
Findings
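Applied successfully to a multiple sclerosis association study of over 20,000 individuals and 500,000 genetic variants
Likelihood maximization runs an order of magnitude faster than previously published approaches
Efficient marginal likelihood computation enables Bayesian model comparison in large-scale genetic studies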
Abstract
Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants.
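To make the setting concrete, here is a minimal Python sketch of fitting a linear model with one additional random effect. It is not the paper's algorithm: it uses the widely known eigendecomposition trick, and it assumes a parameterization that the abstract does not state, Cov(y) = sigma2 * (h2 * K + (1 - h2) * I), where K is a relatedness matrix supplied by the user. The function names, the h2 parameter, and the grid search are all illustrative choices. The point it shows is that K is decomposed once, after which each separately analyzed predictor needs only cheap diagonal computations.

```python
import numpy as np

def rotate(K, y, X):
    """Eigendecompose K once and rotate y and X so the covariance is diagonal."""
    d, U = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)  # guard against tiny negative eigenvalues
    return d, U.T @ y, U.T @ X

def profile_loglik(h2, d, y_rot, X_rot):
    """Log-likelihood at a fixed h2, with beta and sigma2 maximized out.

    Assumed model (illustrative): Cov(y) = sigma2 * (h2 * K + (1 - h2) * I).
    """
    n = y_rot.shape[0]
    w = h2 * d + (1.0 - h2)            # diagonal variances, up to sigma2
    Xw = X_rot / w[:, None]            # rows of X scaled by 1 / w
    beta = np.linalg.solve(X_rot.T @ Xw, Xw.T @ y_rot)  # generalized least squares
    r = y_rot - X_rot @ beta
    sigma2 = np.sum(r * r / w) / n     # maximum likelihood estimate of the scale
    ll = -0.5 * (n * np.log(2.0 * np.pi * sigma2) + np.sum(np.log(w)) + n)
    return ll, beta, sigma2

def fit(K, y, X, grid=None):
    """Maximize the profile likelihood over a grid of h2 values (a crude search)."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    d, y_rot, X_rot = rotate(K, y, X)
    results = [profile_loglik(h2, d, y_rot, X_rot) for h2 in grid]
    best = int(np.argmax([res[0] for res in results]))
    return grid[best], results[best]
```

In a setting with many predictors, `rotate` would be called once for the phenotype and shared covariates, and only the column for the current predictor would be rotated and appended before calling `profile_loglik`. The grid search is a stand-in for a proper one-dimensional optimizer.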
