A near-exact linear mixed model for genome-wide association studies

Zhibin Pu; Shufei Ge; Shijia Wang

arXiv:2508.05278·stat.CO·August 8, 2025


TL;DR

The paper introduces NExt-LMM, a linear mixed model framework for GWAS that uses hierarchical low-rank (HODLR) approximations of the genetic similarity matrix to speed up parameter estimation while keeping the approximation error bounded.

Contribution

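NExt-LMM exploits the low-rank structure of the genetic similarity matrix through the Hierarchical Off-Diagonal Low-Rank (HODLR) format to overcome the computational bottlenecks of LMM parameter estimation in GWAS, and the authors establish error bounds for the resulting approximation.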

Findings

01. NExt-LMM accelerates GWAS analysis significantly.

02. The method maintains low approximation error.

03. Numerical experiments validate efficiency improvements.

Abstract

Linear mixed models (LMMs) are widely adopted in genome-wide association studies (GWAS) to account for population stratification and cryptic relatedness. However, parameter estimation for LMMs imposes a substantial computational burden because it requires large-scale operations on genetic similarity matrices (GSMs). We introduce the near-exact linear mixed model (NExt-LMM), a novel LMM framework that overcomes critical computational bottlenecks in GWAS through the following key innovations. First, we iteratively exploit the inherent low-rank structure of the GSM using the Hierarchical Off-Diagonal Low-Rank (HODLR) format, which is much faster than traditional decomposition methods. Second, we leverage the HODLR-approximated GSM to dramatically accelerate the subsequent maximum likelihood estimation with shared heritability ratios. Moreover, we establish rigorous error bounds for the NExt-LMM…
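To make the HODLR idea in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm: it assumes a plain truncated SVD for each off-diagonal block, and it rebuilds a dense matrix purely so the error can be inspected. The function name `hodlr_approx`, the parameters `rank` and `min_size`, and the simulated genotype matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def hodlr_approx(K, rank, min_size=32):
    """Dense reconstruction of a HODLR approximation of a symmetric matrix K.

    Diagonal blocks are split recursively; each off-diagonal block is replaced
    by its best rank-`rank` approximation (truncated SVD). Illustrative only.
    """
    n = K.shape[0]
    if n <= min_size:
        return K.copy()  # small leaf blocks stay dense
    m = n // 2
    K_hat = np.empty_like(K)
    # Recurse on the two diagonal blocks.
    K_hat[:m, :m] = hodlr_approx(K[:m, :m], rank, min_size)
    K_hat[m:, m:] = hodlr_approx(K[m:, m:], rank, min_size)
    # Compress the upper-right block; the lower-left follows by symmetry.
    U, s, Vt = np.linalg.svd(K[:m, m:], full_matrices=False)
    r = min(rank, s.size)
    off = (U[:, :r] * s[:r]) @ Vt[:r, :]
    K_hat[:m, m:] = off
    K_hat[m:, :m] = off.T
    return K_hat


# Toy data (an assumption, not from the paper): simulated genotypes in {0, 1, 2}.
rng = np.random.default_rng(0)
n, p = 256, 2000
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each SNP column
K = X @ X.T / p                           # genetic similarity matrix

K_hat = hodlr_approx(K, rank=20)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(f"relative Frobenius error at rank 20: {rel_err:.3e}")
```

A practical implementation would store only the low-rank factors and the dense leaf blocks rather than a full `K_hat`, since that storage is where the savings in memory and solve time come from. Independent simulated genotypes like these have little low-rank structure, so the printed error mainly shows how the truncation rank trades accuracy for compression.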
