Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
Alan N. Amin, Andres Potapczynski, Andrew Gordon Wilson

TL;DR
This paper introduces DeepWAS, a novel method leveraging accelerated linear algebra to train large neural network models for genetic variant effect prediction, overcoming previous computational limitations and improving disease prediction accuracy.
Contribution
The paper uses modern accelerated linear algebra techniques to train large, flexible neural networks that model genetic variant effects, removing the restriction to small models that constrained earlier methods.
Findings
Larger models trained with full likelihood outperform small models.
Full likelihood training enables effective use of larger neural networks.
Models trained with traditional summary statistics do not benefit from increased size.
Abstract
To understand how genetic variants in human genomes manifest in phenotypes -- traits like height or diseases like asthma -- geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper,…
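The bottleneck described in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a small, synthetic banded correlation matrix as a stand-in for correlations between nearby variants, and it uses a textbook conjugate gradient solver to show how a system `R x = b` can be solved with matrix-vector products alone instead of forming an explicit inverse. The names `banded_correlation` and `conjugate_gradient`, and the sizes and parameter values, are hypothetical choices made for illustration.

```python
import numpy as np


def banded_correlation(n, bandwidth, rho):
    """Toy correlation matrix: rho**|i-j| inside the band, zero outside.

    With rho=0.3 and bandwidth=3 the off-diagonal row sums stay below 1,
    so the matrix is strictly diagonally dominant and therefore
    symmetric positive definite (an assumption of this sketch).
    """
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.where(dist <= bandwidth, rho ** dist, 0.0)


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using only matvecs."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0
    p = r.copy()          # initial search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break         # relative residual is small enough
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Synthetic problem: 200 "variants" correlated only with close neighbours.
rng = np.random.default_rng(0)
R = banded_correlation(n=200, bandwidth=3, rho=0.3)
b = rng.standard_normal(200)

# Iterative solve (matrix-vector products only) vs. a dense direct solve.
x_cg = conjugate_gradient(lambda v: R @ v, b)
x_direct = np.linalg.solve(R, b)
max_abs_diff = np.max(np.abs(x_cg - x_direct))
```

In this toy setting both approaches agree to numerical precision. The point of the iterative form is that each step needs only a product with `R`, which stays cheap when `R` is banded or sparse, whereas a dense inverse scales cubically with the number of variants.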
Taxonomy
Topics: Genomics and Rare Diseases · Genetic Associations and Epidemiology · Genomics and Phylogenetic Studies
