Deep generative models of genetic variation capture mutation effects
Adam J. Riesselman, John B. Ingraham, Debora S. Marks

TL;DR
DeepSequence is a deep generative model, trained without supervision on evolutionary sequence data, that captures higher-order dependencies between residues and predicts mutation effects more accurately than site-independent or pairwise models.
Contribution
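The paper introduces DeepSequence, a probabilistic latent variable model with nonlinear dependencies that captures higher-order (beyond-pairwise) interactions in protein and RNA sequence families. It outperforms site-independent and pairwise models built from the same evolutionary data at predicting mutation effects.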
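The idea of a nonlinear latent variable model for sequences can be written out as follows. This is a sketch that assumes the standard variational autoencoder formulation; the symbols ($x$ for a sequence, $z$ for the latent variables, $\theta$ and $\phi$ for decoder and encoder parameters) are illustrative notation, not taken from this summary.

```latex
% Marginal likelihood of a sequence x under a latent variable model
p(x \mid \theta) = \int p(x \mid z, \theta)\, p(z)\, dz

% Evidence lower bound (ELBO) used as a tractable surrogate for log p(x | theta)
\log p(x \mid \theta) \;\ge\; \mathcal{L}(x)
  = \mathbb{E}_{q(z \mid x, \phi)}\!\left[ \log p(x \mid z, \theta) \right]
  - D_{\mathrm{KL}}\!\left( q(z \mid x, \phi) \,\|\, p(z) \right)

% Mutation effect as a log-ratio of mutant to wild-type likelihood,
% approximated by the difference of the two bounds
\log \frac{p(x^{\text{mut}} \mid \theta)}{p(x^{\text{wt}} \mid \theta)}
  \;\approx\; \mathcal{L}(x^{\text{mut}}) - \mathcal{L}(x^{\text{wt}})
```

Because $p(x \mid z, \theta)$ is a nonlinear function of a shared $z$, residues can depend on one another jointly rather than only one at a time or in pairs.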
Findings
DeepSequence predicts mutation effects more accurately than pairwise models.
The model reveals latent structures in sequence families.
It can extrapolate to unobserved sequence regions.
Abstract
The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling higher-order dependencies. Here, we show how latent variable models with nonlinear dependencies can be applied to capture beyond-pairwise constraints in biomolecules. We present a new probabilistic model for sequence families, DeepSequence, that can predict the effects of mutations across a variety of deep mutational scanning experiments significantly better than site independent or pairwise models that are based on the same evolutionary data. The model, learned in an unsupervised manner solely…
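For orientation, the site-independent baseline mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that a mutation is scored by the log-ratio of mutant to wild-type probability, and the function names, the additive pseudocount, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter

# Assumed alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def site_frequencies(alignment, pseudocount=1.0):
    """Per-column residue frequencies with additive smoothing (an assumption)."""
    length = len(alignment[0])
    total = len(alignment) + pseudocount * len(ALPHABET)
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return freqs


def log_prob(seq, freqs):
    """Log-probability of a sequence when every position is independent."""
    return sum(math.log(freqs[pos][aa]) for pos, aa in enumerate(seq))


def mutation_score(wild_type, mutant, freqs):
    """Log-ratio of mutant to wild-type probability; lower means less tolerated."""
    return log_prob(mutant, freqs) - log_prob(wild_type, freqs)


# Toy alignment, invented purely for illustration.
toy_alignment = ["ACDA", "ACDA", "ACEA", "GCDA"]
freqs = site_frequencies(toy_alignment)

score_common = mutation_score("ACDA", "ACEA", freqs)  # D->E, seen in the family
score_unseen = mutation_score("ACDA", "ACWA", freqs)  # D->W, never seen
```

In this sketch each position contributes independently, so the score of a mutation never depends on the rest of the sequence. That is the limitation pairwise models partly relax and that a latent variable model is meant to relax further.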
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Genetics, Bioinformatics, and Biomedical Research
