Variational auto-encoding of protein sequences

Sam Sinai; Eric Kelsic; George M. Church; Martin A. Nowak

arXiv:1712.03346 · q-bio.QM · January 4, 2018 · 49 cites

TL;DR

This paper introduces a variational auto-encoder model to embed protein sequences, enabling better prediction of mutation effects and understanding of sequence-function relationships, which advances protein analysis and design.

Contribution

The paper presents a novel unsupervised variational auto-encoder approach for protein sequences that outperforms baseline methods and sometimes surpasses state-of-the-art models in predicting mutation impacts.

Findings

01 Better mutation effect prediction than baseline methods that ignore interactions within sequences

02 In some cases, better prediction than state-of-the-art inverse-Potts approaches

03 Facilitates computationally guided exploration of protein sequence space

Abstract

Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than the state-of-the-art approaches that use the inverse-Potts model. This generative model can be used to computationally guide exploration of protein…
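As a rough illustration of how a variational auto-encoder over sequences could be used to score a mutation, the sketch below one-hot encodes aligned sequences, passes them through a small encoder and decoder, and compares a Monte Carlo estimate of the evidence lower bound (ELBO) for a mutant against the wild type. Everything here is an assumption made for illustration: the 21-letter alphabet, the layer sizes, the ELBO-difference score, and the helper names (`one_hot`, `init_params`, `elbo`, `mutation_score`) are hypothetical and are not taken from the paper. The weights are random and untrained, so the scores carry no biological meaning.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed: 20 amino acids plus a gap symbol
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seq):
    """Encode an aligned sequence as a flat one-hot vector of length len(seq) * 21."""
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), [AA_INDEX[a] for a in seq]] = 1.0
    return x.reshape(-1)


def init_params(seq_len, hidden=32, latent=2, seed=0):
    """Random, untrained weights (hypothetical sizes); a real model would be
    fit to an alignment of natural sequences."""
    rng = np.random.default_rng(seed)
    d = seq_len * len(ALPHABET)

    def w(m, n):
        return rng.normal(0.0, 0.1, size=(m, n))

    return {
        "enc_h": w(d, hidden),
        "enc_mu": w(hidden, latent),
        "enc_logvar": w(hidden, latent),
        "dec_h": w(latent, hidden),
        "dec_out": w(hidden, d),
    }


def log_softmax(a):
    """Numerically stable log-softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=-1, keepdims=True))


def elbo(x, params, seq_len, n_samples=64, seed=0):
    """Monte Carlo ELBO estimate: E_q[log p(x|z)] - KL(q(z|x) || N(0, I))."""
    rng = np.random.default_rng(seed)
    # Encoder: map the sequence to a diagonal Gaussian over the latent space.
    h = np.tanh(x @ params["enc_h"])
    mu = h @ params["enc_mu"]
    logvar = h @ params["enc_logvar"]
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    # Reparameterised samples z = mu + sigma * eps.
    eps = rng.normal(size=(n_samples, mu.shape[0]))
    z = mu + np.exp(0.5 * logvar) * eps
    # Decoder: per-position categorical distribution over the alphabet.
    logits = np.tanh(z @ params["dec_h"]) @ params["dec_out"]
    logp = log_softmax(logits.reshape(n_samples, seq_len, len(ALPHABET)))
    # Log-likelihood of the observed residues, averaged over samples.
    recon = (logp * x.reshape(seq_len, len(ALPHABET))).sum(axis=(1, 2)).mean()
    return recon - kl


def mutation_score(wild_type, mutant, params):
    """ELBO(mutant) - ELBO(wild type); an assumed scoring rule for illustration."""
    n = len(wild_type)
    return elbo(one_hot(mutant), params, n) - elbo(one_hot(wild_type), params, n)


wild_type = "MKTAYIAK"  # made-up toy sequence
mutant = "MKTAYIAR"     # single substitution at the last position
params = init_params(len(wild_type))
score = mutation_score(wild_type, mutant, params)
```

In this sketch a trained model would assign a higher score to mutants it considers more plausible relative to the wild type. The same noise seed is reused for both sequences so that the comparison is not dominated by sampling variance.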

Taxonomy

Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics