Few Shot Protein Generation
Soumya Ram, Tristan Bepler

TL;DR
This paper introduces the MSA-to-protein transformer, a generative model that conditions on multiple sequence alignments (MSAs) to produce protein sequences. It outperforms conventional family modeling approaches, especially when MSAs are small.
Contribution
It presents a new transformer-based model that directly encodes MSAs for protein generation, avoiding the need for dedicated family models and enabling efficient sampling.
Findings
Outperforms traditional family modeling approaches.
Generalizes well to unseen protein families.
Accurately models epistasis and indels.
Abstract
We present the MSA-to-protein transformer, a generative model of protein sequences conditioned on protein families represented by multiple sequence alignments (MSAs). Unlike existing approaches to learning generative models of protein families, the MSA-to-protein transformer conditions sequence generation directly on a learned encoding of the multiple sequence alignment, circumventing the need for fitting dedicated family models. By training on a large set of well-curated multiple sequence alignments in Pfam, our MSA-to-protein transformer generalizes well to protein families not observed during training and outperforms conventional family modeling approaches, especially when MSAs are small. Our generative approach accurately models epistasis and indels and allows for exact inference and efficient sampling unlike other approaches. We demonstrate the protein sequence modeling…
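The abstract's claim of exact inference and efficient sampling can be illustrated with a minimal sketch, assuming the model factorizes the conditional probability one residue at a time, as in an autoregressive decoder. The abstract as quoted here does not spell out that factorization, so treat it as an assumption. Everything in the block is hypothetical: `encode_msa` and `next_token_probs` are toy stand-ins built from column counts, not the paper's learned transformer, and this toy decoder looks only at the prefix length, so unlike the real model it cannot capture epistasis.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STOP = "*"  # end-of-sequence token, so sequence length is part of the model
TOKENS = AMINO_ACIDS + STOP


def encode_msa(msa):
    """Toy stand-in for a learned MSA encoder: per-column residue counts."""
    width = len(msa[0])
    columns = []
    for j in range(width):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for row in msa:
            if row[j] in counts:  # skip gap characters such as '-'
                counts[row[j]] += 1
        columns.append(counts)
    return columns


def next_token_probs(prefix, encoding, pseudocount=0.1):
    """Toy stand-in for a decoder: p(next token | prefix, MSA encoding)."""
    position = len(prefix)
    if position >= len(encoding):
        # Past the last alignment column, this toy model always stops.
        return {tok: (1.0 if tok == STOP else 0.0) for tok in TOKENS}
    counts = encoding[position]
    weights = {aa: counts[aa] + pseudocount for aa in AMINO_ACIDS}
    weights[STOP] = pseudocount
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def log_likelihood(sequence, encoding):
    """Exact log p(sequence | MSA): a sum of per-token conditional log-probs."""
    total = 0.0
    prefix = ""
    for token in sequence + STOP:
        p = next_token_probs(prefix, encoding)[token]
        if p == 0.0:
            return -math.inf
        total += math.log(p)
        prefix += token
    return total


def sample(encoding, rng, max_len=1000):
    """Ancestral sampling: draw one token at a time until STOP is drawn."""
    prefix = ""
    while len(prefix) < max_len:
        probs = next_token_probs(prefix, encoding)
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if token == STOP:
            break
        prefix += token
    return prefix


if __name__ == "__main__":
    msa = ["MKV-L", "MKVAL", "MRVAL"]  # made-up alignment for illustration
    encoding = encode_msa(msa)
    print(sample(encoding, random.Random(0)))
    print(log_likelihood("MKVAL", encoding))
```

Under this factorization, scoring a sequence takes one pass over its tokens and needs no approximation, and sampling takes one draw per residue. Because generation ends with a stop token instead of filling fixed alignment columns, sequences of different lengths (that is, insertions and deletions) are scored by the same model.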
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetics, Bioinformatics, and Biomedical Research · Machine Learning in Bioinformatics
