ReLSO: A Transformer-based Model for Latent Space Optimization and Generation of Proteins
Egbert Castro, Abhinav Godavarthi, Julian Rubinfien, Kevin B. Givechian, Dhananjay Bhaskar, Smita Krishnaswamy

TL;DR
ReLSO is a transformer-based autoencoder that models protein sequence-function landscapes, enabling efficient generation and optimization of high-fitness protein sequences through a structured latent space and gradient-based methods.
Contribution
This paper introduces ReLSO, a novel deep transformer autoencoder with a structured latent space for simultaneous sequence generation and fitness prediction in proteins.
Findings
ReLSO outperforms other methods in sequence optimization efficiency.
ReLSO more robustly generates high-fitness protein sequences.
Attention mechanisms in ReLSO provide insights into sequence-function relationships.
Abstract
The development of powerful natural language models has increased the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labeled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder which features a highly structured latent space that is trained to jointly generate sequences as well as predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence-function landscape of large labeled datasets and generate new molecules by optimizing within the latent space using gradient-based…
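The idea of optimizing within a latent space by following fitness gradients can be illustrated with a toy sketch. This is not the ReLSO implementation: the quadratic fitness surrogate, the made-up optimum `peak`, and every function name below are hypothetical stand-ins chosen so the example runs without a trained model. In a real setting the gradient would presumably come from automatic differentiation through a learned fitness prediction head, and the optimized latent point would then be passed to a decoder to obtain a sequence.

```python
# Toy sketch of gradient ascent in a latent space; NOT the ReLSO implementation.
# The quadratic fitness surrogate and all names below are hypothetical.

def fitness(z, peak):
    """Hypothetical smooth fitness head: maximal (0.0) when z equals `peak`."""
    return -sum((zi - pi) ** 2 for zi, pi in zip(z, peak))

def fitness_grad(z, peak):
    """Analytic gradient of `fitness` with respect to the latent vector z."""
    return [-2.0 * (zi - pi) for zi, pi in zip(z, peak)]

def latent_gradient_ascent(z0, peak, step_size=0.1, n_steps=50):
    """Repeatedly move z a small step along the fitness gradient."""
    z = list(z0)
    trajectory = [fitness(z, peak)]  # predicted fitness before any update
    for _ in range(n_steps):
        grad = fitness_grad(z, peak)
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
        trajectory.append(fitness(z, peak))
    return z, trajectory

peak = [1.0, -2.0, 0.5]    # made-up location of the fitness optimum
z_start = [0.0, 0.0, 0.0]  # made-up latent code of a starting sequence
z_opt, traj = latent_gradient_ascent(z_start, peak)
print(f"predicted fitness: {traj[0]:.3f} -> {traj[-1]:.6f}")
```

With this concave surrogate every step increases the predicted fitness, which is the behavior a smooth, well-structured latent space is meant to encourage. A learned landscape is generally not concave, so step size and number of steps would need tuning.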
Taxonomy
Topics: Vaccines and immunoinformatics approaches · Machine learning in bioinformatics · RNA and protein synthesis mechanisms
