ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings
Zitai Kong, Yiheng Zhu, Yinlong Xu, Hanjing Zhou, Mingzhe Yin, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

TL;DR
ProtFlow is a rapid, efficient protein sequence design framework that uses flow matching on compressed embeddings from protein language models, enabling high-quality single-step generation and broad applicability.
Contribution
It introduces a novel flow matching approach operating on compressed latent space embeddings for fast, resource-efficient protein sequence design.
Findings
Outperforms task-specific methods in diverse protein design tasks
Enables high-quality single-step sequence generation
Effective on peptides, long-chain proteins, and antibodies
Abstract
The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high training cost. To address these challenges, we introduce ProtFlow, a fast flow matching-based protein sequence design framework that operates on embeddings derived from semantically meaningful latent space of protein language models. By compressing and smoothing the latent space, ProtFlow enhances performance while training on limited computational resources. Leveraging reflow techniques, ProtFlow enables high-quality single-step sequence generation. Additionally, we develop a joint design…
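To make the flow matching and single-step generation ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes the straight-line (rectified-flow style) interpolation between a noise sample and a data embedding, and the names `interpolate`, `flow_matching_loss`, `euler_sample`, and `velocity_fn` are hypothetical. The arrays stand in for compressed protein language model embeddings; no real encoder or decoder is involved.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, plus its velocity.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1, whose velocity
    is the constant x1 - x0. x0 and x1 have shape (batch, dim).
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # broadcast over dim
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t, target = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps.

    num_steps=1 is the single-step case: x1 is approximated by x0 + v(x0, 0).
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    # Toy check with a hand-written velocity field (a constant shift),
    # standing in for a trained network.
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((4, 8))
    shift = np.full(8, 2.0)
    data = noise + shift

    def toy_velocity(x, t):
        return np.broadcast_to(shift, x.shape)

    print(flow_matching_loss(toy_velocity, noise, data, rng.uniform(size=4)))
    print(np.allclose(euler_sample(toy_velocity, noise, num_steps=1), data))
```

In this toy setting the velocity is constant along each path, so one Euler step already lands on the target. That is the intuition behind straightening trajectories with reflow: the straighter the learned paths, the less accuracy is lost when the number of integration steps is reduced to one.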
Taxonomy
Topics: Vaccines and immunoinformatics approaches · Monoclonal and Polyclonal Antibodies Research · Machine Learning in Bioinformatics
Methods: Diffusion · Focus
