Protein Design with Dynamic Protein Vocabulary
Nuowei Liu, Jiahao Kuang, Yanting Liu, Tao Ji, Changzhi Sun, Man Lan, Yuanbin Wu

TL;DR
This paper introduces ProDVa, a novel protein design method that combines natural protein fragments with deep generative models, significantly improving structural plausibility and foldability with minimal training data.
Contribution
ProDVa integrates a text encoder for functional descriptions, a protein language model for designing sequences, and a fragment encoder that retrieves natural protein fragments, using those fragments to improve foldability and structural plausibility.
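To make the retrieval idea concrete, here is a minimal sketch of picking fragments whose embeddings are closest to a text embedding. It is not the authors' implementation: the function names (`cosine`, `retrieve_fragments`), the use of cosine similarity, the toy two-dimensional embeddings, and the example fragment strings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_fragments(text_emb, fragment_embs, fragments, k=2):
    """Return the k (fragment, score) pairs most similar to the text embedding.

    Hypothetical helper: in a real system the embeddings would come from
    trained text and fragment encoders, not from hand-written vectors.
    """
    scored = [(frag, cosine(text_emb, emb))
              for frag, emb in zip(fragments, fragment_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (made up): three fragments with two-dimensional embeddings.
fragments = ["MKTAYIAK", "GSHMLEDP", "VLSPADKT"]
fragment_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_emb = [1.0, 0.1]  # stand-in for an encoded functional description

top_fragments = retrieve_fragments(text_emb, fragment_embs, fragments, k=2)
```

In this sketch the retrieved fragments would then be offered to the protein language model as extra candidates alongside single residues; how ProDVa actually combines them is not specified in this summary.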
Findings
ProDVa achieves function alignment comparable to existing models while using less than 0.04% of the training data.
ProDVa increases the proportion of well-folded proteins by 7.38%.
ProDVa increases the proportion of proteins with PAE below 10 by 9.6%.
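Foldability findings of this kind are usually stated as the share of designed proteins that pass a threshold on a structure-prediction metric such as PAE (predicted aligned error). The sketch below shows that calculation. The helper name `fraction_below` and every PAE value are invented for illustration and are not results from the paper.

```python
def fraction_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v < threshold for v in values) / len(values)


# Made-up PAE scores for five designs from two hypothetical models.
baseline_pae = [4.2, 12.5, 8.9, 15.0, 9.7]
model_pae = [3.8, 9.1, 8.2, 11.4, 6.5]

baseline_share = fraction_below(baseline_pae, 10)  # 3 of 5 designs pass
model_share = fraction_below(model_pae, 10)        # 4 of 5 designs pass
```

Comparing `model_share` with `baseline_share` gives the kind of difference the findings report. This summary does not say whether the reported figures are absolute or relative changes.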
Abstract
Protein design is a fundamental challenge in biotechnology, aiming to design novel sequences with specific functions within the vast space of possible proteins. Recent advances in deep generative models have enabled function-based protein design from textual descriptions, yet struggle with structural plausibility. Inspired by classical protein design methods that leverage natural protein structures, we explore whether incorporating fragments from natural proteins can enhance foldability in generative models. Our empirical results show that even random incorporation of fragments improves foldability. Building on this insight, we introduce ProDVa, a novel protein design approach that integrates a text encoder for functional descriptions, a protein language model for designing proteins, and a fragment encoder to dynamically retrieve protein fragments based on textual functional…
Taxonomy
Topics: Chemical Synthesis and Analysis · Machine Learning in Bioinformatics
