CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models
Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, Xin Gao

TL;DR
CFP-Gen is a diffusion language model that enables the design of novel, multifunctional proteins by integrating multiple constraints across functional, sequence, and structural modalities, improving control and diversity in protein generation.
Contribution
The paper introduces CFP-Gen, a novel multimodal diffusion model with modules for functional annotation guidance and residue-level control, advancing de novo protein design capabilities.
Findings
High-throughput generation of functional proteins
Achieves high success rate in designing multifunctional proteins
Generates proteins with functionality comparable to natural ones
Abstract
Existing protein language models (PLMs) generate protein sequences based on a single-condition constraint from a specific modality, and struggle to satisfy multiple constraints across different modalities simultaneously. In this work, we introduce CFP-Gen, a novel diffusion language model for Combinatorial Functional Protein GENeration. CFP-Gen facilitates de novo protein design by integrating multimodal conditions with functional, sequence, and structural constraints. Specifically, an Annotation-Guided Feature Modulation (AGFM) module is introduced to dynamically adjust the protein feature distribution based on composable functional annotations, e.g., GO terms, IPR domains, and EC numbers. Meanwhile, the Residue-Controlled Functional Encoding (RCFE) module captures residue-wise interactions to ensure more precise control. Additionally, off-the-shelf 3D structure encoders can be seamlessly integrated to impose…
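To make the idea of adjusting a feature distribution from composable annotations more concrete, the following is a minimal sketch, not the paper's AGFM implementation. It assumes a FiLM-style scheme: each annotation label has a learned scale and shift vector, the vectors of all active labels are summed, and the result modulates normalized per-residue features. The function names, the placeholder labels, the random embedding tables, and the normalize-then-scale-and-shift form are all assumptions made for illustration.

```python
import random

# Placeholder annotation labels (hypothetical, not real database identifiers).
LABELS = ["GO:example", "IPR:example", "EC:example"]


def make_embedding_table(labels, dim, seed=0):
    """Hypothetical stand-in for a learned table: one small random vector per label."""
    rng = random.Random(seed)
    return {label: [rng.gauss(0.0, 0.1) for _ in range(dim)] for label in labels}


def compose_annotations(table, labels, dim):
    """Compose any subset of annotations by summing their embedding vectors."""
    cond = [0.0] * dim
    for label in labels:
        cond = [c + e for c, e in zip(cond, table[label])]
    return cond


def modulate(features, gamma, beta, eps=1e-5):
    """Normalize each residue's feature vector, then apply an
    annotation-dependent scale (1 + gamma) and shift (beta).
    This functional form is an assumption, not taken from the paper."""
    out = []
    for vec in features:
        mean = sum(vec) / len(vec)
        var = sum((v - mean) ** 2 for v in vec) / len(vec)
        std = (var + eps) ** 0.5
        out.append([(v - mean) / std * (1.0 + g) + b
                    for v, g, b in zip(vec, gamma, beta)])
    return out


if __name__ == "__main__":
    dim = 4
    scale_table = make_embedding_table(LABELS, dim, seed=1)
    shift_table = make_embedding_table(LABELS, dim, seed=2)

    # Two residues, each with a 4-dimensional feature vector.
    features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]

    active = ["GO:example", "EC:example"]  # any subset of labels can be combined
    gamma = compose_annotations(scale_table, active, dim)
    beta = compose_annotations(shift_table, active, dim)
    print(modulate(features, gamma, beta))
```

With no active annotations, the scale and shift vectors are all zeros and the function reduces to plain per-residue normalization, which is one simple way an unconditional mode could coexist with conditional generation under these assumptions.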
Taxonomy
Topics: Biochemical and Structural Characterization · Vaccines and Immunoinformatics Approaches · Protein Structure and Dynamics
Methods: Diffusion
