Controllable Protein Sequence Generation with LLM Preference Optimization
Xiangyu Liu, Yi Liu, Silei Chen, Wei Hu

TL;DR
This paper introduces CtrlProt, a novel method for controllable protein sequence generation using fine-tuned large language models optimized with a multi-listwise preference strategy, achieving state-of-the-art results in attribute control.
Contribution
The paper presents a new fine-tuning approach with preference optimization for improved controllability and stability in protein sequence generation.
Findings
Achieves state-of-the-art performance in multi-attribute control.
Improves functionality and structural stability of generated proteins.
Supports both single-attribute and multi-attribute generation.
Abstract
Designing proteins with specific attributes offers an important route to addressing biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, when generation is controlled for specific attributes, sequences produced by existing methods still exhibit poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We fine-tune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt can meet functionality and structural stability requirements effectively, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.
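The abstract does not spell out the multi-listwise objective, so the following is only a generic illustration of what a listwise preference loss can look like, not CtrlProt's actual formulation. It is a minimal sketch of the standard Plackett-Luce negative log-likelihood over one ranked list of candidates. The function name and the idea that each candidate sequence has a single scalar score are assumptions made for this example.

```python
import math


def listwise_preference_loss(scores):
    """Plackett-Luce negative log-likelihood of one ranked list.

    `scores` holds one scalar per candidate, ordered from most to least
    preferred. How the scores are produced (for example from model
    log-likelihoods) is an assumption left outside this sketch.
    """
    loss = 0.0
    for i in range(len(scores) - 1):
        tail = scores[i:]
        # Stable log-sum-exp over the candidates not yet ranked.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        # Log-probability that candidate i is picked first among the tail.
        loss -= scores[i] - log_norm
    return loss


# A list whose scores agree with the preferred order gives a lower loss
# than the same scores in reversed order.
good = listwise_preference_loss([5.0, 0.0, -5.0])
bad = listwise_preference_loss([-5.0, 0.0, 5.0])
```

Minimizing this loss pushes the score of each preferred candidate above the scores of everything ranked below it, which is the general sense in which a listwise objective uses more ordering information than a pairwise one.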
Taxonomy
Topics: Glycosylation and Glycoproteins Research · Genetics, Bioinformatics, and Biomedical Research · Microbial Metabolic Engineering and Bioproduction
