Controllable Protein Sequence Generation with LLM Preference Optimization

Xiangyu Liu; Yi Liu; Silei Chen; Wei Hu

arXiv:2501.15007·cs.AI·January 28, 2025

TL;DR

This paper introduces CtrlProt, a method for controllable protein sequence generation that fine-tunes a protein large language model (LLM) with a multi-listwise preference optimization strategy and reports state-of-the-art results in both single-attribute and multi-attribute control.

Contribution

The paper presents a fine-tuning approach based on multi-listwise preference optimization that improves the controllability, functionality, and structural stability of generated protein sequences.

Findings

01. Achieves state-of-the-art performance in multi-attribute control.

02. Improves functionality and structural stability of generated proteins.

03. Supports both single-attribute and multi-attribute generation.

Abstract

Designing proteins with specific attributes offers an important solution to address biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, to control sequence generation for specific attributes, existing work still exhibits poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We finetune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt can meet functionality and structural stability requirements effectively, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.
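To make the idea of listwise preference optimization concrete, the following is a minimal sketch of a generic Plackett-Luce listwise loss over DPO-style implicit rewards. It is not the CtrlProt objective, which this page does not specify. The function names, the `beta` value, and the choice to average over several ranked lists (for example, one list per quality criterion) are all assumptions made for illustration.

```python
import math
from typing import Sequence


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO-style implicit reward: scaled log-ratio of policy to reference.

    `beta` is an illustrative value, not one taken from the paper.
    """
    return beta * (logp_policy - logp_ref)


def listwise_loss(rewards_best_first: Sequence[float]) -> float:
    """Plackett-Luce negative log-likelihood of one ranking.

    `rewards_best_first[0]` belongs to the most preferred candidate.
    At each position k, the candidate at k should win a softmax
    against every candidate ranked at or below it.
    """
    loss = 0.0
    for k in range(len(rewards_best_first) - 1):
        tail = rewards_best_first[k:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(r - m) for r in tail))
        loss += log_z - rewards_best_first[k]
    return loss


def multi_list_loss(ranked_lists: Sequence[Sequence[float]]) -> float:
    """Average the listwise loss over several ranked lists (an assumption)."""
    return sum(listwise_loss(lst) for lst in ranked_lists) / len(ranked_lists)


# Hypothetical log-probabilities for three candidate sequences, best first.
policy_logps = [-10.0, -12.0, -15.0]
ref_logps = [-12.0, -12.0, -12.0]
rewards = [implicit_reward(p, r) for p, r in zip(policy_logps, ref_logps)]
example_loss = multi_list_loss([rewards])
```

The loss decreases as the model assigns higher implicit reward to candidates ranked higher in the list, which is the behavior a listwise preference objective is meant to encourage.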

Code & Models

Repositories

nju-websoft/CtrlProt (PyTorch, official)

Videos

Controllable Protein Sequence Generation with LLM Preference Optimization · underline

Taxonomy

Topics: Glycosylation and Glycoproteins Research · Genetics, Bioinformatics, and Biomedical Research · Microbial Metabolic Engineering and Bioproduction