Protein Inverse Folding From Structure Feedback

Junde Xu; Zijun Gao; Xinyi Zhou; Jie Hu; Xingyi Cheng; Le Song; Guangyong Chen; Pheng-Ann Heng; Jiezhong Qiu

arXiv:2506.03028 · cs.LG · June 4, 2025

TL;DR

This paper fine-tunes protein inverse folding models with Direct Preference Optimization (DPO), using feedback from a protein folding model so that the generated sequences fold more reliably into the desired structures.

Contribution

The paper introduces a DPO-based fine-tuning approach for inverse folding models in which pairwise preference labels are derived from structures predicted by a folding model, improving the accuracy of the designed sequences.

Findings

01

Average TM-Score on the CATH 4.2 test set increased from 0.77 to 0.81 after DPO fine-tuning

02

DPO fine-tuning also improves the sequence recovery of the baseline models

03

Iterative DPO on challenging structures yields an average TM-Score increase of 79.5% relative to the baseline model

Abstract

The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling candidate sequences from the inverse-folding model, then predict the three-dimensional structure of each sequence with the folding model to generate pairwise structural-preference labels. These labels are used to fine-tune the inverse-folding model under the DPO objective. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning not only improves sequence recovery of baseline models but also leads to a significant improvement in average TM-Score from 0.77 to 0.81, indicating enhanced…
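The loop described in the abstract (sample sequences, fold them, turn the structural scores into pairwise preferences, apply the DPO objective) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each candidate has already been scored by the TM-Score between its predicted structure and the target, and the helper names `build_preference_pairs` and `dpo_loss`, the `min_gap` tie threshold, and `beta=0.1` are hypothetical choices made for the example. The loss itself is the standard per-pair DPO objective, computed from sequence log-likelihoods under the fine-tuned model and a frozen reference model.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, min_gap=0.05):
    """Turn scored candidates into (chosen, rejected) sequence pairs.

    candidates: list of (sequence, tm_score) for one target structure, where
    tm_score is assumed to compare the predicted structure with the target.
    min_gap: hypothetical threshold; pairs with closer scores are skipped.
    """
    pairs = []
    for (seq_a, tm_a), (seq_b, tm_b) in combinations(candidates, 2):
        if abs(tm_a - tm_b) < min_gap:
            continue  # near-tie: the preference label would be noisy
        pairs.append((seq_a, seq_b) if tm_a > tm_b else (seq_b, seq_a))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    logp_*: log-likelihood of a sequence given the target structure under the
    model being fine-tuned; ref_logp_*: the same under the frozen reference.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy usage with made-up sequences and scores (illustrative only).
scored = [("MKVL", 0.82), ("MKIL", 0.80), ("GGGG", 0.40)]
pairs = build_preference_pairs(scored)
loss_at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

When the fine-tuned model still equals the reference model, the margin is zero and the per-pair loss is log 2; the loss falls as the model raises the likelihood of the preferred sequence relative to the reference.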

Taxonomy

Topics: Protein Structure and Dynamics

Methods: Direct Preference Optimization · Sparse Evolutionary Training