AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization

Zixuan Jiang; Renjing Xu

arXiv:2506.07035 · q-bio.BM · June 10, 2025

TL;DR

AnnoDPO introduces a novel framework for protein function prediction that uses Direct Preference Optimization to better handle annotation scarcity and imbalance, improving biological knowledge integration in protein models.

Contribution

AnnoDPO pioneers the application of Direct Preference Optimization to protein function learning, addressing annotation scarcity and category imbalance with a preference-aligned training objective.

Findings

1. Enhanced annotation learning through preference alignment
2. Improved handling of data imbalance in protein annotations
3. Establishment of a new paradigm for biological knowledge integration

Abstract

Deciphering protein function remains a fundamental challenge in protein representation learning. The task presents significant difficulties for protein language models (PLMs) due to the sheer volume of functional annotation categories and the highly imbalanced distribution of annotated instances across biological ontologies. Inspired by the remarkable success of reinforcement learning from human feedback (RLHF) in large language model (LLM) alignment, we propose AnnoDPO, a novel multi-modal framework for protein function prediction that leverages Direct Preference Optimization (DPO) to enhance annotation learning. Our methodology addresses the dual challenges of annotation scarcity and category imbalance through preference-aligned training objectives, establishing a new paradigm for biological knowledge integration in protein representation learning.
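
For readers unfamiliar with DPO, the following is a minimal sketch of the standard pairwise DPO loss, not AnnoDPO's actual implementation. It assumes each training example supplies log-probabilities for a preferred ("chosen") output and a dispreferred ("rejected") output under both the trainable policy and a frozen reference model. Reading "chosen" as a correct annotation and "rejected" as an incorrect one is an illustrative assumption here; the function and parameter names, and the default `beta`, are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair."""
    # How much more (in log space) the policy favors each output than the reference does.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # The loss shrinks as the policy favors the chosen output over the rejected one
    # by a wider margin than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Hypothetical log-probabilities: the policy prefers the chosen output, the reference is indifferent.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; it falls below that as the policy learns to rank the chosen output higher.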

Code & Models

Repositories

AzusaXuan/AnnoDPO
PyTorch · Official

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Genomics and Rare Diseases