AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
Zixuan Jiang, Renjing Xu

TL;DR
AnnoDPO introduces a novel framework for protein function prediction that uses Direct Preference Optimization to better handle annotation scarcity and imbalance, improving biological knowledge integration in protein models.
Contribution
AnnoDPO pioneers the application of Direct Preference Optimization to protein function learning, addressing annotation scarcity and category imbalance through preference-aligned training.
Findings
Enhanced annotation learning through preference alignment
Improved handling of data imbalance in protein annotations
Establishment of a new paradigm for biological knowledge integration
Abstract
Deciphering protein function remains a fundamental challenge in protein representation learning. The task presents significant difficulties for protein language models (PLMs) due to the sheer volume of functional annotation categories and the highly imbalanced distribution of annotated instances across biological ontologies. Inspired by the remarkable success of reinforcement learning from human feedback (RLHF) in large language model (LLM) alignment, we propose AnnoDPO, a novel multi-modal framework for protein function prediction that leverages Direct Preference Optimization (DPO) to enhance annotation learning. Our methodology addresses the dual challenges of annotation scarcity and category imbalance through preference-aligned training objectives, establishing a new paradigm for biological knowledge integration in protein representation learning.
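For readers unfamiliar with DPO, the preference-aligned objective mentioned above can be illustrated with the standard DPO loss for a single preference pair. This is a minimal sketch, not the AnnoDPO training code: the abstract does not specify how preference pairs are built, so treating a correct annotation as the "preferred" output and an incorrect one as the "rejected" output is an assumption made here, and the names `dpo_loss`, `log_sigmoid`, and the default `beta=0.1` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    Inputs are log-probabilities assigned by the trainable policy model
    and by a frozen reference model. ASSUMPTION: in an annotation setting,
    "preferred" could be a correct functional annotation for a protein and
    "rejected" an incorrect one; the paper's exact pairing is not given here.
    """
    # How much more the policy favors each output than the reference does.
    preferred_gain = policy_logp_preferred - ref_logp_preferred
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    # The loss shrinks as the policy widens the gap in favor of the preferred output.
    margin = preferred_gain - rejected_gain
    return -log_sigmoid(beta * margin)


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# A policy that raises the preferred annotation and lowers the rejected one
# obtains a loss below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Because the loss depends only on the relative ranking within a pair rather than on absolute class frequencies, an objective of this form is one plausible way to train on rare annotation categories, which is consistent with the imbalance motivation stated in the abstract.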
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Genomics and Rare Diseases
