OntoProtein: Protein Pretraining With Gene Ontology Embedding
Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, Huajun Chen

TL;DR
OntoProtein introduces a novel protein pretraining framework that integrates Gene Ontology knowledge graphs with contrastive learning, significantly improving protein representation and prediction tasks over existing models.
Contribution
This work is the first to incorporate Gene Ontology structure into protein pretraining using a knowledge graph and contrastive learning, enhancing protein representations.
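As a rough illustration of how contrastive learning over a knowledge graph can work in general, the sketch below scores (protein, relation, GO term) triples with a TransE-style distance and penalizes a true triple for scoring worse than corrupted ones. It is not taken from the OntoProtein code: the TransE scoring function, the margin value, the toy embedding vectors, and every function name here are assumptions chosen for illustration.

```python
import math
import random


def transe_score(head, relation, tail):
    """TransE-style distance ||h + r - t||_2; lower means more plausible.

    Assumption: TransE is used only as a familiar stand-in scoring function.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_kg_loss(head, relation, tail, negative_tails, margin=1.0):
    """Negative-sampling loss for one triple (hypothetical helper).

    The true triple is pushed to a distance below `margin`; each corrupted
    triple is pushed to a distance above it.
    """
    positive_term = -log_sigmoid(margin - transe_score(head, relation, tail))
    negative_term = -sum(
        log_sigmoid(transe_score(head, relation, neg) - margin)
        for neg in negative_tails
    ) / len(negative_tails)
    return positive_term + negative_term


def sample_negative_tails(true_tail_id, candidate_ids, k, rng):
    """Uniformly sample k corrupted tail ids, excluding the true tail."""
    pool = [c for c in candidate_ids if c != true_tail_id]
    return rng.sample(pool, k)


if __name__ == "__main__":
    # Toy 2-d embeddings (made up): a protein, a relation, and two GO terms.
    protein = [0.0, 0.0]
    has_function = [1.0, 0.0]
    true_go_term = [1.0, 0.0]
    unrelated_go_term = [5.0, 0.0]

    loss = contrastive_kg_loss(protein, has_function, true_go_term, [unrelated_go_term])
    print(f"loss for a well-separated triple: {loss:.4f}")

    rng = random.Random(0)
    print(sample_negative_tails(2, [0, 1, 2, 3], 2, rng))
```

In a real setup the protein vector would come from a protein language model and the GO term vector from a text or graph encoder, with the loss backpropagated through both; this sketch only shows the shape of the objective.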
Findings
Outperforms state-of-the-art models on the TAPE benchmark.
Achieves better results in protein-protein interaction prediction.
Improves protein function prediction accuracy.
Abstract
Self-supervised protein language models have proved effective at learning protein representations. With increasing computational power, current protein language models pre-trained on millions of diverse sequences can scale from millions to billions of parameters and achieve remarkable improvements. However, these prevailing approaches rarely incorporate knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biological knowledge in KGs can enhance protein representations. In this work, we propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or…
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies
Methods: Contrastive Learning
