STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings
Mehmet Efe Akça, Gökçe Uludoğan, Arzucan Özgür, İnci M. Baytaş

TL;DR
STAR-GO is a Transformer-based model that combines the textual definitions and hierarchical structure of Gene Ontology (GO) terms into unified embeddings, improving protein function prediction, especially for unseen GO terms.
Contribution
It introduces a novel hierarchical integration of semantic and structural GO information within a Transformer framework for enhanced zero-shot protein function prediction.
Findings
Achieves state-of-the-art performance in protein function prediction.
Demonstrates superior zero-shot generalization to unseen GO terms.
Effectively integrates textual and structural GO data for robust embeddings.
Abstract
Accurate prediction of protein function is essential for elucidating molecular mechanisms and advancing biological and therapeutic discovery. Yet experimental annotation lags far behind the rapid growth of protein sequence data. Computational approaches address this gap by associating proteins with Gene Ontology (GO) terms, which encode functional knowledge through hierarchical relations and textual definitions. However, existing models often emphasize one modality over the other, which limits their ability to generalize, particularly to unseen or newly introduced GO terms that frequently arise as the ontology evolves, and renders previously trained models outdated. We present STAR-GO, a Transformer-based framework that jointly models the semantic and structural characteristics of GO terms to enhance zero-shot protein function prediction. STAR-GO integrates textual definitions with…
Taxonomy
Topics: Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics
