GOAnnotator: accurate protein function annotation using automatically retrieved literature
Huiying Yan, Hancheng Liu, Shaojun Wang, Shanfeng Zhu

TL;DR
GOAnnotator is a tool for automated protein function annotation that retrieves relevant literature automatically instead of relying on expert-curated papers, which extends text-based annotation to proteins that lack curated literature.
Contribution
GOAnnotator introduces a novel framework combining improved literature retrieval and enhanced GO term identification for automated protein function annotation.
Findings
GOAnnotator outperforms GORetriever in realistic scenarios by uncovering unique literature.
The method predicts additional protein functions not previously identified.
Experiments on three benchmark datasets show high-quality functional annotations.
Abstract
Automated protein function prediction/annotation (AFP) is vital for understanding biological processes and advancing biomedical research. Existing text-based AFP methods, including the state-of-the-art method GORetriever, rely on expert-curated relevant literature, which is costly and time-consuming, and cover only a small portion of the proteins in UniProt. To overcome this limitation, we propose GOAnnotator, a novel framework for automated protein function annotation. It consists of two key modules: PubRetriever, a hybrid system for retrieving and re-ranking relevant literature, and GORetriever+, an enhanced module for identifying Gene Ontology (GO) terms from the retrieved texts. Extensive experiments over three benchmark datasets demonstrate that GOAnnotator delivers high-quality functional annotations, surpassing GORetriever in realistic situations by uncovering unique literature…
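The two-stage design described in the abstract (retrieve and re-rank literature, then identify GO terms from the retrieved texts) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the token-overlap scoring, and the three-document corpus are all hypothetical stand-ins chosen only to show how the two stages connect.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer that trims basic punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def overlap_score(query_tokens, doc_tokens):
    """Count how often the distinct query tokens occur in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in set(query_tokens))

def retrieve_and_rerank(protein_name, corpus, top_k=2):
    """Stage 1 (stand-in for literature retrieval and re-ranking)."""
    query = tokenize(protein_name)
    # Candidate retrieval: keep documents that mention any query token.
    candidates = [doc_id for doc_id, text in corpus.items()
                  if overlap_score(query, tokenize(text)) > 0]

    # Re-ranking: order candidates by length-normalized match density.
    def density(doc_id):
        tokens = tokenize(corpus[doc_id])
        return overlap_score(query, tokens) / len(tokens)

    return sorted(candidates, key=density, reverse=True)[:top_k]

def annotate(doc_ids, corpus, go_terms):
    """Stage 2 (stand-in for GO term identification from retrieved texts)."""
    scores = {}
    for go_id, name in go_terms.items():
        name_tokens = tokenize(name)
        scores[go_id] = sum(overlap_score(name_tokens, tokenize(corpus[d]))
                            for d in doc_ids)
    # Keep only terms with some textual support, best supported first.
    supported = [g for g, s in scores.items() if s > 0]
    return sorted(supported, key=lambda g: -scores[g])

# Hypothetical toy data, for illustration only.
corpus = {
    "doc1": "TP53 binds DNA and regulates transcription",
    "doc2": "Kinase activity of an unrelated enzyme",
    "doc3": "TP53 TP53 mutation in tumors",
}
go_terms = {"GO:0003677": "DNA binding", "GO:0016301": "kinase activity"}

retrieved = retrieve_and_rerank("TP53", corpus)
predicted = annotate(retrieved, corpus, go_terms)
```

A realistic system would replace the token-overlap scores with learned retrievers and rankers; the sketch only shows that annotation quality in the second stage depends on which documents the first stage surfaces.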
Taxonomy
Topics: Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics
