Automation of gene function prediction through modeling human curators' decisions in GO phylogenetic annotation project
Haiming Tang, Paul D Thomas, Huaiyu Mi

TL;DR
This paper introduces an automated pipeline that models human curator decisions to streamline gene function annotation in the GO-PAINT project, reducing manual effort and increasing efficiency.
Contribution
It presents a novel automated pipeline that simulates curator decisions, enabling faster and more scalable phylogenetic annotation of gene function.
Findings
Automated pipeline successfully models curator decisions.
Over 4000 phylogenetic families annotated manually.
Pipeline available for public use and integration.
Abstract
The Gene Ontology Consortium launched the GO-PAINT project (Phylogenetic Annotation and INference Tool) 9 years ago, and PAINT is currently used in the GO Reference Genome Annotation Project to support inference of GO function terms (molecular function, cellular component and biological process) by homology. PAINT uses a phylogenetic model to infer gene function by homology, a process that requires manual curation by experienced biocurators. A tremendous amount of time and effort has been spent on the GO-PAINT project, yielding more than 4000 fully annotated phylogenetic families with more than 170,000 annotations. These preliminary data have thus enabled a potential algorithmic representation of the curation process and automatic annotation of the additional 9000 unannotated phylogenetic families. Here we present an automated pipeline for phylogenetic annotation and inference, which simulates the standard…
Taxonomy
Topics: Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies
