Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples
Maryam Sepehri, Marco Frasca

TL;DR
This paper analyzes the evolution of Gene Ontology annotations to improve the selection of negative examples for protein function prediction, introducing a novel algorithm validated on multiple organisms.
Contribution
It provides the first extensive analysis of GO annotation evolution and proposes a new algorithm for selecting reliable negative examples in protein function prediction.
Findings
Novel annotations reveal patterns in the GO hierarchy.
Proteins likely to be unreliable as negative examples are identified.
Validated algorithm improves negative sample selection.
Abstract
Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely store negative annotations, i.e. proteins not possessing a given function. This leaves the set of negative examples undefined or ill-defined, even though that set is crucial for training most machine learning methods that infer protein functions. Automated techniques for choosing reliable negative proteins are therefore required to train accurate function prediction models. This study proposes the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered over the years have been analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about proteins likely to be unreliable as negative examples, which we verified through a novel algorithm of our own construction, validated on…
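The abstract does not spell out the selection algorithm, so the following is only a minimal Python sketch of the setting it describes, under stated assumptions: a toy hierarchy stored as a child-to-parents dictionary, annotations given as protein-to-term sets from two hypothetical releases (`OLD`, `NEW`), and annotations propagated upward by the GO true path rule. `naive_negatives` implements the common baseline of treating every protein not annotated to a term as a negative. `filtered_negatives` applies a hypothetical filter (dropping proteins already annotated to a direct parent of the target term) purely to show how temporal evidence could flag unreliable negatives; it is not the authors' method, and all term and protein names are invented.

```python
"""Hypothetical sketch: novel annotations and negative selection for one GO term.

Assumptions (not from the paper): toy terms A-D, a child -> parents dict,
and annotations as protein -> set of term IDs for two releases.
"""

# Toy is_a hierarchy: A is the root; B and C are children of A; D is a child of B.
PARENTS = {"B": ["A"], "C": ["A"], "D": ["B"]}

# Two hypothetical annotation releases (older and newer).
OLD = {"p1": {"D"}, "p2": {"C"}, "p3": set(), "p4": {"B"}}
NEW = {"p1": {"D"}, "p2": {"C"}, "p3": {"C"}, "p4": {"B", "D"}}


def ancestors(term, parents):
    """Return all ancestors of `term` (excluding itself) in a child -> parents DAG."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """True path rule: a protein annotated to a term is also annotated to its ancestors."""
    return {
        prot: set(terms).union(*(ancestors(t, parents) for t in terms))
        for prot, terms in annotations.items()
    }


def novel_annotations(old, new, parents):
    """(protein, term) pairs present in the newer release but absent from the older one."""
    old_p, new_p = propagate(old, parents), propagate(new, parents)
    return {
        (prot, t)
        for prot, terms in new_p.items()
        for t in terms - old_p.get(prot, set())
    }


def naive_negatives(term, annotations, parents):
    """Baseline: every protein not (propagated-)annotated to `term` counts as negative."""
    prop = propagate(annotations, parents)
    return {prot for prot, terms in prop.items() if term not in terms}


def filtered_negatives(term, annotations, parents):
    """Hypothetical filter: drop candidates already annotated to a direct parent
    of `term`, treating them as plausible future positives."""
    prop = propagate(annotations, parents)
    term_parents = set(parents.get(term, ()))
    return {
        prot
        for prot in naive_negatives(term, annotations, parents)
        if not (prop[prot] & term_parents)
    }


if __name__ == "__main__":
    print(sorted(naive_negatives("D", OLD, PARENTS)))     # ['p2', 'p3', 'p4']
    print(sorted(filtered_negatives("D", OLD, PARENTS)))  # ['p2', 'p3']
    print(sorted(novel_annotations(OLD, NEW, PARENTS)))   # [('p3', 'A'), ('p3', 'C'), ('p4', 'D')]
```

In this toy data, protein `p4` is a baseline negative for term `D` in the older release but gains `D` in the newer one, which is the kind of unreliable negative that an analysis of novel annotations is meant to expose.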
