ProTranslator: zero-shot protein function prediction using textual description
Hanwen Xu, Sheng Wang

TL;DR
ProTranslator introduces a novel zero-shot approach to protein function prediction by translating textual function descriptions into amino acid sequences, enabling annotation of functions without known associated proteins and improving predictions for novel and sparsely annotated functions.
Contribution
It redefines protein function prediction as a machine translation task, allowing transfer of annotations from similar textual descriptions and enabling predictions for functions with no prior protein annotations.
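The translation framing can be illustrated with a minimal sketch. It assumes that function descriptions and proteins have already been embedded into a shared vector space by some encoders, which are not shown because this summary does not specify them. Under that assumption, annotating a function that has no known proteins reduces to ranking proteins by their similarity to the description embedding. The names `cosine` and `rank_proteins_for_function`, the choice of cosine similarity, and the toy vectors are hypothetical illustrations, not ProTranslator's actual code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_proteins_for_function(
    description_embedding: Vector,
    protein_embeddings: Dict[str, Vector],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical zero-shot scorer: rank every protein against the embedding
    of a function's textual description and return the top_k (id, score) pairs.
    No protein already annotated with this function is required."""
    scored = [
        (pid, cosine(description_embedding, emb))
        for pid, emb in protein_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-d embeddings with made-up numbers, for illustration only.
proteins = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.1, 0.9, 0.0],
    "P3": [0.7, 0.3, 0.1],
}
# Assumed embedding of an unseen function's description.
new_function = [1.0, 0.0, 0.0]
print(rank_proteins_for_function(new_function, proteins, top_k=2))  # P1 first, then P3
```

Ranking avoids having to pick a score cutoff; thresholding the similarity score would be an equally plausible way to decide which proteins receive the new annotation.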
Findings
Significant improvement in annotating novel functions.
Effective prediction of gene members for pathways based on descriptions.
Enables generation of textual descriptions for proteins.
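This summary does not say how ProTranslator produces textual descriptions for proteins. The sketch below shows only the simplest reverse-direction stand-in, nearest-neighbour retrieval: given a protein embedding, it returns the closest description from a fixed bank of candidate texts. It assumes a shared embedding space as before; `describe_protein`, the description bank, and all vectors are hypothetical, and retrieval is a simplification rather than the paper's generation procedure.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def describe_protein(
    protein_embedding: Vector, description_bank: Dict[str, Vector]
) -> Tuple[str, float]:
    """Hypothetical retrieval stand-in: return the candidate description whose
    embedding is most similar to the protein embedding, with its score."""
    if not description_bank:
        raise ValueError("description bank is empty")
    return max(
        ((text, cosine(protein_embedding, emb)) for text, emb in description_bank.items()),
        key=lambda pair: pair[1],
    )


# Made-up embeddings for two candidate descriptions and one protein.
bank = {"ATP binding": [1.0, 0.0, 0.0], "DNA repair": [0.0, 1.0, 0.0]}
print(describe_protein([0.2, 0.8, 0.1], bank)[0])  # DNA repair
```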
Abstract
Accurately finding proteins and genes that have a certain function is the prerequisite for a broad range of biomedical applications. Despite the encouraging progress of existing computational approaches in protein function prediction, it remains challenging to annotate proteins to a novel function that is not collected in the Gene Ontology and does not have any annotated proteins. This limitation, a side effect from the widely-used multi-label classification problem setting of protein function prediction, hampers the progress of studying new pathways and biological processes, and further slows down research in various biomedical areas. Here, we tackle this problem by annotating proteins to a function only based on its textual description so that we do not need to know any associated proteins for this function. The key idea of our method ProTranslator is to redefine protein function…
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies
Methods: Ontology
