On the unknown proteins of eukaryotic proteomes
Yves-Henri Sanejouand

TL;DR
This study analyzes unknown proteins across 362 eukaryotic proteomes. It finds that many are species-specific, that AlphaFold2 structure predictions for them are usually poor, and that in metazoans singleton counts appear to increase with evolutionary distance from the reference set.
Contribution
It establishes a large-scale reference system for eukaryotic proteomes and uncovers patterns in unknown proteins and their evolution across different lineages.
Findings
According to UniProt, no more than 12% of the singletons found for a given species are known at the protein level.
AlphaFold2 predictions for singletons are generally poor.
In metazoans, singleton counts appear to increase with evolutionary distance from the reference system.
Abstract
In order to study unknown proteins on a large scale, a reference system has been set up for the three major eukaryotic lineages, built with 36 proteomes that are as taxonomically diverse as possible. Proteins from 362 eukaryotic proteomes with no known homologue in this set were then analyzed, focusing notably on singletons, that is, unknown proteins with no known homologue in their own proteome. Consistently, according to UniProt, no more than 12% of the singletons found for a given species are known at the protein level. Also, since they rely on the information found in alignments of homologous sequences, AlphaFold2 predictions of their three-dimensional structure are usually poor. In the case of metazoan species, the number of singletons seems to increase as a function of the evolutionary distance from the reference system. Interestingly, no such trend is found in the cases of…
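The two-step definition used in the abstract (unknown proteins first, then singletons among them) can be illustrated with a minimal sketch. It assumes that homology searches have already been run, so that `reference_hits` holds the accessions with at least one homologue in the 36-proteome reference set and `paralog_hits` maps each accession to its homologues within the same proteome. It also assumes that "known at the protein level" corresponds to a boolean flag such as UniProt's protein existence level 1. The names `Protein`, `classify_proteome`, and the toy accessions are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str
    # Hypothetical flag: True if UniProt reports evidence at protein level.
    protein_level_evidence: bool


def classify_proteome(proteome, reference_hits, paralog_hits):
    """Split a proteome into unknown proteins and singletons (sketch only).

    proteome: list of Protein
    reference_hits: set of accessions with a homologue in the reference set
    paralog_hits: dict accession -> set of homologous accessions in the same proteome
    Returns (unknown, singletons, fraction of singletons known at protein level).
    """
    # Unknown proteins: no known homologue in the reference set.
    unknown = [p for p in proteome if p.accession not in reference_hits]
    # Singletons: unknown proteins with no homologue in their own proteome
    # (self-hits are discarded).
    singletons = [
        p for p in unknown
        if not (paralog_hits.get(p.accession, set()) - {p.accession})
    ]
    if not singletons:
        return unknown, singletons, 0.0
    known = sum(p.protein_level_evidence for p in singletons)
    return unknown, singletons, known / len(singletons)


# Toy example with made-up accessions: A has a reference homologue,
# B and C are homologous to each other, D and E have no homologue at all.
toy_proteome = [
    Protein("A", True),
    Protein("B", False),
    Protein("C", False),
    Protein("D", True),
    Protein("E", False),
]
unknown, singletons, known_fraction = classify_proteome(
    toy_proteome, {"A"}, {"B": {"C"}, "C": {"B"}}
)
print([p.accession for p in singletons], known_fraction)
```

In this toy case B and C are unknown but not singletons, since each has a homologue in the same proteome, while D and E are singletons, half of which carry protein-level evidence.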
Taxonomy
Topics: Genomics and Phylogenetic Studies · Advanced Proteomics Techniques and Applications · Identification and Quantification in Food
