Neighborhood-Based Label Propagation in Large Protein Graphs
Sabeur Aridhi (1), Seyed Ziaeddin Alborzi (1), Malika Smaïl-Tabbone (2), Marie-Dominique Devignes (1), David Ritchie (1) ((1) CAPSID, (2) ORPAILLEUR)

TL;DR
This paper introduces DistNBLP, a distributed label propagation method leveraging protein similarity graphs to automatically annotate large-scale protein datasets efficiently.
Contribution
The paper presents a novel distributed label propagation algorithm, DistNBLP, which is designed for large protein graphs and built on the Akka toolkit to support scalable protein function annotation.
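To make the idea of neighborhood-based label propagation concrete, here is a minimal single-machine sketch. It assumes a weighted, undirected protein similarity graph and a simple weighted-vote rule: an unannotated protein takes a label when the share of similarity weight from annotated neighbours carrying that label reaches a threshold. The function name `propagate_labels`, the scoring rule, the `threshold` and `max_iters` parameters, and the example proteins and labels are all hypothetical. This is not DistNBLP's actual scoring function, and it does not reproduce the Akka-based distribution.

```python
from collections import defaultdict


def propagate_labels(graph, labels, threshold=0.5, max_iters=10):
    """Hypothetical neighborhood-based label propagation (illustrative only).

    graph:  dict mapping protein -> list of (neighbor, similarity weight);
            assumed undirected (each edge listed from both ends).
    labels: dict mapping annotated protein -> set of labels.
    Returns a new dict that also holds labels predicted for unannotated proteins.
    """
    current = {node: set(ls) for node, ls in labels.items()}  # copy; input untouched
    for _ in range(max_iters):
        updates = {}
        for node, neighbors in graph.items():
            if current.get(node):
                continue  # already annotated: keep its labels fixed
            scores = defaultdict(float)
            total = 0.0
            for nbr, weight in neighbors:
                nbr_labels = current.get(nbr)
                if not nbr_labels:
                    continue  # only annotated neighbors vote
                total += weight
                for label in nbr_labels:
                    scores[label] += weight
            if total == 0.0:
                continue  # no annotated neighbors yet
            # Keep every label whose weighted share of the vote reaches the threshold.
            chosen = {lab for lab, s in scores.items() if s / total >= threshold}
            if chosen:
                updates[node] = chosen
        if not updates:
            break  # converged: no new annotations in this sweep
        current.update(updates)  # synchronous update after the full sweep
    return current


# Hypothetical toy graph: P2 and P3 are annotated, P1 and P4 are not.
example_graph = {
    "P1": [("P2", 0.9), ("P3", 0.8)],
    "P2": [("P1", 0.9), ("P4", 0.7)],
    "P3": [("P1", 0.8)],
    "P4": [("P2", 0.7)],
}
example_labels = {"P2": {"label_A"}, "P3": {"label_A", "label_B"}}

result = propagate_labels(example_graph, example_labels)
for protein in sorted(result):
    print(protein, sorted(result[protein]))
```

In this toy run, P1 receives `label_A` because both of its annotated neighbours carry it. It does not receive `label_B`, which holds only 0.8 of the 1.7 total vote weight. P4 inherits `label_A` from P2. Each protein's vote depends only on its immediate neighbours, so a sweep like this can in principle be partitioned across workers (for example, actors), which is the kind of parallelism a distributed design can exploit.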
Findings
Efficient annotation of large protein datasets achieved.
Distributed approach scales well with data size.
Improved accuracy over traditional methods.
Abstract
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in several scenarios, including human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising at an increasing rate. This presents many challenges for biologists and computer scientists alike. To make sense of this huge quantity of data, these sequences should be annotated with functional properties. The UniProt Knowledgebase (UniProtKB) consists of two components: i) the UniProtKB/Swiss-Prot database, which contains protein sequences with reliable information manually reviewed by expert bio-curators, and ii) the UniProtKB/TrEMBL database, which is used for storing and processing unreviewed sequences. Hence, for all proteins we have the sequence available along with a little more information, such as the taxon…
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Gene expression and cancer classification
