Parallel and Scalable Precise Clustering for Homologous Protein Discovery

Stuart Byma; Akash Dhasade; Adrian Altenhoff; Christophe Dessimoz; James R. Larus

arXiv:1908.10574·cs.DC·August 29, 2019

TL;DR

This paper introduces ClusterMerge, a parallel algorithm for precise protein clustering that greatly speeds up the discovery of homologous proteins while preserving high accuracy and scaling well in parallel and distributed settings.

Contribution

The paper presents ClusterMerge, a novel parallel clustering algorithm that leverages transitive relationships for scalable and efficient homologous protein identification.

Findings

01. Identifies 99.8% of the similar pairs found by a full all-to-all comparison

02. Attains a 604× speedup on 768 cores

03. Maintains high parallel and distributed scalability
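The two headline numbers can be read as follows. This is a sketch under stated assumptions: $P_{\mathrm{full}}$ (similar pairs found by the full comparison), $P_{\mathrm{CM}}$ (similar pairs found by ClusterMerge), $S_p$ and $E$ are notation introduced here, and the efficiency figure assumes the 604× speedup is measured relative to a single core, which this summary does not state.

$$\text{recall} = \frac{|P_{\mathrm{CM}} \cap P_{\mathrm{full}}|}{|P_{\mathrm{full}}|} \approx 0.998$$

$$E = \frac{S_p}{p} = \frac{604}{768} \approx 0.786$$

Under that assumption, the 768-core run retains roughly 79% of ideal linear scaling.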

Abstract

This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparisons needed to find similar pairs in a set of $n$ elements such as protein sequences. Precise clustering ensures that each pair of similar elements appears together in at least one cluster, so that similarities can be identified by all-to-all comparison in each cluster rather than on the full set. This paper introduces ClusterMerge, a new algorithm for precise clustering that uses transitive relationships among the elements to enable parallel and scalable implementations of this approach. We apply ClusterMerge to the important problem of finding similar amino acid sequences in a collection of proteins. ClusterMerge identifies 99.8% of similar pairs found by a full…
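A minimal Python sketch of the precise-clustering property described in the abstract may help. It is not ClusterMerge: it does not build clusters or exploit transitivity, and the helper names (`pairs_in_clusters`, `is_precise`, `cluster_comparisons`, `full_comparisons`) and the toy data are hypothetical. It checks that every similar pair co-occurs in at least one cluster and counts how many comparisons a per-cluster all-to-all pass needs compared with the full $n(n-1)/2$.

```python
from itertools import combinations

# Hypothetical illustration of the "precise clustering" property only;
# this is NOT the ClusterMerge algorithm. Elements are plain ints and the
# ground-truth similar pairs are given up front purely for demonstration.


def pairs_in_clusters(clusters):
    """Return every unordered pair that co-occurs in at least one cluster."""
    covered = set()
    for cluster in clusters:
        # all-to-all comparison inside a single cluster
        for a, b in combinations(sorted(cluster), 2):
            covered.add((a, b))  # the set deduplicates pairs seen in overlapping clusters
    return covered


def is_precise(clusters, similar_pairs):
    """True if every similar pair appears together in some cluster."""
    covered = pairs_in_clusters(clusters)
    return all(tuple(sorted(p)) in covered for p in similar_pairs)


def cluster_comparisons(clusters):
    """Comparisons needed when each cluster is compared all-to-all.

    If clusters overlap, a pair may be counted more than once.
    """
    return sum(len(c) * (len(c) - 1) // 2 for c in clusters)


def full_comparisons(n):
    """Comparisons needed for a full all-to-all pass over n elements."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    n = 8
    similar = {(0, 1), (1, 2), (0, 2), (5, 6)}  # hypothetical ground truth
    good = [{0, 1, 2}, {3}, {4}, {5, 6}, {7}]    # every similar pair co-occurs
    bad = [{0, 1}, {2, 3}, {4}, {5, 6}, {7}]     # misses (0, 2) and (1, 2)

    print(is_precise(good, similar), cluster_comparisons(good), full_comparisons(n))
    print(is_precise(bad, similar))
```

Running it prints `True 4 28` followed by `False`: the first clustering finds every similar pair with 4 comparisons instead of 28, while the second never places 0 or 1 together with 2 and so misses two pairs. Because precise clustering allows an element to belong to several clusters, `cluster_comparisons` can overcount when clusters overlap, whereas `pairs_in_clusters` deduplicates.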