Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data
Fahad Saeed, Muhammad Haseeb, SS Iyengar

TL;DR
This paper establishes fundamental communication lower bounds for parallel algorithms in mass spectrometry-based omics data analysis, highlighting the gap in current methods and emphasizing the need for communication-efficient algorithms for large-scale biological data processing.
Contribution
It proves theoretical communication bounds for existing and optimal parallel algorithms in proteomics, validates these bounds through analysis and experiments, and urges the development of provably optimal methods.
Findings
Existing algorithms do not achieve the communication lower bounds.
Optimal strategies can significantly reduce communication costs.
Current methods exhibit sub-optimal speedups with more processors.
Abstract
Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound that is reached by the existing parallel algorithms is , where and are the dimensions of the theoretical database matrix, and are dimensions of spectra, and is the number of processors. We further prove that a communication-optimal strategy with fast-memory can achieve but is not achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis…
