An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data

Fahad Saeed; Trairak Pisitkun; Mark A. Knepper; Jason D. Hoffert

arXiv:1301.0834·cs.DS·January 8, 2013

TL;DR

The paper introduces CAMS, an efficient algorithm for clustering large-scale mass spectrometry data. It improves the accuracy of clustering spectra and reduces computation time by using a novel F-set metric within a graph-theoretic framework.

Contribution

CAMS is a new clustering algorithm that increases the sensitivity and confidence of spectral assignment for large mass spectrometry data sets, using a novel F-set metric and a graph-theoretic framework.

Findings

01 High clustering accuracy on real data sets

02 Significant reduction in computational time

03 Improved spectral interpretation for low S/N spectra

Abstract

High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may be selected multiple times in an LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data, which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of similar spectra. A graph-theoretic framework is defined that allows efficient use of the F-set metric for accurate cluster identification. The accuracy of the algorithm is tested on real HCD and CID data sets with varying numbers of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with…
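
The general idea of clustering spectra through a graph can be illustrated with a minimal sketch. This is not the CAMS algorithm, and the similarity test below is not the F-set metric, which the abstract does not define. As a stated assumption, two spectra are treated as similar when they share at least `min_shared` of their `k` most intense peaks after binning m/z values. The function names, parameters, and peak data are all hypothetical. Similar spectra are joined by an edge, and each connected component of the resulting graph is reported as one cluster.

```python
from collections import defaultdict


def top_peaks(spectrum, k=3, bin_width=1.0):
    """Return the binned m/z values of the k most intense peaks.

    `spectrum` is a list of (mz, intensity) pairs. This stand-in
    signature is an assumption, not the paper's F-set.
    """
    strongest = sorted(spectrum, key=lambda p: p[1], reverse=True)[:k]
    return {round(mz / bin_width) for mz, _ in strongest}


def cluster_spectra(spectra, k=3, min_shared=2, bin_width=1.0):
    """Group spectra by connected components of a similarity graph."""
    peak_sets = [top_peaks(s, k, bin_width) for s in spectra]
    n = len(spectra)
    parent = list(range(n))  # union-find forest over spectrum indices

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge joins two spectra that share enough top peaks.
    for i in range(n):
        for j in range(i + 1, n):
            if len(peak_sets[i] & peak_sets[j]) >= min_shared:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Made-up peak lists: the first two share two of their top three peaks.
spectra = [
    [(100.0, 10), (200.1, 50), (300.2, 40), (400.0, 30)],
    [(200.0, 45), (300.1, 42), (400.2, 5), (500.0, 35)],
    [(150.0, 20), (250.0, 60), (350.0, 30)],
]
clusters = cluster_spectra(spectra)
print(clusters)  # [[0, 1], [2]]
```

The sketch compares every pair of spectra, so its cost grows quadratically with the number of spectra. It is meant only to show the graph formulation, not the efficiency the paper reports.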
