Running PeptideProphet Separately on Replicates Improves Peptide Identification Results

Chao Yang; Zengyou He; Weichuan Yu

arXiv:1211.6198 · q-bio.QM · December 4, 2012

Open Access

TL;DR

Running PeptideProphet separately on each replicate and then combining the outputs improves peptide identification in shotgun proteomics. The approach follows the Bagging principle, a standard way to improve the power of statistical methods, while replicates themselves address limited spectrum coverage.

Contribution

This paper proposes running PeptideProphet separately on each replicate and combining the outputs into final peptide probabilities, rather than merging the Peptide-Spectrum Matches (PSMs) from all replicates before building the statistical model.

Findings

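01 Consistent improvement over standard PeptideProphet on a standard protein dataset

02 Improved results on a Human dataset and a Yeast dataset

03 Running the model per replicate and combining the outputs, in the spirit of Bagging, outperforms merging PSMs first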

Abstract

Limited spectrum coverage is a problem in shotgun proteomics, and replicates are generated to improve it. When integrating peptide identification results obtained from replicates, the state-of-the-art algorithm PeptideProphet combines Peptide-Spectrum Matches (PSMs) before building the statistical model to calculate peptide probabilities. In this paper, we identify a connection between merging the results of replicates and Bagging, which is a standard routine to improve the power of statistical methods. Following Bagging's philosophy, we propose to run PeptideProphet separately on each replicate and combine the outputs to obtain the final peptide probabilities. In our experiments, we show that the proposed routine consistently improves on PeptideProphet's results on a standard protein dataset, a Human dataset and a Yeast dataset.
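
The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which rule is used to combine the per-replicate outputs, so the rules below (taking the maximum or the mean of a peptide's probabilities across replicates) are assumptions for illustration only, not the authors' method. The function name, the dictionary layout, and the peptide sequences are hypothetical; in practice the per-replicate probabilities would come from separate PeptideProphet runs.

```python
from collections import defaultdict
from statistics import mean


def combine_replicate_probabilities(replicates, rule=max):
    """Combine peptide probabilities from separately analysed replicates.

    replicates: list of dicts, one per replicate, mapping a peptide
                sequence to the probability assigned in that replicate.
    rule:       function reducing a list of probabilities to one value.
                Both max and mean are assumed rules, not taken from the paper.
    """
    per_peptide = defaultdict(list)
    for probabilities in replicates:
        for peptide, probability in probabilities.items():
            per_peptide[peptide].append(probability)
    # A peptide seen in only one replicate keeps that single probability.
    return {peptide: rule(values) for peptide, values in per_peptide.items()}


# Hypothetical outputs of two separate runs (made-up values).
replicate_a = {"PEPTIDEK": 0.90, "SAMPLER": 0.40}
replicate_b = {"PEPTIDEK": 0.70, "ANOTHERK": 0.95}

combined_max = combine_replicate_probabilities([replicate_a, replicate_b])
combined_mean = combine_replicate_probabilities(
    [replicate_a, replicate_b], rule=mean
)
```

With either rule, the combined result covers the union of peptides seen in any replicate, which is how replicates extend coverage. The choice of rule determines how disagreement between replicates is resolved and would need to be validated against a decoy-based error estimate.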

Taxonomy

Topics: Advanced Proteomics Techniques and Applications · Machine Learning in Bioinformatics · Gene expression and cancer classification