Improving the Results of De novo Peptide Identification via Tandem Mass Spectrometry Using a Genetic Programming-based Scoring Function for Re-ranking Peptide-Spectrum Matches
Samaneh Azari, Bing Xue, Mengjie Zhang, Lifeng Peng

TL;DR
This paper introduces GP-PSM, a genetic programming-based scoring function that enhances peptide-spectrum match confidence in de novo peptide sequencing, outperforming existing methods like RF and SVR in accuracy and false positive reduction.
Contribution
The study develops a novel GP-based scoring function for PSM evaluation, improving confidence and accuracy in peptide identification from MS/MS data.
Findings
GP-PSM outperforms RF and SVR in discriminating correct PSMs
Increases peptide assignment accuracy by 10%
Reduces false positive rate in peptide identification
Abstract
De novo peptide sequencing algorithms have been widely used in proteomics to analyse tandem mass spectra (MS/MS) and assign them to peptides, but quality-control methods to evaluate the confidence of de novo peptide sequencing are lagging behind. A fundamental part of a quality-control method is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). Here, we propose a genetic programming (GP) based method, called GP-PSM, to learn a PSM scoring function for improving the rate of confident peptide identification from MS/MS data. The GP method learns from thousands of MS/MS spectra. Important characteristics of the goodness of the matches are extracted from the learning set and incorporated into the GP scoring functions. We compare GP-PSM with two other methods, Support Vector Regression (SVR) and Random Forest (RF). The GP method along with RF and SVR,…
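To illustrate the re-ranking idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a GP individual can be represented as a small expression tree over PSM features, evaluates that tree for every candidate PSM, and sorts the candidates for each spectrum by the resulting score. The feature names (`matched_ions`, `mass_error`, `length`), the example tree, and the toy data are all hypothetical and chosen only for illustration; the paper's actual feature set and evolved functions are not given in this summary.

```python
from collections import defaultdict

# Hypothetical candidate PSMs: (spectrum id, candidate peptide, feature dict).
# Feature names and values are illustrative assumptions, not from the paper.
PSMS = [
    ("s1", "PEPTIDE", {"matched_ions": 9, "mass_error": 0.02, "length": 7}),
    ("s1", "PEPTLDE", {"matched_ions": 6, "mass_error": 0.05, "length": 7}),
    ("s2", "ACDK", {"matched_ions": 2, "mass_error": 0.30, "length": 4}),
    ("s2", "ACEK", {"matched_ions": 3, "mass_error": 0.01, "length": 4}),
]

# A GP individual written as a nested tuple: (operator, left, right).
# Leaves are feature names (str) or numeric constants.
# This tree encodes: matched_ions / length - 2.0 * mass_error
TREE = ("sub", ("div", "matched_ions", "length"), ("mul", 2.0, "mass_error"))


def protected_div(a, b):
    """Division that returns 1.0 for a near-zero denominator, as is common in GP."""
    return a / b if abs(b) > 1e-9 else 1.0


OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
}


def evaluate(node, features):
    """Recursively evaluate an expression tree on one PSM's features."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(node, str):
        return float(features[node])
    return float(node)


def rerank(psms, tree):
    """Group candidates by spectrum and order them by descending score."""
    by_spectrum = defaultdict(list)
    for spectrum_id, peptide, features in psms:
        by_spectrum[spectrum_id].append((evaluate(tree, features), peptide))
    return {
        spectrum_id: [pep for _, pep in sorted(cands, key=lambda t: t[0], reverse=True)]
        for spectrum_id, cands in by_spectrum.items()
    }


if __name__ == "__main__":
    for spectrum_id, ranked in rerank(PSMS, TREE).items():
        print(spectrum_id, ranked)
```

In a full GP system the tree would be evolved against labelled PSMs rather than written by hand; the sketch covers only the scoring and re-ranking step that such an evolved function would drive.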
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Machine Learning in Bioinformatics · Mass Spectrometry Techniques and Applications
