AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information
Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li

TL;DR
AdaNovo introduces an adaptive de novo peptide sequencing framework leveraging conditional mutual information to improve PTM detection and robustness against noisy spectra, achieving state-of-the-art results across multiple species.
Contribution
The paper presents a novel adaptive training method that uses conditional mutual information (CMI) to improve peptide sequencing, especially for modified amino acids and noisy data.
Findings
Outperforms existing methods on a 9-species benchmark.
Effectively identifies post-translational modifications.
Robust against noise and missing peaks in spectra.
Abstract
Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino…
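As a rough illustration of the quantity the abstract names, the sketch below computes a pointwise CMI per amino acid position as the log-ratio between the probability a model assigns to that amino acid when it sees the spectrum and the probability it assigns from the preceding amino acids alone. It then uses those values to reweight a per-position negative log-likelihood. The function names, the two probability inputs, and the softmax-style weighting are all assumptions made for illustration; the truncated abstract does not specify how AdaNovo computes or uses CMI, so this should not be read as the authors' implementation.

```python
import math

def pointwise_cmi(p_with_spectrum, p_without_spectrum):
    """Per-position log p(y_j | x, y_<j) - log p(y_j | y_<j).

    Both arguments are hypothetical inputs: lists of probabilities that two
    models assign to the true amino acid at each position, one conditioned
    on the spectrum x and one conditioned only on the preceding residues.
    """
    return [math.log(p) - math.log(q)
            for p, q in zip(p_with_spectrum, p_without_spectrum)]

def reweighted_nll(p_with_spectrum, p_without_spectrum):
    """Mean negative log-likelihood with CMI-derived position weights.

    Assumed weighting scheme (not taken from the paper): a softmax over the
    CMI values, rescaled so the weights average to 1 across positions.
    """
    cmi = pointwise_cmi(p_with_spectrum, p_without_spectrum)
    peak = max(cmi)  # subtract the max for numerical stability
    exps = [math.exp(c - peak) for c in cmi]
    total = sum(exps)
    n = len(cmi)
    weights = [n * e / total for e in exps]
    nll = [-math.log(p) for p in p_with_spectrum]
    return sum(w * l for w, l in zip(weights, nll)) / n

if __name__ == "__main__":
    # Toy probabilities for a three-residue peptide (made-up numbers).
    with_x = [0.9, 0.6, 0.3]
    without_x = [0.3, 0.3, 0.3]
    print(pointwise_cmi(with_x, without_x))
    print(reweighted_nll(with_x, without_x))
```

Under this scheme, positions where the spectrum adds the most information beyond the sequence context receive larger weights; when the spectrum adds nothing at any position, all weights equal 1 and the loss reduces to the ordinary mean negative log-likelihood.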
Taxonomy
Topics: Chemical Synthesis and Analysis · Glycosylation and Glycoproteins Research · Machine Learning in Bioinformatics
Methods: Sparse Evolutionary Training
