Accurate de novo sequencing of the modified proteome with OmniNovo
Yuhan Chen, Shang Qu, Zhiqiang Gao, Yuejin Yang, Xiang Zhang, Sheng Xu, Xinjie Mao, Liujia Qian, Jiaqi Wei, Zijie Qiu, Chenyu You, Lei Bai, Ning Ding, Tiannan Guo, Bowen Zhou, Siqi Sun

TL;DR
OmniNovo is a deep learning framework that enables reference-free, accurate sequencing of both unmodified and modified peptides from tandem mass spectra, significantly expanding proteome analysis capabilities.
Contribution
OmniNovo introduces a single deep learning model that learns fragmentation rules across diverse PTMs rather than being restricted to specific modification types, enabling unbiased proteome analysis.
Findings
Identifies 51% more peptides than standard approaches at a 1% false discovery rate (FDR).
Generalizes to unseen biological PTM sites, revealing dark proteome regions.
Achieves state-of-the-art accuracy in PTM peptide sequencing.
Abstract
Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep learning framework for reference-free sequencing of unmodified and modified peptides directly from tandem mass spectra. Unlike existing tools restricted to specific modification types, OmniNovo learns universal fragmentation rules to decipher diverse PTMs within a single coherent model. By integrating a mass-constrained decoding algorithm with rigorous false discovery rate estimation, OmniNovo achieves state-of-the-art accuracy, identifying 51% more peptides than standard approaches at a 1%…
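The abstract mentions mass-constrained decoding without giving details here. As a minimal sketch of the general idea only (not OmniNovo's actual algorithm), the following checks whether a candidate peptide, with optional modifications, is consistent with the observed precursor mass. The function names, the small modification table, and the 10 ppm default tolerance are illustrative assumptions; the residue and modification masses are standard monoisotopic values.

```python
# Illustrative sketch only: a generic precursor-mass consistency check,
# NOT the decoding algorithm used by OmniNovo.

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of a proton, used for charge-state conversion

# A small, hypothetical modification table (monoisotopic mass shifts, Da).
MOD_MASS = {"Phospho": 79.96633, "Oxidation": 15.99491, "Acetyl": 42.01056}


def peptide_mass(tokens):
    """Neutral monoisotopic mass of a peptide.

    tokens is a list of (residue, modification_name_or_None) pairs.
    """
    total = WATER
    for residue, mod in tokens:
        total += RESIDUE_MASS[residue]
        if mod is not None:
            total += MOD_MASS[mod]
    return total


def precursor_neutral_mass(mz, charge):
    """Convert an observed precursor m/z and charge to a neutral mass."""
    return (mz - PROTON) * charge


def within_tolerance(candidate_mass, observed_mass, ppm=10.0):
    """True if the candidate mass matches the observed mass within ppm."""
    return abs(candidate_mass - observed_mass) / observed_mass * 1e6 <= ppm


# Usage: an unmodified candidate cannot explain a phosphorylated precursor.
plain = [(aa, None) for aa in "PEPTIDE"]
phospho = [(aa, "Phospho" if aa == "T" else None) for aa in "PEPTIDE"]
observed = peptide_mass(phospho)
print(within_tolerance(peptide_mass(plain), observed))    # mass mismatch
print(within_tolerance(peptide_mass(phospho), observed))  # mass match
```

A decoder can use a check like this to discard candidate sequences whose total mass cannot match the precursor, which is one common way a mass constraint narrows the space of modified peptides.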
Taxonomy
Topics: Machine Learning in Bioinformatics · Advanced Proteomics Techniques and Applications · RNA and protein synthesis mechanisms
