# Modanovo: A Unified Model for Post-translational Modification-Aware De Novo Sequencing Using Experimental Spectra From In Vivo and Synthetic Peptides

**Authors:** Daniela Klaproth-Andrade, Yanik Bruns, Wassim Gabriel, Christian Nix, Valter Bergant, Andreas Pichlmair, Mathias Wilhelm, Julien Gagneur

PMC · DOI: 10.1016/j.mcpro.2025.101501 · Molecular & Cellular Proteomics : MCP · 2025-12-24

## TL;DR

Modanovo is a transformer-based de novo sequencing model that identifies both modified and unmodified peptides from tandem mass spectra. It was trained on a large curated dataset of spectra covering 19 amino acid-PTM combinations.

## Contribution

Modanovo extends de novo peptide sequencing to include 19 biologically relevant amino acid–PTM combinations using a transformer-based model.
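
To illustrate what an amino acid-PTM combination means at the mass level, the sketch below adds a fixed monoisotopic mass shift to the modified residue when computing a peptide mass. The residue masses and shifts are standard values; the `(residue, ptm)` tuple encoding, the function names, and the use of the diglycine remnant to stand in for ubiquitination are assumptions for illustration, not the encoding used by Modanovo.

```python
# Monoisotopic residue masses in Da (standard values, subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "Y": 163.06333, "K": 128.09496, "R": 156.10111, "E": 129.04259,
}

# Monoisotopic mass shifts in Da for three PTM classes named in the abstract.
# Assumption: ubiquitination is represented by the diglycine (GlyGly) remnant
# left on lysine after tryptic digestion, as is common in bottom-up proteomics.
PTM_SHIFT = {
    "Phospho": 79.96633,
    "Acetyl": 42.01057,
    "GlyGly": 114.04293,
}

WATER = 18.01056   # added once per peptide (N-terminal H plus C-terminal OH)
PROTON = 1.00728   # added once per charge


def peptide_mass(residues):
    """Neutral monoisotopic mass of a peptide.

    `residues` is a list of (amino_acid, ptm_or_None) tuples; this encoding
    is hypothetical and chosen only for readability.
    """
    total = WATER
    for aa, ptm in residues:
        total += RESIDUE_MASS[aa]
        if ptm is not None:
            total += PTM_SHIFT[ptm]
    return total


def precursor_mz(mass, charge):
    """m/z of the peptide at a given positive charge state."""
    return (mass + charge * PROTON) / charge


plain = peptide_mass([("A", None), ("S", None), ("K", None)])
phospho = peptide_mass([("A", None), ("S", "Phospho"), ("K", None)])
print(round(phospho - plain, 5))  # mass difference equals the phospho shift
```

Because every combination of residue and modification changes the expected fragment masses, the number of candidate peptides a database search must consider grows quickly with each additional PTM type and site, which is the combinatorial expansion the abstract refers to.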

## Key findings

- Modanovo reaches a median area under the precision-coverage curve of 0.92 across the 19 amino acid-PTM combinations and 0.93 on unmodified peptides, nearly identical to Casanovo's 0.94.
- Modanovo outperforms the PTM-aware de novo models π-PrimeNovo-PTM and InstaNovo-P, and shows increased precision relative to, and complementarity with, the database search tool MSFragger.
- Applied to a phosphoproteomics dataset from monkeypox virus-infected cells, Modanovo recovers confident peptides not reported by database search, including new viral phosphosites supported by spectral evidence.

## Abstract

Post-translational modifications (PTMs) play a central role in cellular regulation and are implicated in numerous diseases. Database searching remains the standard for identifying modified peptides from tandem mass spectra but is hindered by the combinatorial expansion of modification types and sites. De novo peptide sequencing offers an attractive alternative, yet existing methods remain limited to unmodified peptides or a narrow set of PTMs. Here, we curated a large dataset of spectra from endogenous and synthetic peptides from ProteomeTools spanning 19 biologically relevant amino acid-PTM combinations, covering phosphorylation, acetylation, and ubiquitination. We used this dataset to develop Modanovo, an extension of the Casanovo transformer architecture for de novo peptide sequencing. Modanovo achieved robust performance across these amino acid-PTM combinations (median area under the precision-coverage curve 0.92), while maintaining performance on unmodified peptides (0.93), nearly identical to Casanovo (0.94). The model outperformed π-PrimeNovo-PTM and InstaNovo-P and showed increased precision and complementarity to the database search tool MSFragger. Robustness was confirmed across independent datasets, particularly at peptide lengths frequently represented in the curated dataset. Applied to a phosphoproteomics dataset from monkeypox virus-infected cells, Modanovo recovered numerous confident peptides not reported by database search, including new viral phosphosites supported by spectral evidence, thereby demonstrating its complementarity to database-driven identification approaches. These results establish Modanovo as a broadly applicable model for comprehensive de novo sequencing of both modified and unmodified peptides.

- Large dataset of PSM covering 19 biologically relevant amino acid-PTM combinations.
- De novo peptide sequencing model with 92% average precision across PTMs.
- Beats previous models on their restricted PTM set and matches Casanovo on unmodified.
- Beats database search in both 19 amino acid-PTM-restricted and open-search modes.
- Reveals new P-sites complementing database search on monkeypox virus-infected cells.

Modanovo is a transformer-based de novo peptide sequencing model that expands Casanovo to identify both modified and unmodified peptides. Trained on a large dataset spanning 19 biologically relevant amino acid–PTM combinations, it achieves robust performance across phosphorylation, acetylation, and ubiquitination while maintaining strong performance on unmodified peptides. Modanovo outperforms existing PTM-aware de novo peptide sequencing methods and complements database searches, enabling confident recovery of peptides and novel phosphosites, thereby providing a broadly applicable framework for comprehensive PTM-inclusive de novo sequencing.
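
The abstract reports performance as the area under the precision-coverage curve. The exact procedure used in the paper is not given in this summary view, so the following is a minimal sketch under a common convention in de novo sequencing benchmarks: predictions are ranked by confidence score, and at each cutoff, coverage is the fraction of spectra retained while precision is the fraction of retained predictions that are correct. The function names and the toy inputs are hypothetical.

```python
def precision_coverage_curve(scores, is_correct):
    """Return (coverage, precision) lists for predictions ranked by score.

    scores:     one confidence score per spectrum (higher = more confident)
    is_correct: one boolean per spectrum, True if the predicted peptide
                matches the ground truth
    """
    # Rank spectra from most to least confident prediction.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    coverage, precision = [], []
    hits = 0
    for kept, idx in enumerate(order, start=1):
        hits += bool(is_correct[idx])
        coverage.append(kept / len(scores))  # fraction of spectra retained
        precision.append(hits / kept)        # fraction of retained that are correct
    return coverage, precision


def area_under_curve(coverage, precision):
    """Trapezoidal area under the curve, extended flat back to coverage 0."""
    xs = [0.0] + list(coverage)
    ys = [precision[0]] + list(precision)
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
        for i in range(len(xs) - 1)
    )


# Toy example: four spectra, the third-ranked prediction is wrong.
cov, prec = precision_coverage_curve([0.9, 0.8, 0.7, 0.6], [True, True, False, True])
auc = area_under_curve(cov, prec)
print(round(auc, 4))
```

A value near 1 means that correct predictions are consistently ranked above incorrect ones and that most predictions are correct, which is why the metric is useful for comparing models that also output a confidence score.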

## Linked entities

- **Diseases:** monkeypox (MONDO:0002594)

## Full-text entities

- **Chemicals:** Modanovo (-)
- **Species:** Monkeypox virus (no rank) [taxon 10244]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12860953/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12860953/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12860953/full.md

---
Source: https://tomesphere.com/paper/PMC12860953