# Peptide Mapping for Sequence Confirmation of Therapeutic Proteins and Recombinant Vaccine Antigens by High-Resolution Mass Spectrometry: Software Limitations, Pitfalls, and Lessons Learned

**Authors:** Mateusz Dobrowolski, Małgorzata Urbaniak, Tadeusz Pietrucha

PMC · DOI: 10.3390/ijms26209962 · International Journal of Molecular Sciences · 2025-10-13

## TL;DR

This paper shows how commercial software used to process MS/MS peptide-mapping data can misassign peptide sequences and modifications, and argues that expert manual review remains necessary when confirming therapeutic protein sequences, even on validated platforms.

## Contribution

The paper identifies specific limitations of commercial peptide-mapping software and presents representative misassignments from antibody sequence analysis and from SARS-CoV-2 spike protein variants.

## Key findings

- In antibody sequence analysis, commercial software misassigned peptides that contain isobaric or near-isobaric dipeptides (e.g., SA vs. GT).
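- In SARS-CoV-2 spike protein variants, the software introduced artificial succinylation of aspartic acid residues to compensate for sequence mismatches, and assigned deamidation sites incorrectly because it misinterpreted isotopic peaks.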
- Manual expert review is essential to distinguish true sequence variants from software artifacts.
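
The SA vs. GT case in the first finding can be checked from residue masses alone. The sketch below is illustrative and not taken from the paper: it uses standard monoisotopic residue masses, and the helper names (`composition_mass`, `isobaric_dipeptides`) and the tolerance value are assumptions chosen for the example. Isoleucine is left out because it has the same mass as leucine.

```python
from itertools import combinations, combinations_with_replacement

# Standard monoisotopic residue masses in Da.
# Ile is omitted on purpose: it is identical in mass to Leu.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def composition_mass(residues: str) -> float:
    """Sum of residue masses; residue order does not affect the result."""
    return sum(RESIDUE_MASS[r] for r in residues)


def isobaric_dipeptides(tol_da: float = 0.001):
    """Return (comp_a, comp_b, delta_da) for every pair of distinct
    dipeptide compositions whose masses differ by at most tol_da.

    tol_da is a hypothetical tolerance for illustration only.
    """
    comps = [
        "".join(pair)
        for pair in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
    ]
    hits = []
    for a, b in combinations(comps, 2):
        delta = abs(composition_mass(a) - composition_mass(b))
        if delta <= tol_da:
            hits.append((a, b, delta))
    return hits


if __name__ == "__main__":
    # A wider tolerance also lists near-isobaric pairs.
    for a, b, delta in isobaric_dipeptides(tol_da=0.05):
        print(f"{a} vs {b}: delta = {delta:.5f} Da")
```

SA and GT share the elemental composition C6H10N2O3, so precursor mass accuracy cannot separate them at any resolution. Only a fragment ion from cleavage between the two residues can, which is why the choice depends on the MS/MS evidence rather than on the intact peptide mass.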

## Abstract

Peptide mapping is a well-established method for confirming the identity of therapeutic proteins as part of batch release testing and product characterization for regulatory filings. Traditionally based on enzymatic digestion followed by reversed-phase liquid chromatography and UV detection, the method has evolved with technological advancements to incorporate mass spectrometry (MS), enabling more detailed structural insights. Residue-level confirmation of amino acid sequences requires MS/MS fragmentation, which produces large amounts of data that must be processed using specialized software. In regulated environments, the use of academic algorithms is often limited by validation requirements, making it necessary to rely on commercially approved tools, although their built-in scoring systems have limitations that can affect sequence assignment accuracy. Here, we present representative examples of incorrect peptide assignments generated by commercial software. In antibody sequence analysis, misidentifications resulted from isobaric and near-isobaric dipeptides (e.g., SA vs. GT). Additional examples from the analysis of SARS-CoV-2 spike protein variants revealed software-induced artifacts, including artificial succinylation of aspartic acid residues to compensate for sequence mismatches, and incorrect deamidation site assignments due to misinterpretation of isotopic peaks. These findings underscore the necessity for expert manual review of MS/MS data, even when using validated commercial platforms, and highlight the molecular challenges in distinguishing true sequence variants from software-driven artifacts.
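
One common way for isotopic peaks to be confused with deamidation is that the deamidation mass shift lies very close to the spacing between isotope peaks. The sketch below quantifies that gap. It is an assumption-labeled illustration, not the paper's analysis: the constants are standard values, while the function names and the example peptide mass and charge are hypothetical.

```python
# Monoisotopic mass shift of deamidation (Asn -> Asp or Gln -> Glu): +O, -N, -H.
DEAMIDATION_DA = 0.984016
# Mass difference between 13C and 12C, i.e. the nominal isotope peak spacing.
C13_SPACING_DA = 1.003355


def gap_da() -> float:
    """Difference between the first isotope peak and a deamidated monoisotopic peak."""
    return C13_SPACING_DA - DEAMIDATION_DA


def gap_ppm(peptide_mass_da: float) -> float:
    """The same gap expressed in ppm of the peptide's neutral mass."""
    return gap_da() / peptide_mass_da * 1e6


def gap_mz(charge: int) -> float:
    """The same gap on the m/z axis for a given charge state."""
    return gap_da() / charge


if __name__ == "__main__":
    mass, charge = 2000.0, 2  # hypothetical example peptide
    print(f"gap: {gap_da():.6f} Da")
    print(f"gap at {mass:.0f} Da: {gap_ppm(mass):.2f} ppm")
    print(f"gap at charge {charge}+: {gap_mz(charge):.5f} m/z")
```

For a 2000 Da peptide the two candidate peaks are separated by roughly 9.7 ppm, and the relative gap shrinks as peptide mass grows. A search tolerance wider than this gap can therefore accept the first isotope peak of the unmodified peptide as a deamidated species, so the isotope envelope and the site-determining fragment ions should be inspected before the assignment is accepted.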

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Chemicals:** dipeptides (MESH:D004151), aspartic acid (MESH:D001224), SA (MESH:D000077145)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12563827/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12563827/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12563827/full.md

---
Source: https://tomesphere.com/paper/PMC12563827