# PIPI-C: A Combinatorial Optimization Framework for Identifying Post-translational Modification Hot-spots in Mass Spectrometry Data

**Authors:** Shengzhi Lai, Shuaijian Dai, Peize Zhao, Chen Zhou, Ning Li, Weichuan Yu

PMC · DOI: 10.1016/j.mcpro.2025.101494 · 2025-12-23

## TL;DR

PIPI-C is a new computational tool that identifies complex post-translational modification patterns in cancer mass spectrometry data, revealing disease-related regulatory mechanisms.

## Contribution

PIPI-C introduces a mixed integer linear programming model to efficiently detect high-order PTM combinations, overcoming previous computational limitations.
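This summary does not reproduce the paper's MILP itself, so the following is only a minimal sketch of the kind of 0-1 assignment problem such a model encodes. It assumes one modification choice per site, a constraint that the mass shifts sum to an observed delta mass, and a score to maximize. The modification table, the allowed-residue map, the `best_pattern` function, and the per-site scores are all hypothetical and chosen for illustration. The toy also solves the problem by brute-force enumeration rather than with an MILP solver, which is the exponential search a solver-based formulation is meant to avoid.

```python
from itertools import product

# Illustrative monoisotopic mass shifts in Da (assumed values for this toy).
MODS = {
    "none": 0.0,
    "methyl": 14.01565,
    "dimethyl": 28.03130,
    "acetyl": 42.01056,
    "phospho": 79.96633,
}

# Hypothetical map of which modifications each residue may carry.
ALLOWED = {
    "K": ["none", "methyl", "dimethyl", "acetyl"],
    "R": ["none", "methyl", "dimethyl"],
    "S": ["none", "phospho"],
    "T": ["none", "phospho"],
}


def best_pattern(peptide, delta_mass, site_scores, tol=0.01):
    """Pick at most one modification per site so that the total mass shift
    matches delta_mass (within tol) and the summed site score is maximal.

    site_scores maps (position, modification) to a made-up evidence score.
    Returns (pattern, score), or (None, -inf) if nothing is feasible.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in ALLOWED]
    choices = [ALLOWED[peptide[i]] for i in sites]
    best, best_score = None, float("-inf")
    # Enumerate every assignment: one choice (possibly "none") per site.
    for combo in product(*choices):
        shift = sum(MODS[m] for m in combo)
        if abs(shift - delta_mass) > tol:
            continue  # mass-balance constraint violated
        assigned = [(i, m) for i, m in zip(sites, combo) if m != "none"]
        score = sum(site_scores.get(key, 0.0) for key in assigned)
        if score > best_score:
            best, best_score = dict(assigned), score
    return best, best_score


if __name__ == "__main__":
    scores = {
        (1, "acetyl"): 2.0,
        (5, "acetyl"): 0.5,
        (2, "phospho"): 1.0,
        (3, "phospho"): 3.0,
    }
    # Delta mass equal to one acetylation plus one phosphorylation.
    print(best_pattern("AKSTRK", 42.01056 + 79.96633, scores))
```

Even this six-residue toy enumerates 4 × 2 × 2 × 3 × 4 = 192 candidate assignments, and the count multiplies with every additional modifiable site and candidate modification.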

## Key findings

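- In validation across diverse datasets, PIPI-C shows superior performance in detecting PTM combinations, and it was applied to over 72 million mass spectra from three human cancers.
- In LSCC, 50% of the 860 upregulated unique PTM site patterns (UPSPs) carry two or more PTMs, including literature-supported crosstalk patterns.
- PTM combinations upregulated in COAD and GBM likewise have literature-supported relevance to cancer.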

## Abstract

Post-translational modifications (PTMs) are pivotal in cellular regulation, and their crosstalk is related to various diseases such as cancer. Given the prevalence of PTM crosstalk within close amino acid ranges, identifying peptides with multiple PTMs is essential. However, this task is an NP-hard combinatorial problem with exponential complexity, posing significant challenges for existing analysis methods. Here, we introduce PIPI-C (PTM-Invariant Peptide Identification with a Combinatorial model), a novel search engine that addresses this challenge through a mixed integer linear programming (MILP) model, thereby overcoming the limitations of existing approaches that struggle with high-order PTM combinations. Rigorous validation across diverse datasets confirms PIPI-C’s superior performance in detecting PTM combinations. When applied to over 72 million mass spectra of three human cancers, namely lung squamous cell carcinoma (LSCC), colorectal adenocarcinoma (COAD), and glioblastoma (GBM), PIPI-C reveals significantly upregulated PTM combinations. In LSCC, 50% of the 860 unique PTM site patterns (UPSPs) upregulated in cancer versus normal samples carried at least two PTMs, including literature-supported crosstalk patterns such as di-methylation with trifluoroleucine substitution and amidation with proline-to-valine substitution. Similar findings in COAD and GBM highlight PIPI-C’s utility in uncovering cancer-relevant PTM combination landscapes. Overall, PIPI-C provides a robust mathematical framework for decoding complex PTM patterns, advancing our understanding of PTM-driven cellular processes in diseases.

- A mixed integer linear program to identify peptides with multiple PTMs.
- Detection of PTM combinations across large-scale cancer mass spectrometry datasets.
- Fifty percent of LSCC UPSPs contain 2 or more PTMs, including known cross talk patterns.
- Uncovering PTM combinations in COAD and GBM with literature-supported relevance.
- A framework for decoding PTM-driven regulatory mechanisms in cancer biology.

Post-translational modification (PTM) cross talk plays a critical role in disease biology, yet identifying peptides with multiple PTMs remains computationally challenging. We present PIPI-C, a novel search engine using mixed-integer linear programming to resolve high-order PTM combinations. Validated across diverse datasets, PIPI-C reveals significantly upregulated PTM combinations in lung, colon, and brain cancers. Its ability to detect complex PTM patterns provides new insights into cancer-specific regulatory mechanisms and offers a powerful framework for decoding PTM-driven cellular processes.
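The exponential complexity noted in the abstract can be illustrated with a simplified counting argument. This is an assumption made here for illustration, not the paper's derivation: suppose a peptide has $n$ modifiable sites, and each site carries either no modification or exactly one of $m$ candidate modifications. Then the number of distinct PTM patterns, and the number with exactly $k$ modified sites, are

$$
N_{\text{all}} = (m+1)^{n}, \qquad N_{k} = \binom{n}{k}\, m^{k}.
$$

For example, with $n = 10$ and $m = 5$, there are $\binom{10}{2} \cdot 5^{2} = 1125$ patterns with two modified sites but $6^{10} = 60{,}466{,}176$ patterns overall, which suggests why exhaustive enumeration becomes impractical for high-order combinations.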

## Linked entities

- **Diseases:** cancer (MONDO:0004992), lung squamous cell carcinoma (MONDO:0005097), colorectal adenocarcinoma (MONDO:0005008), glioblastoma (MONDO:0018177)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** lung squamous cell carcinoma (MESH:D002294), COAD (MESH:D003110), GBM (MESH:D005909), LSCC (MESH:D018307), cancer (MESH:D009369)
- **Chemicals:** PIPI-C (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12834927/full.md

---
Source: https://tomesphere.com/paper/PMC12834927