# Retention order prediction of peptides containing non-proteinogenic amino acids

**Authors:** Shohei Nakamukai, Eisuke Hayakawa, Tetsuya Mori, Yuji Ise, Masami Yokota Hirai, Masanori Arita

PMC · DOI: 10.1093/bioadv/vbaf246 · Bioinformatics Advances · 2025-10-08

## TL;DR

This paper introduces a machine-learning method that predicts the chromatographic retention order of peptides containing non-proteinogenic amino acids (PNPAs), supporting drug development and natural product identification.

## Contribution

A novel approach that combines a ranking SVM with molecular fingerprints to predict the retention order of PNPAs without requiring absolute retention-time data.
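The core ranking idea can be sketched as a linear RankSVM. Each pair of compounds with different retention times becomes a difference vector labelled by which compound elutes later, and a weight vector is fitted with a pairwise hinge loss. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation (their code is in the repository linked from the abstract). The function names, the toy fingerprint vectors, and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical, and the paper may use a kernel or an off-the-shelf solver instead of this subgradient loop.

```python
import random


def pairwise_differences(X, y):
    """Turn feature vectors X and retention times y into RankSVM training pairs.

    For each pair (i, j) with y[i] != y[j], emit x_i - x_j labelled +1 if
    compound i elutes later than compound j, else -1.
    """
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if y[i] == y[j]:
                continue  # tied retention times carry no order information
            diffs.append([a - b for a, b in zip(X[i], X[j])])
            labels.append(1 if y[i] > y[j] else -1)
    return diffs, labels


def train_rank_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit w by stochastic subgradient descent on
    lam/2 * ||w||^2 + mean over pairs of max(0, 1 - s * w.(x_i - x_j)).
    """
    diffs, labels = pairwise_differences(X, y)
    w = [0.0] * len(X[0])
    rng = random.Random(seed)
    order = list(range(len(diffs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            d, s = diffs[k], labels[k]
            margin = s * sum(wi * di for wi, di in zip(w, d))
            for t in range(len(w)):
                # Regulariser gradient, plus the hinge term only for pairs inside the margin.
                grad = lam * w[t] - (s * d[t] if margin < 1 else 0.0)
                w[t] -= lr * grad
    return w


def score(w, x):
    """Linear ranking score: a higher score means the compound is predicted to elute later."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical count-fingerprint vectors and retention times (minutes).
X = [[0, 1], [1, 1], [2, 0], [3, 2]]
y = [1.2, 2.5, 3.1, 4.0]
w = train_rank_svm(X, y)
predicted_order = sorted(range(len(X)), key=lambda i: score(w, X[i]))
print(predicted_order)
```

Because only the sign of score differences matters, the learned scores can rank compounds measured on a new system without ever predicting a retention time in minutes.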

## Key findings

- The model predicted the retention order of PNPAs with an accuracy above 0.9.
- SHAP analysis showed that a principal component derived from methyl-group fingerprints contributed most to the predictions.
- The method enables effective screening of candidate compounds in LC/MS data from non-conventional biological extracts.
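Assuming that "accuracy" here means the fraction of compound pairs placed in the correct elution order (the usual metric for retention-order models; the paper's exact definition should be checked against the full text), it can be computed as below. The function name and the half-credit rule for tied predictions are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def retention_order_accuracy(true_rt, pred_score):
    """Fraction of compound pairs whose predicted order matches the observed order.

    Pairs with identical observed retention times are skipped; a tied prediction
    counts as half-correct (an assumed convention).
    """
    correct, total = 0.0, 0
    for i, j in combinations(range(len(true_rt)), 2):
        dt = true_rt[i] - true_rt[j]
        if dt == 0:
            continue  # no observed order to check
        ds = pred_score[i] - pred_score[j]
        total += 1
        if ds == 0:
            correct += 0.5
        elif (dt > 0) == (ds > 0):
            correct += 1
    return correct / total if total else float("nan")


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical retention times (minutes)
predicted = [0.1, 0.3, 0.2, 0.9]  # hypothetical ranking scores
print(retention_order_accuracy(observed, predicted))  # 5 of 6 pairs in the right order
```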

## Abstract

Peptides containing non-proteinogenic amino acids (PNPAs) are promising targets in drug development because of their unique pharmacological properties. The lack of mass spectra and retention data for PNPAs has hindered research on them, because accurately assessing their chromatographic retention time is crucial for identifying their structures and characterizing their functions. Conventional methods are often ineffective due to the limited amount of available data. This study aims to predict their retention order, rather than their absolute retention time, from their structures, using data from peptides and small molecules. This approach can advance natural product identification and drug research.
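To see why order is a more transferable target than absolute time, the sketch below turns hypothetical (compound, dataset, retention time) records into "elutes-before" pairs, comparing times only within one dataset. Any increasing rescaling of the time axis within a dataset, such as a stretched gradient, changes every absolute time but leaves the pairs unchanged. The function name, the system labels, and the idea of grouping by chromatographic system are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def order_pairs(records):
    """Return the set of (earlier, later) compound pairs.

    records: iterable of (compound, dataset, retention_time) tuples.
    Times are compared only within the same dataset, since absolute times from
    different columns or gradients are not directly comparable.
    """
    by_dataset = defaultdict(list)
    for compound, dataset, rt in records:
        by_dataset[dataset].append((compound, rt))
    pairs = set()
    for entries in by_dataset.values():
        for (a, rt_a), (b, rt_b) in combinations(entries, 2):
            if rt_a < rt_b:
                pairs.add((a, b))
            elif rt_b < rt_a:
                pairs.add((b, a))
            # equal times carry no order information and are skipped
    return pairs


records = [
    ("A", "system1", 2.0), ("B", "system1", 5.5),
    ("C", "system2", 1.0), ("D", "system2", 0.4),
]
# Hypothetical per-system (slope, offset) rescaling; positive slopes keep the order.
rescale = {"system1": (3.0, 1.0), "system2": (0.5, 2.0)}
stretched = [(c, d, rescale[d][0] * rt + rescale[d][1]) for c, d, rt in records]
print(order_pairs(records) == order_pairs(stretched))
```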

Our model uses a ranking support vector machine (Ranking SVM) and predicted the retention order of PNPAs with an accuracy above 0.9. Count fingerprints and the MIX fingerprint, which combines four types of fingerprints, were used as explanatory variables. To suppress multicollinearity, principal component analysis was applied to reduce redundant (spurious) fingerprint features. SHAP value analysis revealed that one principal component, derived from methyl groups, contributed most to the prediction. Overall, order prediction can effectively find candidate compounds in LC/MS data from non-conventional biological extracts.
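The dimensionality-reduction and attribution steps can be sketched in pure Python. `pca` centres a fingerprint matrix and extracts the leading principal components by power iteration with deflation, so exactly collinear fingerprint bits collapse onto a single component. `linear_shap` uses the standard result that, for a linear model with features treated as independent, the SHAP value of feature j for a sample is `w_j * (z_j - mean(z_j))`. Both functions, the toy matrix, and the weights `w_pc` are hypothetical; whether the paper's model is linear, and which SHAP explainer it used, should be checked in the authors' repository.

```python
import math
import random


def pca(X, n_components, iters=500, seed=0):
    """PCA by power iteration on the sample covariance matrix, with deflation.

    Returns (components, eigenvalues, scores); scores are the centred rows of X
    projected onto the components. A sketch: no handling of repeated eigenvalues.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        eigenvalues.append(lam)
        # Deflate: subtract the found direction so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components] for r in Xc]
    return components, eigenvalues, scores


def linear_shap(weights, Z):
    """SHAP values of f(z) = sum_j w_j * z_j with independent features:
    phi_j = w_j * (z_j - mean_j), using Z itself as the background data."""
    n = len(Z)
    means = [sum(row[j] for row in Z) / n for j in range(len(weights))]
    return [[w * (z - m) for w, z, m in zip(weights, row, means)] for row in Z]


# Hypothetical count fingerprints: bits 0 and 1 always co-occur (collinear), bit 2 is independent.
X = [[0, 0, 1], [1, 1, 0], [2, 2, 0], [3, 3, 1]]
components, eigenvalues, Z = pca(X, n_components=2)
w_pc = [0.8, -0.3]  # hypothetical ranking weights on the two principal components
phi = linear_shap(w_pc, Z)
importance = [sum(abs(row[k]) for row in phi) / len(phi) for k in range(2)]
print([round(x, 3) for x in components[0]], importance)
```

Ranking components by mean absolute SHAP value, as in the last lines, is one common way to summarise which component drives the predictions.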

Source code: https://github.com/ShoheiNakamukai/RO_prediction_of_PNPA/tree/main

## Full-text entities

- **Chemicals:** PNPA (no database identifier), amino acids (MESH:D000596)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12643230/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12643230/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12643230/full.md

---
Source: https://tomesphere.com/paper/PMC12643230