# DPAS: disease-associated peptide anomaly score for identifying pathogenic peptides via one-class learning

**Authors:** Zoya Khalid, Razia Khalid, Osman Ugur Sezerman

PMC · DOI: 10.1038/s41598-026-40099-0 · Scientific Reports · 2026-02-15

## TL;DR

This paper introduces DPAS (Disease Peptide Anomaly Score), a scoring metric for ranking disease-associated peptides. It is built on one-class models trained only on positive examples, so no negative dataset is required.

## Contribution

The novelty lies in combining one-class learning on positive-only data with a new scoring metric, DPAS, for peptide biomarker discovery.

## Key findings

- Autoencoders outperformed OCSVM and Isolation Forest in identifying disease-associated peptides.
- DPAS combines model-derived anomaly scores with SHAP feature importance values to give an interpretable peptide ranking.
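
This summary does not give the DPAS formula, so the following is only a minimal sketch of how an anomaly score and per-feature attributions might be folded into a single ranking value. The function names (`min_max`, `dpas_like_scores`, `rank_peptides`), the weight `alpha`, the min-max scaling, and the weighted sum are all assumptions made for illustration, not the authors' definition. Which end of the ranking corresponds to "more likely disease-associated" is also left open here.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dpas_like_scores(
    anomaly_scores: Sequence[float],
    attributions: Sequence[Sequence[float]],
    alpha: float = 0.5,  # hypothetical weight, not taken from the paper
) -> List[float]:
    """Combine a per-peptide anomaly score with its attribution magnitude.

    attributions[i] holds per-feature importance values for peptide i
    (for example SHAP values); only their absolute sum is used here.
    """
    magnitude = [sum(abs(a) for a in row) for row in attributions]
    a_norm = min_max(anomaly_scores)
    m_norm = min_max(magnitude)
    return [alpha * a + (1.0 - alpha) * m for a, m in zip(a_norm, m_norm)]


def rank_peptides(
    peptides: Sequence[str],
    scores: Sequence[float],
    descending: bool = True,  # assumed orientation; flip if needed
) -> List[Tuple[str, float]]:
    """Return (peptide, score) pairs sorted by the combined score."""
    return sorted(zip(peptides, scores), key=lambda p: p[1], reverse=descending)
```

Scaling both inputs to the same range before combining them keeps either term from dominating purely because of its units. That is a design choice of this sketch rather than something stated in the abstract.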

## Abstract

Predicting disease-associated peptides is a challenging task in bioinformatics, hindered mainly by the lack of reliable negative datasets, which leads to biased predictions. In this study, we propose a one-class classification approach that relies exclusively on positive-labeled data. We employed three classifiers, namely One-Class Support Vector Machines (OCSVM), Isolation Forest, and Autoencoders, to classify disease-associated peptides, with Autoencoders yielding the best results. The Autoencoders trained on the positive dataset effectively differentiated inliers from outliers, a separation further evaluated using mean reconstruction errors. Our method combines several sequence-based features. This framework provides an efficient solution for predicting disease-associated peptides and overcomes the limitations of traditional binary classification approaches. To enhance interpretability and peptide prioritization, we introduce a new scoring metric, the Disease Peptide Anomaly Score (DPAS), which combines model-derived anomaly scores with feature importance values obtained using SHAP (SHapley Additive exPlanations). DPAS facilitates the ranking of peptides based on their likelihood of being disease-associated, offering a robust and interpretable approach for peptide biomarker discovery.
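
To make the positive-only setup concrete, here is a small sketch under stated assumptions. It turns peptides into one sequence-based feature vector (amino acid composition), fits a scorer on positives alone, and flags new peptides as inliers or outliers. The scorer is a deliberately simple per-feature z-score model standing in for the paper's OCSVM, Isolation Forest, and Autoencoder models. The class name `PositiveOnlyScorer`, the 0.95 quantile threshold, the `min_std` floor, and the toy sequences are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each standard amino acid in a peptide (20 features)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


class PositiveOnlyScorer:
    """Hypothetical stand-in for a one-class model, fitted on positives only."""

    def fit(self, X, quantile=0.95, min_std=0.05):
        cols = list(zip(*X))
        self.mu = [mean(c) for c in cols]
        # Floor the spread so features unseen in training do not divide by zero.
        self.sd = [max(pstdev(c), min_std) for c in cols]
        train = sorted(self.score(x) for x in X)
        idx = min(len(train) - 1, int(quantile * len(train)))
        self.threshold = train[idx]  # scores above this count as outliers
        return self

    def score(self, x):
        """Mean squared z-score; plays the role of a reconstruction error."""
        return mean([((v - m) / s) ** 2 for v, m, s in zip(x, self.mu, self.sd)])

    def is_inlier(self, x):
        return self.score(x) <= self.threshold


# Toy strings for illustration only, not peptides from the paper.
positives = ["ACDKLLKA", "ACDKLAKA", "AKDKLLCA", "ACKKLLDA"]
model = PositiveOnlyScorer().fit([aa_composition(p) for p in positives])
```

Calling `model.is_inlier(aa_composition(peptide))` then says whether a new peptide resembles the positive training set. With an autoencoder, the mean squared reconstruction error would take the place of `score`, and the threshold would be chosen in the same spirit.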

The online version contains supplementary material available at 10.1038/s41598-026-40099-0.

## Full-text entities

- **Genes:** AP2B1 (adaptor related protein complex 2 subunit beta 1) [NCBI Gene 163] {aka ADTB2, AP105B, AP2-BETA, CLAPB1}, PKD2 (polycystin 2, transient receptor potential cation channel) [NCBI Gene 5311] {aka APKD2, PC2, PKD4, Pc-2, TRPP2}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}, PCSK1 (proprotein convertase subtilisin/kexin type 1) [NCBI Gene 5122] {aka BMIQ12, NEC1, PC1, PC1/3, PC3, SPC3}, GLYAT (glycine-N-acyltransferase) [NCBI Gene 10249] {aka ACGNAT, GAT}
- **Diseases:** Disease (MESH:D004194), hemolysis (MESH:D006461), inflammatory (MESH:D007249)
- **Chemicals:** Dipeptide (MESH:D004151), Amino acid (MESH:D000596), nucleotide (MESH:D009711), metal (MESH:D008670)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996630/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996630/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996630/full.md

---
Source: https://tomesphere.com/paper/PMC12996630