# PON-Del predictor for sequence retaining protein deletions

**Authors:** Haoyang Zhang, Muhammad Kabir, Mauno Vihinen, Mohammad Sadegh Taghizadeh, Nir Ben-Tal

PMC · DOI: 10.1371/journal.pcbi.1014020 · PLOS Computational Biology · 2026-02-25

## TL;DR

This paper introduces PON-Del, a tool that predicts the impact of short (1–10 amino acid) sequence-retaining protein deletions. Unlike previous methods, it also predicts variants of uncertain significance (VUSs) as a class of their own.

## Contribution

PON-Del is the first deletion interpretation method to include variants of uncertain significance (VUSs) as a prediction class for sequence-retaining protein deletions.

## Key findings

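- PON-Del outperforms previous methods in predicting the effects of short (1–10 amino acid) sequence-retaining protein deletions.
- Seven algorithms, both statistical and deep learning, were evaluated; the selected gradient boosting approach was trained on an extensive dataset of verified deletions described by context, content, position, and gene-based features.
- PON-Del provides two outputs: a binary prediction (pathogenic or benign) and a three-state prediction that adds a VUS class.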
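To illustrate the difference between a binary and a three-state output, here is a minimal sketch that maps a pathogenicity probability to class labels. It is not PON-Del's method: this summary does not say how the tool derives its VUS class, and it may well be trained directly as a three-class model. The function names and the thresholds (`0.5`, `0.3`, `0.7`) are hypothetical, chosen only for illustration.

```python
def binary_call(p_pathogenic: float, threshold: float = 0.5) -> str:
    """Two-state call: pathogenic at or above the threshold, otherwise benign.

    The default threshold is a hypothetical value, not one from the paper.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "pathogenic" if p_pathogenic >= threshold else "benign"


def three_state_call(p_pathogenic: float, low: float = 0.3, high: float = 0.7) -> str:
    """Three-state call: scores strictly between `low` and `high` become VUS.

    The band limits are hypothetical; the summary does not state how PON-Del
    defines its VUS class.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if low > high:
        raise ValueError("low must not exceed high")
    if p_pathogenic >= high:
        return "pathogenic"
    if p_pathogenic <= low:
        return "benign"
    return "VUS"


if __name__ == "__main__":
    for p in (0.10, 0.55, 0.90):
        print(p, binary_call(p), three_state_call(p))
```

With these assumed thresholds, a score of 0.55 is forced to "pathogenic" in the binary call but is reported as "VUS" in the three-state call, which is the kind of case a two-class scheme cannot express.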

## Abstract

Protein deletions are frequent among both disease-causing and tolerated variants. Several mechanisms at the DNA, RNA and protein levels can lead to deletions. Many deletions are misclassified in the literature and databases, especially when the mRNA is degraded by the cellular quality-control mechanism. We developed a novel predictor for sequence retaining protein deletions, i.e., variants that do not alter the sequence downstream of the deletion site. We collected an extensive dataset of verified protein deletions, each described by a comprehensive set of context, content, position, and gene-based features. We evaluated both statistical and deep learning algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short, 1–10 amino acid, sequence-retaining deletions. Variants are typically classified into two categories: either pathogenic or benign. However, there is always a third class of variants: variants of uncertain significance (VUSs), which have been ignored by all previous methods. PON-Del is the first deletion interpretation method that includes VUSs. It provides two outputs, binary and three-state prediction with VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.

Protein deletions are frequent among both disease-causing and tolerated variants, and are caused by several mechanisms at the DNA, RNA and protein levels. The reliable prediction of the effects of deletions is challenging. We developed a predictor for sequence retaining protein deletions, variants that do not alter the sequence beyond the deletion site. We collected an extensive dataset of verified protein deletions, and a comprehensive set of features to describe them. We evaluated seven algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short, 1–10 amino acid, sequence-retaining deletions. Variants have typically been classified as pathogenic or benign. This practice misses the third category: variants of uncertain significance (VUSs). PON-Del is the first deletion interpretation method that includes VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
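The definition above (a deletion that leaves the sequence after the deletion site unchanged, with 1–10 residues removed) can be made concrete with a short sketch. The helper names and the example peptide are hypothetical and are not taken from PON-Del; the code checks only the definition and makes no prediction about pathogenicity.

```python
def apply_deletion(sequence: str, start: int, length: int) -> str:
    """Remove `length` residues beginning at 1-based position `start`."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if start < 1 or start + length - 1 > len(sequence):
        raise ValueError("deletion falls outside the sequence")
    return sequence[: start - 1] + sequence[start - 1 + length :]


def is_short_sequence_retaining(reference: str, variant: str, max_len: int = 10) -> bool:
    """True if `variant` is `reference` minus one contiguous block of 1..max_len residues.

    The residues after the deleted block must be identical in both sequences,
    which excludes changes such as frameshifts that alter the downstream sequence.
    """
    deleted = len(reference) - len(variant)
    if not 1 <= deleted <= max_len:
        return False
    # Length of the shared N-terminal prefix.
    prefix = 0
    while prefix < len(variant) and reference[prefix] == variant[prefix]:
        prefix += 1
    # Everything downstream of the deleted block must be retained unchanged.
    return reference[prefix + deleted :] == variant[prefix:]


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # hypothetical peptide, for illustration only
    variant = apply_deletion(reference, start=4, length=2)  # removes "AY"
    print(variant, is_short_sequence_retaining(reference, variant))
```

A variant whose downstream residues differ from the reference, or one that removes more than ten residues, fails the check and falls outside the scope described in the abstract.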

## Full-text entities

- **Genes:** BTK (Bruton tyrosine kinase) [NCBI Gene 695] {aka AGMX1, AT, ATK, BPK, IGHD3, IMD1}, PON1 (paraoxonase 1) [NCBI Gene 5444] {aka ESA, MVCD5, PON}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** VUS (MESH:D065309), XLA (MESH:C537409), DL (MESH:D007859)
- **Chemicals:** AAindex (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12959651/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12959651/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC12959651/full.md

---
Source: https://tomesphere.com/paper/PMC12959651