# Ubigo-X: Protein ubiquitination site prediction using ensemble learning with image-based feature representation and weighted voting

**Authors:** Disline Manli Tantoh, Jen-Chieh Yu, Ching-Hsuan Chien, Wei-Yi Yeh, Yen-Wei Chu

PMC · DOI: 10.1016/j.csbj.2025.07.025 · Computational and Structural Biotechnology Journal · 2025-07-14

## TL;DR

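Ubigo-X predicts protein ubiquitination sites by combining three sub-models through weighted voting: two ResNet34 models trained on image-based sequence features and one XGBoost model trained on structure-based and function-based features. In independent tests it outperformed existing tools in MCC on both balanced and imbalanced data, and in AUC and ACC on balanced data.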

## Contribution

The main contribution is the integration of image-based feature representation with a weighted voting ensemble of three sub-models for ubiquitination site prediction.
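
The weighted voting idea can be illustrated with a minimal sketch. It assumes each sub-model emits a probability that a lysine is ubiquitinated and that the ensemble takes a weighted average (soft voting). The weights, probabilities, and 0.5 threshold below are hypothetical placeholders, not values reported in the paper.

```python
from typing import Sequence, Tuple


def weighted_vote(
    probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Combine per-model probabilities into one score by weighted averaging."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the score stays in [0, 1].
    score = sum(p * w for p, w in zip(probs, weights)) / total
    return score, score >= threshold


# Hypothetical sub-model outputs for one candidate lysine site.
sub_model_probs = {"Single-Type SBF": 0.62, "Co-Type SBF": 0.48, "S-FBF": 0.71}
# Hypothetical weights; the paper's actual weights are not given in this summary.
sub_model_weights = {"Single-Type SBF": 0.4, "Co-Type SBF": 0.3, "S-FBF": 0.3}

names = list(sub_model_probs)
score, is_site = weighted_vote(
    [sub_model_probs[n] for n in names],
    [sub_model_weights[n] for n in names],
)
print(f"ensemble score = {score:.3f}, predicted ubiquitination site = {is_site}")
```

In this toy case one sub-model falls below the threshold on its own, yet the weighted average still calls the site positive, which is the kind of disagreement a voting scheme is meant to resolve.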

## Key findings

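- On balanced independent PhosphoSitePlus test data (65,421 ubiquitination and 61,222 non-ubiquitination sites), Ubigo-X achieved an AUC of 0.85, ACC of 0.79, and MCC of 0.58.
- On imbalanced PhosphoSitePlus data (1:8 positive-to-negative ratio), it achieved an AUC of 0.94, ACC of 0.85, and MCC of 0.55; on the GPS-Uber data, it achieved an AUC of 0.81, ACC of 0.59, and MCC of 0.27.
- It outperformed existing tools in MCC on both balanced and imbalanced data, and in AUC and ACC on balanced data.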
- The authors describe the tool as potentially species-neutral; it is accessible at http://merlin.nchu.edu.tw/ubigox/.
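
To show why MCC is reported alongside ACC, especially for the imbalanced test set, the sketch below computes both from a confusion matrix using their standard definitions. The counts are hypothetical and chosen only to mimic a 1:8 positive-to-negative ratio; they are not results from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical 1:8 test set: 100 positives, 800 negatives.
# A classifier that labels every site as negative:
all_negative = dict(tp=0, tn=800, fp=0, fn=100)
# A classifier that labels every site correctly:
perfect = dict(tp=100, tn=800, fp=0, fn=0)

for name, counts in [("all-negative", all_negative), ("perfect", perfect)]:
    print(f"{name}: ACC = {accuracy(**counts):.3f}, MCC = {mcc(**counts):.3f}")
```

The all-negative classifier reaches an accuracy of about 0.889 while its MCC is 0, so a high ACC on imbalanced data says little by itself.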

## Abstract

Accurate ubiquitination identification is crucial in biological function analysis. We developed Ubigo-X, a novel protein ubiquitination prediction tool. Our training data, sourced from the Protein Lysine Modification Database (PLMD 3.0), comprised 53,338 ubiquitination and 71,399 non-ubiquitination sites, retained after CD-HIT and CD-HIT-2d sequence filtering. Three sub-models: Single-Type sequence-based features (Single-Type SBF), k-mer sequence-based features (Co-Type SBF), and structure-based and function-based features (S-FBF), were developed. Single-Type SBF used amino acid composition (AAC), amino acid index (AAindex), and one-hot encoding; Co-Type SBF used Single-Type SBF via k-mer encoding; and S-FBF used secondary structure, relative solvent accessibility (RSA)/absolute solvent-accessible area (ASA), and signal peptide cleavage sites. S-FBF was trained using XGBoost, while Single-Type SBF and Co-Type SBF were transformed into image-based features and trained using Resnet34. Ubigo-X was developed by combining the three models via a weighted voting strategy. Independent testing using PhosphoSitePlus data (65,421 ubiquitination and 61,222 non-ubiquitination sites) retained after filtering yielded 0.85, 0.79, and 0.58 for area under the curve (AUC), accuracy (ACC), and Matthews correlation coefficient (MCC), respectively. Further testing on imbalanced PhosphoSitePlus data (1:8 positive-to-negative sample ratio) yielded 0.94 AUC, 0.85 ACC, and 0.55 MCC. Using the GPS-Uber data, the AUC, ACC, and MCC were 0.81, 0.59, and 0.27, respectively. In conclusion, Ubigo-X outperformed existing tools in MCC (for both balanced and unbalanced data) and AUC and ACC (for balanced data), highlighting the efficacy of integrating image-based feature representation and weighted voting in ubiquitination prediction. Ubigo-X is a potential species-neutral ubiquitination site prediction tool, accessible at http://merlin.nchu.edu.tw/ubigox/.
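
Two of the sequence encodings named in the abstract, amino acid composition (AAC) and one-hot encoding, can be sketched as follows. This is an illustration under stated assumptions: the 11-residue window, the example peptide, and the reading of the one-hot matrix as a single-channel "image" are hypothetical, since this summary does not give the paper's window length or its exact image transformation.

```python
from typing import List

# The 20 standard amino acids in a fixed order (one column per residue type).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window: str) -> List[float]:
    """Amino acid composition: frequency of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def one_hot(window: str) -> List[List[int]]:
    """One row per position, one column per residue type.

    Non-standard characters (for example padding) become all-zero rows.
    The resulting L x 20 grid can be treated as a single-channel image.
    """
    return [[1 if r == a else 0 for a in AMINO_ACIDS] for r in window]


# Hypothetical 11-residue window centered on a candidate lysine (K).
window = "AGSTLKPEDVR"
center = len(window) // 2

composition = aac(window)
image = one_hot(window)

print("center residue:", window[center])
print("image shape:", (len(image), len(image[0])))
```

A grid like this would still need resizing or channel stacking before it could be fed to a convolutional network such as ResNet34; that step is left out because its details are not given here.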

## Full-text entities

- **Chemicals:** amino acid (MESH:D000596)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12303043/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12303043/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12303043/full.md

---
Source: https://tomesphere.com/paper/PMC12303043