# StackGlyEmbed: prediction of N-linked glycosylation sites using protein language models

**Authors:** Md Muhaiminul Islam Nafi, M Saifur Rahman

PMC · DOI: 10.1093/bioadv/vbaf146 · Bioinformatics Advances · 2025-06-28

## TL;DR

StackGlyEmbed is a stacking ensemble machine learning model that predicts N-linked glycosylation sites in proteins from protein language model embeddings, reporting 98.2% sensitivity in independent testing.

## Contribution

The novel contribution is a stacking ensemble built on protein language model embeddings for N-linked glycosylation site prediction, with SVM, XGB and KNN learners in the base layer and a second SVM in the meta layer.

## Key findings

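- StackGlyEmbed achieves 98.2% sensitivity in predicting N-linked glycosylation sites in independent testing.
- In the same independent test it reports 92.5% balanced accuracy, 89.1% F1-score and 82.6% Matthews correlation coefficient, outperforming existing state-of-the-art methods.
- It combines SVM, XGB and KNN learners in the base layer of a stacking ensemble, with a second SVM in the meta layer.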
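The evaluation metrics named in this summary (sensitivity, balanced accuracy, F1-score and Matthews correlation coefficient) have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    at least one positive prediction was made).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, unrelated to the paper's results.
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Balanced accuracy and MCC are commonly reported alongside sensitivity because they also account for performance on the negative class.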

## Abstract

N-linked glycosylation is one of the most basic post-translational modifications (PTMs), in which oligosaccharides bond covalently to asparagine (N). These sites are found in conserved sequence motifs such as N-X-S or N-X-T, where X can be any residue except proline (P). Predicting N-linked glycosylation sites is important because these PTMs play a vital role in many biological processes and functions. Experimental methods for detecting N-linked glycosylation sites, such as mass spectrometry, are very expensive. Computational prediction of N-linked glycosylation sites has therefore become an important research field.
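The N-X-S / N-X-T motif described above can be located with a simple sequence scan. The following is a minimal sketch, assuming sequences in one-letter amino acid codes; the function name `find_sequon_candidates` and the example sequence are hypothetical and not taken from the paper. A motif match only marks a candidate asparagine; deciding which candidates are actually glycosylated is the prediction task itself.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps matches zero-width past the N, so overlapping
# motifs (e.g. "NNTS") are all reported.
SEQUON_PATTERN = re.compile(r"N(?=[^P][ST])")


def find_sequon_candidates(sequence):
    """Return 1-based positions of asparagines inside an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON_PATTERN.finditer(sequence.upper())]


print(find_sequon_candidates("MNGSANPTNKT"))  # [2, 9]
```

In the example, the asparagine at position 6 is skipped because it is followed by proline.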

In this work, we propose StackGlyEmbed, a stacking ensemble machine learning model, to computationally predict N-linked glycosylation sites. We have explored embeddings from several protein language models and built the stacking ensemble using Support Vector Machine (SVM), Extreme Gradient Boosting (XGB) and K-nearest Neighbor (KNN) learners in the base layer, with a second SVM model in the meta layer. StackGlyEmbed achieves 98.2% sensitivity, 92.5% balanced accuracy, 89.1% F1-score and 82.6% Matthews correlation coefficient in independent testing, outperforming the existing state-of-the-art methods.
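The two-layer layout described above can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch under stated assumptions, not the authors' implementation: the features are synthetic stand-ins for protein language model embeddings, scikit-learn's `GradientBoostingClassifier` is swapped in for XGB to avoid an extra dependency, and all hyperparameters, the feature scaling step and the helper name `build_stacking_model` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Base layer: SVM, gradient boosting (stand-in for XGB), KNN.
    Meta layer: a second SVM trained on the base learners' outputs."""
    base_learners = [
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=random_state))),
        ("gb", GradientBoostingClassifier(n_estimators=50,
                                          random_state=random_state)),
        ("knn", make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=5))),
    ]
    # cv=5: the meta learner is fitted on out-of-fold base predictions.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=SVC(),
                              cv=5)


# Synthetic feature vectors standing in for per-residue embeddings.
X, y = make_classification(n_samples=400, n_features=32,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = build_stacking_model()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Training the meta learner on out-of-fold predictions, as `StackingClassifier` does through its `cv` argument, keeps it from simply memorising base-learner outputs on data the base learners have already seen.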

StackGlyEmbed is freely available at: https://github.com/nafcoder/StackGlyEmbed.

## Full-text entities

- **Diseases:** N-linked glycosylation (MESH:C536108)
- **Chemicals:** N (MESH:D009584), Proline (MESH:D011392), oligosaccharides (MESH:D009844), Asparagine (MESH:D001216)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12237515/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12237515/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12237515/full.md

---
Source: https://tomesphere.com/paper/PMC12237515