Predicting O-GlcNAcylation Sites in Mammalian Proteins with Transformers and RNNs Trained with a New Loss Function

Pedro Seber

arXiv:2402.17131 · cs.LG · September 18, 2025 · 2 cites

Open Access

TL;DR

This paper introduces an improved RNN model with a novel loss function for predicting O-GlcNAcylation sites in proteins, achieving state-of-the-art performance and addressing previous models' limitations in generalization and usability.

Contribution

The study develops a new weighted focal differentiable MCC loss function and demonstrates its effectiveness in enhancing RNN performance for O-GlcNAcylation site prediction.
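To illustrate the general idea behind a differentiable MCC loss, here is a minimal sketch. It is not the paper's definition: this summary does not give the exact formula, so the soft confusion-matrix construction, the `pos_weight` parameter, and the function name `soft_mcc_loss` are all assumptions made for illustration, and the focal component is left out entirely because its form is not specified here.

```python
import math


def soft_mcc_loss(probs, labels, pos_weight=1.0, eps=1e-8):
    """Return 1 - MCC, with MCC computed from soft confusion-matrix counts.

    probs      : predicted probabilities of the positive class, each in [0, 1]
    labels     : ground-truth labels, each 0 or 1
    pos_weight : hypothetical weight on positive-labelled samples
                 (an assumption, not taken from the paper)
    eps        : small constant that avoids division by zero
    """
    tp = fp = tn = fn = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # A positive sample contributes p to TP and (1 - p) to FN.
            tp += pos_weight * p
            fn += pos_weight * (1.0 - p)
        else:
            # A negative sample contributes p to FP and (1 - p) to TN.
            fp += p
            tn += 1.0 - p

    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC lies in [-1, 1], so the loss lies in [0, 2]; lower is better.
    return 1.0 - numerator / (denominator + eps)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(soft_mcc_loss([0.9, 0.1, 0.8, 0.2], labels))  # fairly confident, correct
    print(soft_mcc_loss([0.5, 0.5, 0.5, 0.5], labels))  # uninformative
```

Because the counts are sums of probabilities rather than thresholded predictions, the loss is a smooth function of the model outputs, which is what would let it be minimized with gradient-based training in an autodiff framework.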

Findings

01 RNN with new loss outperforms previous models

02 Achieves F1 score of 38.88% and MCC of 38.20%

03 Model generalizes well on independent test set
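
For reference, the two metrics reported in these findings have standard definitions in terms of confusion-matrix counts (true positives TP, false positives FP, true negatives TN, false negatives FN). These are the conventional formulas, given here as background rather than quoted from the paper:

$$
F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},
\qquad
\mathrm{MCC} = \frac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}
$$

$F_1$ ranges from 0 to 1 and ignores true negatives, while MCC ranges from -1 to 1 and uses all four counts, which makes it a common choice for heavily imbalanced problems such as site prediction.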

Abstract

O-GlcNAcylation, a subtype of glycosylation, has the potential to be an important target for therapeutics, but methods to reliably predict O-GlcNAcylation sites had not been available until 2023; a 2021 review correctly noted that published models were insufficient and failed to generalize. Moreover, many are no longer usable. In 2023, a considerably better recurrent neural network (RNN) model was published. This article creates improved models by using a new loss function, which we call the weighted focal differentiable MCC. RNN models trained with this new loss display superior performance to models trained using the weighted cross-entropy loss; this new function can also be used to fine-tune trained models. An RNN trained with this loss achieves state-of-the-art performance in O-GlcNAcylation site prediction with an $F_1$ score of 38.88% and an MCC of 38.20% on an independent test…

Taxonomy

Topics: Glycosylation and Glycoproteins Research · Galectins and Cancer Biology · Machine Learning in Bioinformatics