Predicting O-GlcNAcylation Sites in Mammalian Proteins with Transformers and RNNs Trained with a New Loss Function
Pedro Seber

TL;DR
This paper introduces an improved RNN model with a novel loss function for predicting O-GlcNAcylation sites in proteins, achieving state-of-the-art performance and addressing previous models' limitations in generalization and usability.
Contribution
The study develops a new weighted focal differentiable MCC loss function and demonstrates its effectiveness in enhancing RNN performance for O-GlcNAcylation site prediction.
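The summary above does not give the formula for the weighted focal differentiable MCC, so the following is only a minimal sketch of one common way to make MCC differentiable: the hard confusion-matrix counts are replaced with sums of predicted probabilities. The function name `soft_mcc_loss` and the parameters `pos_weight` and `eps` are hypothetical, the class weighting shown is an assumption, and the focal term is deliberately left out because its form is not stated here. This is not the paper's definition. Plain Python is used for readability; in training, the same arithmetic would be written in an autodiff framework.

```python
import math


def soft_mcc_loss(probs, labels, pos_weight=1.0, eps=1e-8):
    """Return 1 - soft MCC for binary labels (illustrative sketch only).

    probs      -- predicted probabilities of the positive class, each in [0, 1]
    labels     -- ground-truth labels, each 0 or 1
    pos_weight -- hypothetical weight applied to positive-class samples
    eps        -- small constant that keeps the denominator non-zero
    """
    tp = fp = tn = fn = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # A positive sample contributes p to the "soft" true positives
            # and (1 - p) to the "soft" false negatives.
            tp += pos_weight * p
            fn += pos_weight * (1.0 - p)
        else:
            # A negative sample contributes p to the soft false positives
            # and (1 - p) to the soft true negatives.
            fp += p
            tn += 1.0 - p

    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + eps
    )
    # MCC lies in [-1, 1]; subtracting from 1 gives a loss to minimise.
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(soft_mcc_loss([0.9, 0.1, 0.8, 0.2], labels))  # confident and correct
    print(soft_mcc_loss([0.5, 0.5, 0.5, 0.5], labels))  # uninformative
```

Because every term is a smooth function of the probabilities, a loss of this kind can be minimised by gradient descent, unlike MCC computed from thresholded predictions.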
Findings
RNN with new loss outperforms previous models
Achieves F1 score of 38.88% and MCC of 38.20%
Model generalizes well on independent test set
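For readers unfamiliar with the two reported metrics, the sketch below computes F1 and MCC from confusion-matrix counts using their standard definitions. The function name `f1_and_mcc` and the counts in the usage lines are made up for illustration and are not data from the paper.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Compute F1 and MCC from hard confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Hypothetical counts for a rare positive class (not from the paper).
    f1, mcc = f1_and_mcc(tp=40, fp=60, tn=9840, fn=60)
    print(f"F1 = {f1:.2%}, MCC = {mcc:.2%}")
```

With these invented counts, accuracy would be close to 99% while F1 and MCC stay near 40%, which illustrates why both metrics are more informative than accuracy when the positive class is rare.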
Abstract
O-GlcNAcylation, a subtype of glycosylation, has the potential to be an important target for therapeutics, but methods to reliably predict O-GlcNAcylation sites had not been available until 2023; a 2021 review correctly noted that published models were insufficient and failed to generalize. Moreover, many are no longer usable. In 2023, a considerably better recurrent neural network (RNN) model was published. This article creates improved models by using a new loss function, which we call the weighted focal differentiable MCC. RNN models trained with this new loss display superior performance to models trained using the weighted cross-entropy loss; this new function can also be used to fine-tune trained models. An RNN trained with this loss achieves state-of-the-art performance in O-GlcNAcylation site prediction with an F1 score of 38.88% and an MCC of 38.20% on an independent test…
Taxonomy
Topics: Glycosylation and Glycoproteins Research · Galectins and Cancer Biology · Machine Learning in Bioinformatics
