Predicting protein–carbohydrate binding sites: a deep learning approach integrating protein language model embeddings and structural features
Md Muhaiminul Islam Nafi, M Saifur Rahman

TL;DR
This paper introduces DeepCPBSite, a deep learning model that predicts where proteins bind carbohydrates, combining protein language model embeddings with structural features to improve accuracy.
Contribution
The paper's contribution is DeepCPBSite, an ensemble deep learning model that integrates protein language model embeddings and structural features to predict protein–carbohydrate binding sites.
Findings
DeepCPBSite achieved 78.7% balanced accuracy and 59.6% sensitivity on the TS53 dataset.
It outperformed existing methods such as DeepGlycanSite by 1.16% in balanced accuracy and 2.94% in sensitivity.
The model's F1, MCC, and AUPR scores showed improvements of up to 60.21% compared to state-of-the-art methods.
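The metrics named above (sensitivity, balanced accuracy, F1, MCC) all follow from the residue-level confusion matrix. The sketch below shows the standard definitions only; the function name `binding_site_metrics` and the example counts are hypothetical and are not taken from the paper. AUPR is omitted because it needs ranked prediction scores, not just counts.

```python
import math


def binding_site_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; a 'positive' is a residue
    labelled as carbohydrate-binding.
    """
    # Sensitivity (recall): fraction of true binding residues recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted binding residues that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Balanced accuracy averages the two per-class recalls.
    balanced_accuracy = (sensitivity + specificity) / 2

    # F1 is the harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for a 100-residue example, not results from the paper.
    for name, value in binding_site_metrics(tp=6, fp=2, tn=88, fn=4).items():
        print(f"{name}: {value:.4f}")
```

Because balanced accuracy weights both classes equally, it is less dominated by the majority class than plain accuracy, which matters when binding residues are a small fraction of all residues.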
Abstract
Protein–carbohydrate interactions play an important role in many biological processes and functions, such as inflammation, signal transduction, and cell adhesion. In this paper, we study non-covalent carbohydrate binding sites and aim to build a deep learning model to predict them. We were motivated by the fact that experimental approaches for identifying these sites are expensive, so computational tools are necessary. We explored several sequence-based features as well as structural features, and we also leveraged protein language model embeddings. We analyzed different architectures and selected the most suitable deep learning architecture for our finalized prediction model, DeepCPBSite. DeepCPBSite is an ensemble model that combines three separate models with three approaches (random…
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Protein Structure and Dynamics
