GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
Minghao Xu, Yunteng Geng, Yihang Zhang, Ling Yang, Jian Tang, Wentao Zhang

TL;DR
GlycanML establishes a comprehensive benchmark for glycan property prediction using diverse tasks, representations, and multi-task learning, advancing machine learning applications in glycan research.
Contribution
This work introduces the first standardized benchmark for glycan property prediction, including diverse tasks, representations, and multi-task learning frameworks.
Findings
Multi-relational GNNs outperform other models.
Multi-task learning enhances prediction accuracy.
Sequence and graph representations are both effective.
Abstract
Glycans are basic biomolecules and perform essential functions within living organisms. The rapid increase of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, a standard machine learning benchmark for glycan property and function prediction is still lacking. In this work, we fill this gap by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks, including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented as both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction…
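To make the sequence and graph representations mentioned in the abstract concrete, here is a minimal sketch of turning a glycan sequence into a graph. It assumes the sequence is written in IUPAC-condensed notation, with linkages in parentheses and branches in square brackets. The function name `glycan_to_graph` and the node and edge layout are hypothetical and chosen for illustration; they are not taken from the GlycanML code, whose data pipeline may differ.

```python
import re

# Assumption: IUPAC-condensed input, e.g. "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc".
# A token is a bracket, or a residue name with an optional "(linkage)" suffix.
TOKEN = re.compile(r"\[|\]|[^\[\]()]+(?:\([^()]*\))?")
RESIDUE = re.compile(r"([^()]+)(?:\(([^()]*)\))?")


def glycan_to_graph(iupac):
    """Return (nodes, edges) for a glycan string (hypothetical helper).

    nodes: residue names in reading order.
    edges: (child_index, parent_index, linkage) triples, one per glycosidic bond.
    """
    nodes, edges = [], []
    pending = []  # residues still waiting for their parent: (index, linkage)
    stack = []    # pending lists saved when a branch opens
    for tok in TOKEN.findall(iupac):
        if tok == "[":
            # A branch starts: set aside the residues of the main chain.
            stack.append(pending)
            pending = []
        elif tok == "]":
            # The branch ends: its last residue shares a parent with the chain.
            pending = stack.pop() + pending
        else:
            name, linkage = RESIDUE.fullmatch(tok).groups()
            idx = len(nodes)
            nodes.append(name)
            # Every waiting residue bonds to the residue just read.
            edges.extend((child, idx, bond) for child, bond in pending)
            pending = [(idx, linkage)]
    return nodes, edges


# Usage: a branched pentasaccharide; the last residue is the reducing end.
core = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"
nodes, edges = glycan_to_graph(core)
print(nodes)
print(edges)
```

The string itself is the input a sequence-based model would tokenize, while the node list and typed edges are the kind of input a GNN would consume. Keeping the linkage label on each edge is what would let a multi-relational GNN treat different glycosidic bonds as different relation types.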
Taxonomy
Topics: Glycosylation and Glycoproteins Research · Machine Learning in Bioinformatics · Advanced Proteomics Techniques and Applications
