Self-Supervised Graph Transformer on Large-Scale Molecular Data
Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, Junzhou Huang

TL;DR
This paper introduces GROVER, a self-supervised graph transformer model trained on large-scale unlabelled molecular data, significantly improving molecular property prediction accuracy in drug discovery.
Contribution
The paper presents GROVER, a novel self-supervised graph transformer architecture trained on 10 million molecules, achieving state-of-the-art results in molecular property prediction.
Findings
GROVER improves prediction accuracy by over 6% on average.
Self-supervised pre-training enhances molecular representation quality.
Large-scale training with 100 million parameters boosts performance.
Abstract
Obtaining informative representations of molecules is a crucial prerequisite for AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the use of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization to newly synthesized molecules. To address both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge, and graph levels, GROVER can learn rich structural and semantic information about molecules from enormous amounts of unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style…
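To make the idea of combining message passing with Transformer-style attention more concrete, here is a minimal toy sketch. It is not GROVER's implementation: the function names, the mean-aggregation rule, the absence of learned weights, and the three-atom example are all assumptions chosen for illustration. It shows only one plausible reading of the abstract, in which node embeddings produced by message passing over the molecular graph are used as the queries, keys, and values of a self-attention step.

```python
import math

def message_passing(features, adjacency, hops=1):
    """Toy message passing: each atom averages its own features with
    those of its bonded neighbours (no learned weights; an assumption)."""
    h = [list(f) for f in features]
    for _ in range(hops):
        new_h = []
        for i, neighbours in enumerate(adjacency):
            group = [h[i]] + [h[j] for j in neighbours]
            new_h.append([sum(col) / len(group) for col in zip(*group)])
        h = new_h
    return h

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention over all atoms in the molecule."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Hypothetical molecule: three atoms in a chain 0-1-2 with 2-d one-hot features.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[1], [0, 2], [1]]

# Local structure first (message passing), then global mixing (attention).
h = message_passing(features, adjacency)
out = self_attention(h, h, h)
```

In this sketch, message passing captures local bond structure while attention lets every atom attend to every other atom. A real model would add learned projections, multiple heads, and edge features.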
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
