Self-Supervised Graph Transformer on Large-Scale Molecular Data

Yu Rong; Yatao Bian; Tingyang Xu; Weiyang Xie; Ying Wei; Wenbing Huang; Junzhou Huang

arXiv:2007.02835 · q-bio.BM · October 30, 2020 · 415 cites

TL;DR

This paper introduces GROVER, a self-supervised graph transformer pre-trained on large-scale unlabelled molecular data, which improves molecular property prediction accuracy by over 6% on average, a gain that matters for AI-driven drug discovery.

Contribution

The paper presents GROVER, a novel self-supervised graph transformer architecture trained on 10 million molecules, achieving state-of-the-art results in molecular property prediction.

Findings

01

GROVER improves prediction accuracy by over 6% on average relative to prior state-of-the-art methods.

02

Self-supervised pre-training enhances molecular representation quality.

03

Large-scale pre-training of a model with 100 million parameters boosts performance.

Abstract

Obtaining informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the use of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization to newly synthesized molecules. To address both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge, and graph levels, GROVER can learn rich structural and semantic information about molecules from enormous amounts of unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style…
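The abstract's central architectural idea, feeding the output of message passing into Transformer-style attention, can be illustrated with a minimal NumPy sketch. This is not GROVER's implementation: the update rule h ← ReLU((h + A h) W), the fixed hop count, the single attention head, and every name used here (`message_passing`, `mp_attention_layer`, the `wq`/`wk`/`wv` weights) are hypothetical simplifications chosen for clarity. The only point it demonstrates is that queries, keys, and values are produced by message passing over the molecular graph rather than by plain per-atom linear projections, so the attention scores already reflect each atom's local neighbourhood.

```python
import numpy as np


def message_passing(h, adj, w, hops):
    """Hypothetical simplified MPN: h <- ReLU((h + A @ h) @ W), repeated `hops` times."""
    for _ in range(hops):
        # Each atom adds its bonded neighbours' features to its own, then projects.
        h = np.maximum(0.0, (h + adj @ h) @ w)
    return h


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mp_attention_layer(x, adj, params, hops=2):
    """Single-head self-attention whose Q, K, V come from message passing (assumed design)."""
    q = message_passing(x, adj, params["wq"], hops)
    k = message_passing(x, adj, params["wk"], hops)
    v = message_passing(x, adj, params["wv"], hops)
    # Scaled dot-product attention over all atoms in the molecule.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection keeps the input features alongside the attended ones.
    return x + attn @ v, attn


# Toy heavy-atom graph for ethanol (C-C-O) with random 8-dim atom features.
rng = np.random.default_rng(0)
n_atoms, dim = 3, 8
x = rng.normal(size=(n_atoms, dim))
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
params = {name: rng.normal(scale=0.1, size=(dim, dim)) for name in ("wq", "wk", "wv")}

out, attn = mp_attention_layer(x, adj, params, hops=2)
print(out.shape, attn.sum(axis=1))
```

Because each projection aggregates information across `hops` bonds before attention is applied, the global attention step mixes features that are already structure-aware. The full paper describes how GROVER actually configures these components; this sketch does not attempt to reproduce those details.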

Code & Models

Videos

Self-Supervised Graph Transformer on Large-Scale Molecular Data · SlidesLive

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics