TL;DR
MolCLR is a self-supervised contrastive learning framework that pre-trains graph neural networks on a large unlabeled dataset (~10M unique molecules), improving generalization in molecular property prediction and reaching state-of-the-art results after fine-tuning.
Contribution
This work presents a contrastive learning approach for GNNs on molecular graphs. It combines three molecule graph augmentations (atom masking, bond deletion, and subgraph removal) with pre-training on large unlabeled data to learn better molecular representations.
Findings
Significantly improves GNN performance on molecular benchmarks.
Achieves state-of-the-art results after fine-tuning.
Learns chemically meaningful molecular embeddings.
Abstract
Molecular Machine Learning (ML) bears promise for efficient molecule property prediction and drug discovery. However, labeled molecule data can be expensive and time-consuming to acquire. Due to the limited labeled data, it is a great challenge for supervised-learning ML models to generalize to the giant chemical space. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large unlabeled data (~10M unique molecules). In MolCLR pre-training, we build molecule graphs and develop GNN encoders to learn differentiable representations. Three molecule graph augmentations are proposed: atom masking, bond deletion, and subgraph removal. A contrastive estimator maximizes the agreement of augmentations from the same molecule while minimizing the agreement of different molecules.…
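The three augmentations and the contrastive objective named in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy graph encoding (`ATOMS`, `BONDS`), the `MASK` token, the `ratio` parameters, the choice to treat subgraph removal as masking a connected set of atoms and deleting the bonds among them, and the use of an NT-Xent-style loss with cosine similarity as the contrastive estimator.

```python
import math
import random

# Toy molecule graph (hypothetical encoding): heavy atoms of ethanol, C-C-O.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]
MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def _count(ratio, size):
    """Number of items to perturb: at least one, never more than available."""
    return min(size, max(1, round(ratio * size)))


def atom_masking(atoms, bonds, ratio, rng):
    """Replace a random fraction of atom labels with the mask token."""
    masked = set(rng.sample(range(len(atoms)), _count(ratio, len(atoms))))
    return [MASK if i in masked else a for i, a in enumerate(atoms)], list(bonds)


def bond_deletion(atoms, bonds, ratio, rng):
    """Drop a random fraction of bonds; atoms are left untouched."""
    dropped = set(rng.sample(range(len(bonds)), _count(ratio, len(bonds))))
    return list(atoms), [b for i, b in enumerate(bonds) if i not in dropped]


def subgraph_removal(atoms, bonds, ratio, rng):
    """Assumed variant: mask a connected set of atoms grown breadth-first
    from a random atom, and delete the bonds between those atoms."""
    target = _count(ratio, len(atoms))
    neighbors = {i: set() for i in range(len(atoms))}
    for u, v in bonds:
        neighbors[u].add(v)
        neighbors[v].add(u)
    start = rng.randrange(len(atoms))
    removed, frontier = [start], [start]
    while frontier and len(removed) < target:
        node = frontier.pop(0)
        for nb in sorted(neighbors[node]):
            if nb not in removed and len(removed) < target:
                removed.append(nb)
                frontier.append(nb)
    gone = set(removed)
    new_atoms = [MASK if i in gone else a for i, a in enumerate(atoms)]
    new_bonds = [(u, v) for u, v in bonds if not (u in gone and v in gone)]
    return new_atoms, new_bonds


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(views_a, views_b, temperature=0.1):
    """NT-Xent-style loss: views_a[i] and views_b[i] embed two augmentations
    of molecule i (positive pair); all other embeddings act as negatives."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        pos = cosine(z[i], z[j]) / temperature
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        total += -pos + math.log(sum(math.exp(l) for l in logits))
    return total / (2 * n)
```

In this sketch, the loss is low when the two views of each molecule have similar embeddings and high when they are paired with views of other molecules. In a real pipeline the embeddings would come from a GNN encoder applied to the augmented graphs, which is omitted here.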
Taxonomy
Methods: Contrastive Learning
