MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning

Yu Wang, JingJie Zhang, Junru Jin, and Leyi Wei

arXiv:2306.09187 · q-bio.BM · June 16, 2023 · 1 citation

Open Access

TL;DR

MolCAP combines chemical-reactivity-based pretraining with prompted fine-tuning for molecular representation learning. It outperforms traditional pretraining methods on 13 public molecular datasets spanning diverse biomedical tasks, and it points to the value of reactivity information in drug discovery.

Contribution

The authors present MolCAP as the first method to incorporate chemical reactivity knowledge into molecular pretraining, improving both performance and interpretability on molecular property prediction tasks.

Findings

01. Outperforms traditional pretraining methods on 13 datasets

02. Enables basic GNNs to achieve state-of-the-art results

03. Reactivity information reveals dataset biases and enhances insights

Abstract

Molecular representation learning (MRL) is a fundamental task for drug discovery. However, previous deep-learning (DL) methods focus excessively on learning robust inner-molecular representations through mask-dominated pretraining frameworks, neglecting the abundant chemical reactivity relationships between molecules that have been demonstrated to be the determining factor for various molecular property prediction tasks. Here, we present MolCAP to promote MRL, a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning. Results show that MolCAP outperforms comparative methods based on traditional molecular pretraining frameworks on 13 publicly available molecular datasets across a diversity of biomedical tasks. Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models, indicating the…

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Residual Connection · Softmax · Byte Pair Encoding