MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning
Yu Wang, JingJie Zhang, Junru Jin, and Leyi Wei

TL;DR
MolCAP introduces chemical reactivity-based pretraining with prompted fine-tuning for molecular representation learning. It outperforms traditional pretraining methods on 13 public datasets spanning diverse biomedical tasks and points to the value of reactivity information in drug discovery.
Contribution
MolCAP is the first to incorporate chemical reactivity knowledge into molecular pretraining, improving performance and interpretability in molecular property prediction tasks.
Findings
Outperforms traditional pretraining methods on 13 datasets
Enables basic GNNs to achieve state-of-the-art results
Reactivity information reveals dataset biases and enhances insights
Abstract
Molecular representation learning (MRL) is a fundamental task for drug discovery. However, previous deep-learning (DL) methods focus excessively on learning robust intra-molecular representations through mask-dominated pretraining frameworks, neglecting the abundant chemical reactivity relationships among molecules that have been demonstrated to be the determining factor in various molecular property prediction tasks. Here, we present MolCAP, a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning, to promote MRL. Results show that MolCAP outperforms comparative methods based on traditional molecular pretraining frameworks on 13 publicly available molecular datasets across a diverse range of biomedical tasks. Prompted by MolCAP, even basic graph neural networks achieve surprising performance that outperforms previous models, indicating the…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Residual Connection · Softmax · Byte Pair Encoding
