Improving Molecular Pretraining with Complementary Featurizations
Yanqiao Zhu, Dingshuo Chen, Yuanqi Du, Yingze Wang, Qiang Liu, Shu Wu

TL;DR
This paper introduces MOCO, a molecular pretraining framework that combines multiple complementary featurizations, such as SMILES strings, 2D graphs, and 3D geometries, and improves performance on molecular property prediction tasks.
Contribution
The paper presents a novel framework, MOCO, that effectively integrates diverse molecular featurizations to enhance pretraining and downstream task performance.
Findings
MOCO outperforms models that use a single featurization.
Different featurizations encode distinct chemical information.
Combining featurizations improves molecular property prediction accuracy.
Abstract
Molecular pretraining, which learns molecular representations from massive unlabeled data, has become a prominent paradigm for solving a variety of tasks in computational chemistry and drug discovery. Recently, considerable progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations and their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies -- chirality classification and aromatic ring counting -- we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple…
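To make the chirality case study concrete, here is a toy illustration of why a featurization can lose chemical information. It is a sketch under stated assumptions, not the paper's experiment: the two SMILES strings (a pair of alanine enantiomers) and the helper `strip_stereo` are hypothetical examples chosen for illustration, and the helper is plain string manipulation rather than a real SMILES parser.

```python
def strip_stereo(smiles: str) -> str:
    """Drop tetrahedral chirality marks ('@' and '@@') from a SMILES string.

    Toy stand-in for a connectivity-only view of a molecule; a real
    pipeline would use a cheminformatics toolkit instead.
    """
    return smiles.replace("@", "")


# Two enantiomers of alanine (illustrative inputs, not taken from the paper).
enantiomer_a = "C[C@H](N)C(=O)O"
enantiomer_b = "C[C@@H](N)C(=O)O"

# With stereo marks, the 1D strings differ.
strings_differ = enantiomer_a != enantiomer_b

# Without them, both molecules collapse to the same connectivity.
same_connectivity = strip_stereo(enantiomer_a) == strip_stereo(enantiomer_b)

print(strings_differ, same_connectivity)
```

A representation that keeps only atom connectivity cannot separate the two molecules, whereas explicit stereo marks or 3D coordinates can. This is the kind of difference between featurizations that the abstract describes.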
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Chemical Synthesis and Analysis
Methods: Batch Normalization · InfoNCE · Momentum Contrast
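
InfoNCE is listed among the methods, so a minimal sketch of the generic InfoNCE loss may help. It is not MOCO's actual objective, which this page does not spell out. The function name `info_nce`, the use of cosine similarity, and the default temperature of 0.1 are assumptions made for illustration.

```python
import math


def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one query embedding.

    query, positive: lists of floats (embeddings of two views of one item).
    negatives: list of embeddings of other items.
    Returns -log( exp(s_pos / t) / sum_i exp(s_i / t) ), with cosine similarity s.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # The positive logit comes first, followed by one logit per negative.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# The loss is small when the positive matches the query and large when it does not.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```

In a multi-featurization setting, the query and positive could be embeddings of the same molecule under two featurizations, with other molecules serving as negatives. That pairing is an assumption here, not a detail confirmed by the text above.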
