Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning
Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna Pistunova

TL;DR
This paper explores how to transfer molecular property knowledge from natural language descriptions to graph representations using contrastive learning, improving property prediction accuracy in computational biochemistry.
Contribution
It introduces a novel contrastive learning approach aligning text and graph representations, along with new graph augmentation strategies, to enhance molecular property prediction.
Findings
Achieved +4.26% AUROC improvement over graph-only models
Gained +1.54% AUROC over existing contrastive models
Enhanced text retrieval with neural relevance scoring
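For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how AUROC can be computed for a binary classification task, using the pairwise-ranking definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The function name `auroc` and the toy labels and scores are made up for illustration; they are not the paper's data, and the summary does not say whether the reported gains are absolute or relative.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparisons.

    Counts the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = molecule has the property, 0 = it does not.
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6]  # hypothetical baseline model
improved_scores = [0.9, 0.5, 0.6, 0.3, 0.7, 0.2]  # hypothetical improved model

baseline_auroc = auroc(labels, baseline_scores)
improved_auroc = auroc(labels, improved_scores)
print(f"baseline AUROC: {baseline_auroc:.4f}")
print(f"improved AUROC: {improved_auroc:.4f}")
```

The quadratic pairwise loop is chosen for readability; on real benchmark sizes a rank-based or library implementation would be the more practical choice.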
Abstract
Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph…
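The alignment step described in the abstract can be illustrated with a generic symmetric contrastive (InfoNCE-style) objective over paired graph and text embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the temperature value of 0.1, the embedding dimension, and the random stand-in embeddings are all hypothetical, and real encoders (a graph neural network and a language model) are replaced by random vectors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(graph_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of matched (graph, text) pairs.

    Row i of graph_emb is assumed to describe the same molecule as row i
    of text_emb; every other row in the batch serves as a negative.
    """
    g = l2_normalize(graph_emb)
    t = l2_normalize(text_emb)
    logits = g @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(g))                  # matched pairs lie on the diagonal
    loss_g2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # graph -> text
    loss_t2g = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> graph
    return 0.5 * (loss_g2t + loss_t2g)


# Hypothetical stand-ins for encoder outputs: 8 molecules, 16-dim embeddings.
rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(8, 16))
aligned_text = graph_emb + 0.01 * rng.normal(size=(8, 16))  # well-aligned pairs
shuffled_text = aligned_text[::-1]                          # every pair mismatched

loss_aligned = symmetric_contrastive_loss(graph_emb, aligned_text)
loss_shuffled = symmetric_contrastive_loss(graph_emb, shuffled_text)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

Minimizing a loss of this kind pulls each graph embedding toward the embedding of its own description and away from the other descriptions in the batch, which is why the correctly paired batch scores lower than the shuffled one.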
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
Methods: Contrastive Learning · ALIGN
