Molecule-Morphology Contrastive Pretraining for Transferable Molecular Representation
Cuong Q. Nguyen, Dante Pertusi, Kim M. Branson

TL;DR
This paper introduces MoCoP, a contrastive pretraining framework that leverages large-scale cellular morphology data to enhance molecular property prediction models, showing consistent improvements across multiple datasets and regimes.
Contribution
The authors develop MoCoP, a novel multi-modal contrastive pretraining method that integrates cellular morphology data with molecular graphs for better QSAR modeling.
Findings
MoCoP consistently improves GNN performance on molecular property prediction tasks in ChEMBL20.
Pretrained GNNs show average AUPRC improvements of 2.6% and 6.3% on internal GSK pharmacokinetic data in the full and low data regimes, respectively.
MoCoP is scaled to approximately 100K molecules and 600K morphological profiles using data from the JUMP-CP Consortium.
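The findings above are reported in AUPRC (area under the precision-recall curve). As a rough illustration of the metric only, the sketch below estimates AUPRC with average precision, a common estimator. Whether the authors used this exact estimator is an assumption, and the function name `average_precision` and the toy labels and scores are hypothetical.

```python
import numpy as np

def average_precision(y_true, scores):
    """Estimate AUPRC as the mean of precision values at each positive's rank.

    Assumption: ties in `scores` are broken by input order (stable sort).
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rank examples from highest to lowest predicted score.
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    # Precision after each of the top-k predictions.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision only at ranks where a positive was retrieved.
    return float((precision_at_k * hits).sum() / n_pos)

# Toy example (hypothetical data): positives sit at ranks 1 and 3.
example = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In the toy example the two positives are retrieved at precision 1/1 and 2/3, so the estimate is their mean. AUPRC is often preferred over ROC-based metrics when active compounds are rare, which is typical of QSAR datasets.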
Abstract
Image-based profiling techniques have become increasingly popular over the past decade for their applications in target identification, mechanism-of-action inference, and assay development. These techniques have generated large datasets of cellular morphologies, which are typically used to investigate the effects of small molecule perturbagens. In this work, we extend the impact of such datasets to improving quantitative structure-activity relationship (QSAR) models by introducing Molecule-Morphology Contrastive Pretraining (MoCoP), a framework for learning multi-modal representations of molecular graphs and cellular morphologies. We scale MoCoP to approximately 100K molecules and 600K morphological profiles using data from the JUMP-CP Consortium and show that MoCoP consistently improves the performance of graph neural networks (GNNs) on molecular property prediction tasks in ChEMBL20 across…
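To make the idea of contrastive pretraining across two modalities concrete, here is a minimal sketch of a symmetric InfoNCE (CLIP-style) loss between molecule embeddings and morphology embeddings. This is a generic formulation, not a reproduction of MoCoP's objective, which this summary does not specify. The function names, the `temperature` value, and the random stand-in embeddings are all assumptions, and in practice the embeddings would come from a GNN encoder and a morphology-profile encoder.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(mol_emb, morph_emb, temperature=0.1):
    """Generic symmetric contrastive loss (hypothetical helper).

    Row i of `mol_emb` and row i of `morph_emb` are assumed to describe the
    same molecule; all other rows in the batch act as negatives.
    """
    z_mol = l2_normalize(mol_emb)
    z_morph = l2_normalize(morph_emb)
    logits = z_mol @ z_morph.T / temperature  # shape: (batch, batch)
    idx = np.arange(logits.shape[0])
    # Molecule -> morphology direction: the matching profile is the target.
    loss_mol_to_morph = -log_softmax(logits)[idx, idx].mean()
    # Morphology -> molecule direction: the matching molecule is the target.
    loss_morph_to_mol = -log_softmax(logits.T)[idx, idx].mean()
    return float(0.5 * (loss_mol_to_morph + loss_morph_to_mol))

# Stand-in embeddings (random, for illustration only).
rng = np.random.default_rng(0)
mol = rng.normal(size=(8, 16))
aligned_loss = symmetric_info_nce(mol, mol + 0.01 * rng.normal(size=(8, 16)))
unrelated_loss = symmetric_info_nce(mol, rng.normal(size=(8, 16)))
```

The loss is low when each molecule's embedding is closest to its own morphology embedding and high when the pairing carries no signal, which is the pressure that pulls the two modalities into a shared representation space.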
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Bioinformatics and Genomic Networks
