GeomCLIP: Contrastive Geometry-Text Pre-training for Molecules
Teng Xiao, Chao Cui, Huaisheng Zhu, Vasant G. Honavar

TL;DR
GeomCLIP introduces a multi-modal pre-training framework that aligns 3D molecular structures with biomedical texts, significantly improving molecular property prediction, retrieval, and captioning tasks.
Contribution
The paper builds PubChem3D, a dataset of 200K paired ground-state geometric structures and biomedical texts, and proposes a pre-training method that combines multimodal representation alignment with unimodal denoising.
Findings
Enhanced performance in molecular property prediction
Effective zero-shot text-molecule retrieval
Improved 3D molecule captioning
Abstract
Pretraining molecular representations is crucial for drug and material discovery. Recent methods focus on learning representations from geometric structures, effectively capturing 3D position information. Yet, they overlook the rich information in biomedical texts, which detail molecules' properties and substructures. With this in mind, we set up a data collection effort for 200K pairs of ground-state geometric structures and biomedical texts, resulting in the PubChem3D dataset. Based on this dataset, we propose the GeomCLIP framework for multi-modal representation learning from molecular structures and biomedical text. During pre-training, we design two types of tasks, i.e., multimodal representation alignment and unimodal denoising pretraining, to align the 3D geometric encoder with textual information and, at the same time, preserve its original representation power.…
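The two pre-training tasks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the alignment task is a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired geometry and text embeddings, and that the denoising task is a mean-squared error between predicted and true coordinate noise. The function names, the `temperature` default, and the `weight` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(geom_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is assumed to be a matched pair."""
    # L2-normalize so the dot product is a cosine similarity.
    g = geom_emb / np.linalg.norm(geom_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature
    idx = np.arange(len(g))
    # Geometry-to-text: each row should pick out its own column.
    loss_g2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-geometry: each column should pick out its own row.
    loss_t2g = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_g2t + loss_t2g))


def denoising_loss(predicted_noise, true_noise):
    """Mean-squared error between predicted and actual coordinate noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))


def combined_loss(geom_emb, text_emb, predicted_noise, true_noise, weight=1.0):
    """Alignment loss plus a weighted denoising term (weighting is an assumption)."""
    alignment = contrastive_alignment_loss(geom_emb, text_emb)
    return alignment + weight * denoising_loss(predicted_noise, true_noise)
```

In this sketch the denoising term is what would keep the geometric encoder tied to its unimodal objective while the contrastive term pulls its embeddings toward the text encoder's, which mirrors the abstract's stated goal of aligning with text while preserving the encoder's original representation power.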
Taxonomy
Topics: Mathematics, Computing, and Information Processing
Methods: Sparse Evolutionary Training · ALIGN · Focus
