MolBind: Multimodal Alignment of Language, Molecules, and Proteins

Teng Xiao, Chao Cui, Huaisheng Zhu, and Vasant G. Honavar

arXiv:2403.08167 · cs.LG · February 5, 2025 · 1 citation

Open Access

TL;DR

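MolBind is a multi-modal framework that aligns language, molecules, and proteins in a shared feature space. It improves zero-shot learning in drug discovery by pre-training on four modalities (natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) with a new dataset, MolBind-M4.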

Contribution

The paper introduces MolBind, a multi-modal contrastive learning framework that unifies diverse biological data modalities and provides a high-quality dataset for pre-training.

Findings

01 Superior zero-shot performance across multiple tasks

02 Effective semantic alignment of diverse modalities

03 Introduction of a new multi-modal dataset, MolBind-M4

Abstract

Recent advancements in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) remains challenging due to inherent gaps among them. In this work, we propose MolBind, a framework that trains encoders for multiple modalities through contrastive learning, mapping all modalities to a shared feature space for multi-modal semantic alignment. To facilitate effective pre-training of MolBind on multiple modalities, we also build and collect a high-quality dataset with four modalities, MolBind-M4, including graph-language, conformation-language, graph-conformation, and…
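
To illustrate the idea of contrastive alignment in a shared feature space, here is a minimal sketch in plain Python. It assumes a symmetric InfoNCE-style objective, which is a common choice for this kind of pre-training; the abstract does not state the exact loss, so this should not be read as the authors' implementation. The function names (`l2_normalize`, `info_nce`), the `temperature` value, and the toy 2-dimensional embeddings are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    anchors[i] and positives[i] are embeddings of the same item from two
    modalities (for example, a text description and a molecular graph).
    Assumption: this objective is illustrative, not taken from the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    b = [l2_normalize(v) for v in positives]
    n = len(a)
    # logits[i][j]: temperature-scaled cosine similarity between a[i] and b[j]
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the anchor-to-positive and positive-to-anchor directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))

# Toy embeddings (hypothetical): two text vectors and two graph vectors.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
graph_emb = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce(text_emb, graph_emb)         # correct pairing
shuffled_loss = info_nce(text_emb, graph_emb[::-1])  # mismatched pairing
```

In this toy setting the correctly paired batch yields a much lower loss than the shuffled one, which is the signal that pulls matching items from different modalities together in the shared space. The same similarity scores can then be used to rank candidates from one modality against a query from another, which is one way zero-shot retrieval can be carried out without task-specific training.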


Taxonomy

Topics: Biomedical Text Mining and Ontologies