Multimodal Molecular Pretraining via Modality Blending
Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu

TL;DR
MoleBLEND introduces a novel atomic-relation level self-supervised learning approach that effectively aligns and integrates 2D and 3D molecular structures, enhancing molecular understanding for drug discovery.
Contribution
It proposes a fine-grained, relation-based modality blending method that unifies multiple learning objectives into a cohesive framework for molecular representation.
Findings
Achieves state-of-the-art results on molecular benchmarks.
Effectively aligns 2D and 3D structures at atomic level.
Provides theoretical insights via mutual-information maximization.
Abstract
Self-supervised learning has recently gained growing interest in molecular modeling for scientific tasks such as AI-assisted drug discovery. Current studies consider leveraging both 2D and 3D molecular structures for representation learning. However, relying on straightforward alignment strategies that treat each modality separately, these methods fail to exploit the intrinsic correlation between 2D and 3D representations that reflect the underlying structural characteristics of molecules, and only perform coarse-grained molecule-level alignment. To derive fine-grained alignment and promote structural molecule understanding, we introduce an atomic-relation level "blend-then-predict" self-supervised learning approach, MoleBLEND, which first blends atom relations represented by different modalities into one unified relation matrix for joint encoding, then recovers modality-specific…
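The "blend" step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 2D relation is the shortest-path hop count between atoms and the 3D relation is the Euclidean distance, and that each atom pair draws its relation from one modality at random. The function names and the `p_3d` parameter are hypothetical, and this is not the authors' implementation; a real model would presumably embed each relation type before mixing them.

```python
import numpy as np

def shortest_path_matrix(adj):
    """2D relation (assumed): all-pairs hop counts via Floyd-Warshall."""
    n = adj.shape[0]
    dist = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate atom k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

def euclidean_matrix(coords):
    """3D relation (assumed): pairwise Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def blend_relations(rel_2d, rel_3d, p_3d=0.5, rng=None):
    """Pick, per atom pair, which modality supplies the relation.

    Returns the blended matrix and the boolean mask (True = 3D entry).
    The mask is symmetrized so that (i, j) and (j, i) agree.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rel_2d.shape[0]
    upper = np.triu(rng.random((n, n)) < p_3d)
    use_3d = upper | upper.T
    blended = np.where(use_3d, rel_3d, rel_2d)
    return blended, use_3d

# Toy three-atom chain 0-1-2 with made-up coordinates.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
rel_2d = shortest_path_matrix(adj)
rel_3d = euclidean_matrix(coords)
blended, use_3d = blend_relations(rel_2d, rel_3d, rng=np.random.default_rng(0))
```

In a "blend-then-predict" setup, the blended matrix would be the encoder input and the full `rel_2d` and `rel_3d` matrices would serve as recovery targets; the mask records which entries were withheld from each modality.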
Peer Reviews
Decision: ICLR 2024 poster
1. The idea of using a relation matrix to unify the 2D and 3D information for molecular representation learning is novel. 2. The theoretical analysis provides more insight into the proposed method. 3. The paper is well written, well structured, and easy to follow.
1. The information gathered in the relation matrix is quite limited, and much of the information in the original structure is lost, especially in the 3D structure. The matrix construction is quite similar to the work "One transformer can understand both 2d & 3d molecular data", published at ICLR 2023. 2. Ablation studies on blending two masks should be provided. 3. Some details of the experimental setup are missing.
- The key idea is clear and straightforward: to use the attention module to help augment the 2D-3D atom-relation for molecule pretraining. - The theoretical proof is interesting.
- The motivations are not clearly claimed or supported. - For instance, on Page 2, the authors say that they “observe that although appearing visually distinct … are intrinsically equivalent as they are essentially different manifestations of the same atoms and their relationships”. What does “equivalent” mean here? A lot of 2D-3D pretraining methods start by saying such two modalities are complementary to each other. - Additionally, on Page 2, what is the motivation to feed both modalit
1. The paper is well-written and easy to understand. 2. The authors conducted extensive experiments on both 2D and 3D molecule tasks and showed their good performance.
1. The authors did not discuss the training time cost of different pretraining methods. 2. A series of pretraining baselines are missing from the related work and comparisons. For example: [1] Xu M, Wang H, Ni B, et al. Self-supervised graph-level representation learning with local and global structure. ICML 21. [2] Zhang Z, Liu Q, Wang H, et al. Motif-based graph self-supervised learning for molecular property prediction. NeurIPS 21. [3] Zaidi S, Schaarschmidt M, Martens J, et al. Pre-training vi
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Web Data Mining and Analysis
