Multimodal Fusion with Relational Learning for Molecular Property Prediction
Zhengyang Zhou, Yunrui Li, Pengyu Hong, Hao Xu

TL;DR
This paper introduces MMFRL, a novel multimodal fusion framework with relational learning for molecular property prediction, which improves accuracy and interpretability and enables task-specific optimization in drug discovery.
Contribution
The paper presents a new multimodal fusion approach with relational learning, systematically investigates fusion stages, and demonstrates superior performance on MoleculeNet benchmarks.
Findings
MMFRL outperforms existing methods on MoleculeNet benchmarks.
Relational learning enhances embedding initialization for molecular representations.
Different fusion stages have distinct advantages and limitations.
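The three fusion stages named in the findings can be illustrated with a toy sketch. The Python below is a minimal illustration under stated assumptions, not MMFRL's implementation: the helper names (linear, early_fusion, intermediate_fusion, late_fusion), the toy linear models, and the example modality names are all hypothetical. It only shows where in the pipeline the modalities are combined: on raw features (early), on per-modality embeddings (intermediate), or on per-modality predictions (late).

```python
from typing import Callable, Dict, List

Vector = List[float]


def linear(weights: Vector, bias: float = 0.0) -> Callable[[Vector], float]:
    """Return a toy linear predictor; a stand-in for any trained model."""
    def predict(x: Vector) -> float:
        assert len(x) == len(weights), "feature/weight length mismatch"
        return sum(w * v for w, v in zip(weights, x)) + bias
    return predict


def early_fusion(features: Dict[str, Vector],
                 head: Callable[[Vector], float]) -> float:
    """Concatenate raw modality features, then apply one shared model."""
    fused = [v for name in sorted(features) for v in features[name]]
    return head(fused)


def intermediate_fusion(features: Dict[str, Vector],
                        encoders: Dict[str, Callable[[Vector], Vector]],
                        head: Callable[[Vector], float]) -> float:
    """Encode each modality separately, concatenate embeddings, then predict."""
    fused = [v for name in sorted(features)
             for v in encoders[name](features[name])]
    return head(fused)


def late_fusion(features: Dict[str, Vector],
                predictors: Dict[str, Callable[[Vector], float]],
                weights: Dict[str, float]) -> float:
    """Predict per modality, then take a weighted average of the predictions."""
    total = sum(weights[name] for name in features)
    return sum(weights[name] * predictors[name](features[name])
               for name in features) / total


# Hypothetical two-modality example (values are arbitrary toy numbers).
features = {"fingerprint": [1.0, 0.0], "smiles": [0.5, 0.5]}

early = early_fusion(features, linear([1.0, 1.0, 1.0, 1.0]))

double = lambda x: [2.0 * v for v in x]  # toy "encoder"
intermediate = intermediate_fusion(
    features,
    {"fingerprint": double, "smiles": double},
    linear([1.0, 1.0, 1.0, 1.0]),
)

late = late_fusion(
    features,
    {"fingerprint": linear([1.0, 1.0]), "smiles": linear([2.0, 2.0])},
    {"fingerprint": 1.0, "smiles": 3.0},
)
```

In this sketch, late fusion exposes one explicit weight per modality, whereas early fusion leaves all interaction between modalities to a single shared model.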
Abstract
Graph-based molecular representation learning is essential for accurately predicting molecular properties in drug discovery and materials science; however, it faces significant challenges due to the intricate relationships among molecules and the limited chemical knowledge utilized during training. While contrastive learning is often employed to handle molecular relationships, its reliance on binary metrics is insufficient for capturing the complexity of these interactions. Multimodal fusion has gained attention for property reasoning, but previous work has explored only a limited range of modalities, and the optimal stages for fusing different modalities in molecular property tasks remain underexplored. In this paper, we introduce MMFRL (Multimodal Fusion with Relational Learning for Molecular Property Prediction), a novel framework designed to overcome these limitations. Our method…
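The abstract's contrast between binary metrics and richer relational signals can be illustrated with a small sketch. The Python below is an assumption-laden illustration, not the loss used in MMFRL (the abstract is truncated before the method is described): it compares a one-hot (binary) contrastive target with a soft target derived from a continuous reference similarity. The function names, the temperature parameter, and the idea of using some precomputed pairwise similarity as the reference are all hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross-entropy H(target, predicted); zero-weight terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def binary_contrastive_loss(embedding_sims: List[float],
                            positive_index: int) -> float:
    """Binary target: exactly one candidate is 'the positive', the rest negatives."""
    predicted = softmax(embedding_sims)
    target = [1.0 if i == positive_index else 0.0
              for i in range(len(embedding_sims))]
    return cross_entropy(target, predicted)


def soft_relational_loss(embedding_sims: List[float],
                         reference_sims: List[float],
                         temperature: float = 1.0) -> float:
    """Soft target: every candidate gets weight from a continuous reference
    similarity (hypothetical; any graded pairwise similarity could be used)."""
    predicted = softmax(embedding_sims)
    target = softmax(reference_sims, temperature)
    return cross_entropy(target, predicted)


# Similarities of one anchor molecule to three candidates (toy numbers).
embedding_sims = [2.0, 1.0, 0.0]
loss_binary = binary_contrastive_loss(embedding_sims, positive_index=0)
loss_aligned = soft_relational_loss(embedding_sims, [2.0, 1.0, 0.0])
loss_reversed = soft_relational_loss(embedding_sims, [0.0, 1.0, 2.0])
```

The binary variant only rewards ranking the single positive first, while the soft variant penalizes any mismatch between the embedding similarities and the graded reference similarities, which is why the reversed reference yields a larger loss than the aligned one.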
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The method effectively employs a continuous loss function for multimodal relational learning in molecular property prediction, delivering promising results.
- The method introduces three distinct multimodal fusion approaches (early, intermediate, and late stage), offering alternative strategies for integrating different modalities and illustrating the strengths and weaknesses of each.
Major comments
- While the idea of modifying the loss function for efficient relational learning sounds novel, the explanation of the specific modification does not effectively support the novelty of the proposed method, as the objective function is a rather standard one widely used in the community.
- Although the paper presents three approaches to multimodal fusion (early, intermediate, and late-stage fusion), it is not clear which of the three stage fusion strategies ca
1. This paper addresses the important topic of pre-trained molecular representation learning. The authors comprehensively consider multiple modalities such as Fingerprint, SMILES, NMR, and Image, and systematically explore the effects of modality fusion at different stages (early, intermediate, late) to understand their benefits and limitations.
2. The authors evaluate the model’s performance across various molecular property prediction datasets.
1. The paper states that while contrastive learning is often used to handle molecular relationships, its reliance on binary metrics is insufficient for capturing the complexity of these interactions. However, this research question is not novel in the context of molecular pretraining. Prior work, such as MoleBERT [1], has already discussed this issue in the context of molecular graphs and proposed corresponding solutions. Unfortunately, the authors did not reference this previous work.
1. The writing of the paper is clear and comprehensible.
2. The visualization design in Figure 2 is meaningful and clear, effectively illustrating the advantages of multimodal fusion.
1. Regarding the key contribution emphasized in the paper, the measurement of inter-molecular relationships, the smooth contrastive learning metric has been previously proposed in various studies. The paper appears to merely transfer this metric to the molecular domain.
2. Concerning the work on multimodal fusion, the focus of the paper is on exploring the optimal fusion stage. However, it seems that most existing fusion works also perform fusion at this identified
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science
