Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
Zihao Jing, Yan Sun, Yan Yi Li, Sugitha Janarthanan, Alana Deng, Pingzhao Hu

TL;DR
MuMo is a structured multimodal fusion framework for molecular representation learning. It fuses 2D topology and 3D geometry into a single structural prior and progressively injects that prior into the sequence stream, improving robustness and performance across diverse molecular tasks.
Contribution
The paper presents MuMo, a fusion framework built from two components: a Structured Fusion Pipeline (SFP) that makes conformer-dependent fusion more stable, and a Progressive Injection (PI) mechanism that mitigates modality collapse in multimodal molecular modeling.
Findings
Achieves a 2.7% average improvement over baselines across 29 benchmark tasks
Ranks first on 22 of those tasks, including a 27% improvement on LD50
Demonstrates robustness to 3D conformer noise
Abstract
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion framework that addresses these challenges in molecular representation through two key strategies. To reduce the instability of conformer-dependent fusion, we design a Structured Fusion Pipeline (SFP) that combines 2D topology and 3D geometry into a unified and stable structural prior. To mitigate modality collapse caused by naive fusion, we introduce a Progressive Injection (PI) mechanism that asymmetrically integrates this prior into the sequence stream, preserving modality-specific modeling while enabling cross-modal enrichment. Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation. Across 29 benchmark tasks from Therapeutics Data Commons…
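The asymmetric, one-directional flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the injection schedule, the gating, or the backbone layers. The names `scan_layer`, `inject`, `progressive_injection`, `start_layer`, and `gate` are hypothetical, and the toy linear recurrence only stands in for a state space layer. The sketch assumes the structural prior is a single fixed vector that is added to the sequence stream from a chosen layer onward and is never updated by the sequence stream.

```python
from typing import List, Tuple

Vector = List[float]


def scan_layer(tokens: List[Vector], decay: float = 0.5) -> List[Vector]:
    """Toy stand-in for a sequence layer: a linear recurrence over tokens."""
    state = [0.0] * len(tokens[0])
    out = []
    for tok in tokens:
        # Each hidden state mixes the previous state with the current token.
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, tok)]
        out.append(list(state))
    return out


def inject(tokens: List[Vector], prior: Vector, gate: float) -> List[Vector]:
    """Add the gated structural prior to every token (assumed additive form)."""
    return [[x + gate * p for x, p in zip(tok, prior)] for tok in tokens]


def progressive_injection(
    tokens: List[Vector],
    prior: Vector,
    num_layers: int = 4,
    start_layer: int = 2,
    gate: float = 0.1,
) -> Tuple[List[Vector], List[int]]:
    """Run the sequence stream and inject the prior only in later layers.

    The flow is asymmetric: the prior modifies the sequence stream, but the
    sequence stream never modifies the prior.
    """
    hidden = [list(t) for t in tokens]
    injected_at = []
    for layer in range(num_layers):
        hidden = scan_layer(hidden)
        if layer >= start_layer:  # assumed schedule: early layers stay unimodal
            hidden = inject(hidden, prior, gate)
            injected_at.append(layer)
    return hidden, injected_at


tokens = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical token embeddings
prior = [1.0, 1.0]                  # hypothetical fused 2D/3D structural prior
hidden, injected_at = progressive_injection(tokens, prior)
```

Leaving the early layers free of injection is one way to read "preserving modality-specific modeling while enabling cross-modal enrichment": the sequence stream first builds its own representation, and the structural prior only enriches it afterwards.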
