MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework
Jianping Zhao, Qiong Zhou, Tian Wang, Yusi Fan, Qian Yang, Li Jiao, Chang Liu, Zhehao Guo, Qi Lu, Fengfeng Zhou, Ruochi Zhang

TL;DR
MolProphecy introduces a multi-modal framework that combines expert chemist knowledge, simulated via ChatGPT, with molecular structure data to enhance property prediction accuracy and interpretability in drug discovery.
Contribution
It presents a novel human-in-the-loop multi-modal approach that integrates chemist reasoning with molecular features using a large language model and graph-based data.
Findings
Achieves 15% RMSE reduction on FreeSolv dataset.
Improves AUROC by 5.39% on BACE dataset.
Enhances interpretability by combining expert knowledge with structural features.
Abstract
MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
