MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework

Jianping Zhao; Qiong Zhou; Tian Wang; Yusi Fan; Qian Yang; Li Jiao; Chang Liu; Zhehao Guo; Qi Lu; Fengfeng Zhou; Ruochi Zhang

arXiv:2507.02932 · cs.LG · July 8, 2025

TL;DR

MolProphecy introduces a multi-modal framework that combines expert chemist knowledge, simulated via ChatGPT, with molecular structure data to enhance property prediction accuracy and interpretability in drug discovery.

Contribution

The paper presents a human-in-the-loop, multi-modal approach in which chemist reasoning, generated and embedded by a large language model, is fused with graph-based molecular features through a gated cross-attention mechanism.

Findings

01. Achieves a 15% RMSE reduction on the FreeSolv dataset.

02. Improves AUROC by 5.39% on the BACE dataset.

03. Enhances interpretability by combining expert knowledge with structural features.
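
The two headline numbers above are stated in terms of RMSE (regression, FreeSolv) and AUROC (classification, BACE). As a reading aid, here is a minimal sketch of how these metrics and a relative improvement can be computed. The toy values are illustrative only and are not taken from the paper, and the summary does not say whether the reported percentages are relative or absolute changes; the helper names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative toy values only (NOT results from the paper).
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_reduction = relative_reduction(1.00, 0.85)  # a baseline RMSE of 1.00 lowered to 0.85
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(toy_rmse, 4), round(toy_reduction, 2), toy_auroc)
```

Under a relative reading, a 15% RMSE reduction means the new error is 0.85 times the baseline error, as in the toy call above.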

Abstract

MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a…
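
The abstract describes fusing an LLM-derived knowledge representation with graph-based molecular features through a gated cross-attention mechanism. The sketch below shows one common form of that idea in plain Python, not the authors' implementation. It assumes that graph node features act as queries and knowledge embeddings as keys and values, and that a sigmoid gate mixes the attended knowledge back into each node feature. The function names, the parameter names (`w_q`, `w_k`, `w_v`, `w_g`), the dimensions, and the random weights are all hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def gated_cross_attention(graph_feats, knowledge_feats, params):
    """Fuse graph features (queries) with knowledge features (keys/values).

    Assumed form, not confirmed by the source:
        attended = softmax(Q K^T / sqrt(d)) V
        gate     = sigmoid([h ; attended] W_g)
        fused    = gate * attended + (1 - gate) * h
    """
    d = len(graph_feats[0])
    q = matmul(graph_feats, params["w_q"])
    k = matmul(knowledge_feats, params["w_k"])
    v = matmul(knowledge_feats, params["w_v"])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Each graph node attends over all knowledge tokens.
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(attn, v)
    # The gate is computed from the concatenation of node and attended features.
    concat = [h + a for h, a in zip(graph_feats, attended)]
    gate = [[sigmoid(z) for z in row] for row in matmul(concat, params["w_g"])]
    fused = [
        [g * a + (1.0 - g) * h for g, a, h in zip(g_row, a_row, h_row)]
        for g_row, a_row, h_row in zip(gate, attended, graph_feats)
    ]
    return fused, attn, gate

rng = random.Random(0)
d = 4                                        # hypothetical feature width
graph_feats = random_matrix(3, d, rng)       # 3 graph nodes
knowledge_feats = random_matrix(5, d, rng)   # 5 knowledge tokens
params = {
    "w_q": random_matrix(d, d, rng),
    "w_k": random_matrix(d, d, rng),
    "w_v": random_matrix(d, d, rng),
    "w_g": random_matrix(2 * d, d, rng),
}
fused, attn, gate = gated_cross_attention(graph_feats, knowledge_feats, params)
print(len(fused), len(fused[0]))
```

With this form of gating, a gate value near 0 leaves the structural feature unchanged and a value near 1 replaces it with the attended knowledge, so the gate values themselves indicate how much each node relies on the text-derived signal.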
