Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling
Lirong Wu, Junjie Wang, Zhifeng Gao, Xiaohong Ji, Rong Zhu, Xinyu Li, Linfeng Zhang, Guolin Ke, Weinan E

TL;DR
Uni-Mol3 is a hierarchical deep learning framework that models multi-molecular reactions by encoding 3D molecular structures as discrete tokens (a 3D-aware molecular language), pre-training on molecular and reaction data, and fine-tuning for diverse reaction tasks. It outperforms existing methods on 10 datasets across 4 tasks.
Contribution
Introduces Uni-Mol3, a novel multi-molecular reaction modeling framework combining hierarchical encoding, multi-stage pre-training, and prompt-aware fine-tuning for improved reaction prediction.
Findings
Outperforms existing methods on 10 datasets across 4 tasks.
Effectively models complex organic reactions with high accuracy.
Demonstrates strong generalizability in multi-task prediction.
Abstract
Organic reactions, the foundation of the modern chemical industry, are crucial for new material development and drug discovery. However, deciphering reaction mechanisms and modeling multi-molecular relationships remain formidable challenges due to the complexity of molecular dynamics. While several state-of-the-art models such as Uni-Mol2 have revolutionized single-molecular representation learning, their extension to multi-molecular systems, where chemical reactions inherently occur, has been underexplored. This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens, creating a 3D-aware molecular language. The framework innovatively combines two pre-training…
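To make the idea of a "3D-aware molecular language" concrete, here is a toy sketch of turning atoms with 3D coordinates into discrete tokens and joining several molecules into one reaction sequence. This is not the paper's Mol-Tokenizer, whose design is not described in the text above. The binning scheme (distance from the molecular centroid), the token format, the separator symbols, and the names `distance_bin`, `tokenize_molecule`, and `tokenize_reaction` are all assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical representation: (element symbol, (x, y, z) in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def distance_bin(d: float, bin_width: float = 0.5, max_bins: int = 8) -> int:
    """Map a non-negative distance to a discrete bin index, clamped to the last bin."""
    return min(int(d / bin_width), max_bins - 1)


def tokenize_molecule(atoms: List[Atom]) -> List[str]:
    """Emit one token per atom: element plus binned distance to the centroid.

    Illustrative stand-in only; a learned tokenizer would use far richer features.
    """
    n = len(atoms)
    centroid = tuple(sum(a[1][k] for a in atoms) / n for k in range(3))
    return [
        f"{element}_d{distance_bin(math.dist(xyz, centroid))}"
        for element, xyz in atoms
    ]


def tokenize_reaction(
    reactants: List[List[Atom]], products: List[List[Atom]]
) -> List[str]:
    """Join per-molecule tokens into one sequence: reactants '>>' products."""
    tokens = ["<rxn>"]
    for side_index, side in enumerate((reactants, products)):
        if side_index == 1:
            tokens.append(">>")  # assumed reactant/product separator
        for mol_index, mol in enumerate(side):
            if mol_index > 0:
                tokens.append(".")  # assumed molecule separator
            tokens.extend(tokenize_molecule(mol))
    tokens.append("</rxn>")
    return tokens


if __name__ == "__main__":
    # Made-up coordinates, used only to exercise the functions.
    co = [("C", (0.0, 0.0, 0.0)), ("O", (2.0, 0.0, 0.0))]
    h = [("H", (0.0, 0.0, 0.0))]
    print(tokenize_reaction([co, h], [co]))
```

The point of the sketch is only that, once each molecule is a token sequence, a multi-molecular reaction becomes a single longer sequence that a language-model-style network can be pre-trained and fine-tuned on.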
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Asymmetric Hydrogenation and Catalysis
