Enhancing Chemical Reaction and Retrosynthesis Prediction with Large Language Model and Dual-task Learning
Xuan Lin, Qingrui Liu, Hongxin Xiang, Daojian Zeng, and Xiangxiang Zeng

TL;DR
ChemDual leverages large-scale instruction data and dual-task learning with an enhanced LLaMA model to significantly improve chemical reaction and retrosynthesis prediction accuracy, demonstrating strong potential for drug discovery applications.
Contribution
The paper introduces ChemDual, a novel LLM framework that jointly optimizes reaction and retrosynthesis prediction using dual-task learning and a large instruction dataset.
Findings
Achieves state-of-the-art results on Mol-Instruction and USPTO-50K datasets.
Outperforms existing single-task and open-source LLM approaches.
Generates diverse compounds with strong protein binding affinity.
Abstract
Chemical reaction and retrosynthesis prediction are fundamental tasks in drug discovery. Recently, large language models (LLMs) have shown potential in many domains. However, directly applying LLMs to these tasks faces two major challenges: (i) the lack of a large-scale instruction dataset for chemical synthesis; (ii) the failure of existing fine-tuning strategies to account for the close correlation between reaction and retrosynthesis prediction. To address these challenges, we propose ChemDual, a novel LLM framework for accurate chemical synthesis. Specifically, considering the high cost of data acquisition for reaction and retrosynthesis, ChemDual regards the reaction-and-retrosynthesis of molecules as a related recombination-and-fragmentation process and constructs a large-scale dataset of 4.4 million instructions. Furthermore, ChemDual introduces an enhanced LLaMA, equipped with a multi-scale…
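The pairing of reaction and retrosynthesis prediction described in the abstract can be illustrated with a minimal sketch. It assumes each reaction is given as a reaction SMILES string of the form "reactants>>product" with no agent field, and turns it into one forward record and one retro record. The function name, prompt wording, and field names are hypothetical and chosen only for illustration. This is not the authors' data pipeline and does not reproduce the recombination-and-fragmentation step used to build the 4.4 million instructions.

```python
# Hypothetical sketch: build paired forward/retro instruction records from a
# reaction SMILES string of the form "reactants>>product".
# Prompt wording and field names are illustrative assumptions, not ChemDual's.

def make_dual_instructions(reaction_smiles: str) -> list[dict]:
    """Return [forward_record, retro_record] for one reaction."""
    reactants, sep, product = reaction_smiles.partition(">>")
    if not sep or not reactants or not product:
        raise ValueError(
            f"expected 'reactants>>product', got {reaction_smiles!r}"
        )
    # Forward task: reactants in, product out.
    forward = {
        "task": "reaction_prediction",
        "instruction": "Predict the product of the following reactants.",
        "input": reactants,
        "output": product,
    }
    # Retro task: the same reaction with input and output swapped.
    retro = {
        "task": "retrosynthesis",
        "instruction": "Propose reactants that could form the following product.",
        "input": product,
        "output": reactants,
    }
    return [forward, retro]


# Example: acetic acid + ethanol -> ethyl acetate (water omitted).
pairs = make_dual_instructions("CC(=O)O.OCC>>CC(=O)OCC")
for record in pairs:
    print(record["task"], "|", record["input"], "->", record["output"])
```

Because both records come from the same reaction, a model fine-tuned on them sees each transformation in both directions, which is the correlation between the two tasks that the abstract says existing fine-tuning strategies overlook.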
Taxonomy
Topics: Water Quality Monitoring and Analysis · Machine Learning in Materials Science · Text and Document Classification Technologies
Methods: Fragmentation · LLaMA
