ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis
Kenan Li, Yijian Zhang, Jin Wang, Haipeng Gan, Zeying Sun, Xiaoguang Lei, Hao Dong

TL;DR
ChemBART is a pre-trained language model based on BART that unifies multiple chemical analysis tasks, including synthesis planning and property prediction, demonstrating improved efficiency and accuracy validated through wet-lab experiments.
Contribution
Introduces ChemBART, a multi-task, reaction-focused pre-trained model that advances chemical synthesis planning and analysis beyond single-task approaches.
Findings
Achieves ~30% yield improvement over benchmarks.
Enables multi-step synthesis route design.
Validates predictions through wet-lab experiments.
Abstract
Recent advances in large language models (LLMs) have demonstrated transformative potential across diverse fields. While LLMs have been applied to molecules encoded in the simplified molecular-input line-entry system (SMILES) for computer-aided synthesis planning (CASP), existing methodologies typically address single tasks, such as precursor prediction. We introduce ChemBART, a SMILES-based LLM pre-trained on chemical reactions, which enables a unified model for multiple downstream chemical tasks, achieving the paradigm of "one model, one pre-training, multiple tasks." By leveraging outputs from a mask-filling pre-training task on reaction expressions, ChemBART effectively solves a variety of chemical problems, including precursor/reagent generation, temperature-yield regression, molecular property classification, and optimizing the policy and value functions within a reinforcement learning framework,…
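To make the mask-filling objective mentioned in the abstract concrete, here is a minimal sketch of how a reaction SMILES string might be tokenized and corrupted for BART-style infilling. It is an illustration under stated assumptions, not ChemBART's actual pipeline: the tokenizer regex is a pattern commonly used for SMILES, and the helper names (`tokenize`, `mask_span`, `random_mask`), the `<mask>` token string, and the example reaction are all hypothetical choices made here.

```python
import random
import re
from typing import List, Tuple

# Assumption: a commonly used SMILES tokenization pattern; the tokenizer
# ChemBART itself uses is not described in this listing.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

MASK = "<mask>"  # hypothetical mask token string


def tokenize(reaction: str) -> List[str]:
    """Split a reaction SMILES (reactants>reagents>products) into tokens."""
    tokens = SMILES_TOKEN.findall(reaction)
    # findall silently skips unmatched characters, so verify nothing was lost.
    if "".join(tokens) != reaction:
        raise ValueError(f"untokenizable characters in {reaction!r}")
    return tokens


def mask_span(tokens: List[str], start: int, length: int) -> Tuple[List[str], List[str]]:
    """Replace tokens[start:start+length] with one mask token (text infilling).

    Returns (corrupted input, span the model would be trained to recover).
    """
    if length < 1 or start < 0 or start + length > len(tokens):
        raise ValueError("span out of range")
    corrupted = tokens[:start] + [MASK] + tokens[start + length:]
    target = tokens[start:start + length]
    return corrupted, target


def random_mask(tokens: List[str], rng: random.Random, max_len: int = 3) -> Tuple[List[str], List[str]]:
    """Mask one randomly placed span of 1..max_len tokens."""
    length = rng.randint(1, min(max_len, len(tokens)))
    start = rng.randint(0, len(tokens) - length)
    return mask_span(tokens, start, length)


if __name__ == "__main__":
    # Illustrative esterification: acetic acid + ethanol >> ethyl acetate.
    toks = tokenize("CC(=O)O.OCC>>CC(=O)OCC")
    corrupted, target = mask_span(toks, 0, 7)  # hide the first reactant
    print("".join(corrupted), "| target:", "".join(target))
```

Masking an entire reactant, as in the usage lines, resembles a precursor-prediction query, which hints at how a single mask-filling objective could be reused across several downstream tasks.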
Taxonomy
Topics: Machine Learning in Materials Science · Asymmetric Hydrogenation and Catalysis · Computational Drug Discovery Methods
