ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge
Zihan Zhao, Ziping Wan, Lu Chen, Xuanze Lin, Shiyang Yu, Situo Zhang, Da Ma, Zichen Zhu, Danyang Zhang, Huayang Wang, Zhongyang Dai, Liyang Wen, Bo Chen, Xin Chen, Kai Yu

TL;DR
ChemDFM-R is a chemical reasoning LLM enhanced with atomized chemical knowledge, such as functional group information for molecules and reactions. Compared with existing models, it improves interpretability, reasoning, and performance on chemical tasks.
Contribution
The paper introduces ChemDFM-R, a chemical reasoning LLM trained on ChemFG, a new dataset of atomized chemical knowledge, using a four-stage training pipeline.
Findings
Achieves state-of-the-art performance on chemical benchmarks.
Provides interpretable, rationale-driven outputs.
Outperforms both general and domain-specific LLMs.
Abstract
Atomized chemical knowledge, such as functional group information of molecules and reactions, plays a pivotal intermediate role in the reasoning process that connects molecular structures with their properties and reactivities. While large language models (LLMs) have achieved impressive progress, the absence of atomized chemical knowledge results in their superficial understanding of chemistry and limited chemical reasoning capabilities. In this work, to tackle this problem, we develop a Chemical Reasoning LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized chemical knowledge, ChemFG, annotating the presence of functional groups in molecules and the changes of functional groups during chemical reactions, to enhance the model's understanding of the fundamental principles and internal logic of chemistry. Then, we propose a mixed-source distillation method that…
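To make the idea of annotating "the changes of functional groups during chemical reactions" more concrete, here is a minimal sketch in Python. It is not ChemFG's actual annotation method or data format: the function name `functional_group_changes` is hypothetical, and the functional group labels are written by hand for illustration rather than detected from molecular structures.

```python
from collections import Counter


def functional_group_changes(reactant_groups, product_groups):
    """Compare functional-group labels on the two sides of a reaction.

    Both arguments are lists of hand-supplied labels (an assumption of this
    sketch; a real pipeline would derive them from molecular structures).
    Returns two dicts: groups lost from the reactants and groups gained in
    the products, each mapped to a count.
    """
    reactants = Counter(reactant_groups)
    products = Counter(product_groups)
    # Counter subtraction keeps only positive counts, so groups that appear
    # unchanged on both sides cancel out.
    lost = reactants - products
    gained = products - reactants
    return dict(lost), dict(gained)


# Fischer esterification: acetic acid + ethanol -> ethyl acetate (+ water).
lost, gained = functional_group_changes(
    ["carboxylic acid", "alcohol"],
    ["ester"],
)
print("lost:", lost)
print("gained:", gained)
```

In this toy example the carboxylic acid and alcohol groups are consumed and an ester group is formed. A change record of this kind is one way to picture an intermediate step between molecular structure and reactivity, which is the role the abstract assigns to atomized knowledge.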
