ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

Zihan Zhao; Ziping Wan; Lu Chen; Xuanze Lin; Shiyang Yu; Situo Zhang; Da Ma; Zichen Zhu; Danyang Zhang; Huayang Wang; Zhongyang Dai; Liyang Wen; Bo Chen; Xin Chen; Kai Yu

arXiv:2507.21990 · cs.CE · April 15, 2026

TL;DR

ChemDFM-R is a chemical reasoning LLM trained on atomized chemical knowledge, such as the functional groups of molecules and how they change in reactions, to improve interpretability, reasoning, and performance on chemical tasks relative to existing models.

Contribution

The paper introduces ChemDFM-R, a chemical reasoning LLM trained with ChemFG, a new dataset of atomized chemical knowledge, and a four-stage training pipeline.

Findings

01 Achieves state-of-the-art performance on chemical benchmarks.

02 Provides interpretable, rationale-driven outputs.

03 Outperforms both general and domain-specific LLMs.

Abstract

Atomized chemical knowledge, such as functional group information of molecules and reactions, plays a pivotal intermediate role in the reasoning process that connects molecular structures with their properties and reactivities. While large language models (LLMs) have achieved impressive progress, the absence of atomized chemical knowledge results in their superficial understanding of chemistry and limited chemical reasoning capabilities. In this work, to tackle this problem, we develop a Chemical Reasoning LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized chemical knowledge, ChemFG, annotating the presence of functional groups in molecules and the changes of functional groups during chemical reactions, to enhance the model's understanding of the fundamental principles and internal logic of chemistry. Then, we propose a mixed-source distillation method that…
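
To make the idea of "changes of functional groups during chemical reactions" concrete, here is a minimal Python sketch. It is not the ChemFG annotation format or the authors' tooling, neither of which is described in the text above. The function name `functional_group_changes`, the dictionary layout, and the hand-labeled example reaction are all assumptions made for illustration; a real pipeline would derive the labels from structures rather than hard-code them.

```python
from collections import Counter


def functional_group_changes(reactant_groups, product_groups):
    """Return (lost, gained) functional-group counts across a reaction.

    Each argument is a list with one dict per molecule, mapping a
    functional-group name to how many times it occurs in that molecule.
    This layout is hypothetical, chosen only for this illustration.
    """
    before = Counter()
    for groups in reactant_groups:
        before.update(groups)  # add this molecule's counts to the total

    after = Counter()
    for groups in product_groups:
        after.update(groups)

    # Counter subtraction keeps only positive differences.
    lost = before - after
    gained = after - before
    return dict(lost), dict(gained)


# Hand-labeled example (an assumption, not taken from ChemFG):
# Fischer esterification, acetic acid + ethanol -> ethyl acetate + water.
reactants = [
    {"carboxylic acid": 1},  # CC(=O)O
    {"alcohol": 1},          # CCO
]
products = [
    {"ester": 1},            # CCOC(C)=O
    {},                      # O (water), no organic functional group labeled
]

lost, gained = functional_group_changes(reactants, products)
print("lost:", lost)
print("gained:", gained)
```

In this toy case the carboxylic acid and alcohol groups disappear and an ester group appears, which is the kind of intermediate fact the abstract describes as linking molecular structure to reactivity.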
