Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li

TL;DR
This paper introduces ChemCoTBench, a framework for evaluating large language models' chemical reasoning through modular, step-by-step operations that mimic mathematical proofs, aiming to improve AI's role in chemical discovery.
Contribution
It proposes a novel reasoning framework that formalizes chemical problem-solving into transparent workflows using modular operations, bridging the gap between abstract reasoning and practical chemical tasks.
Findings
Models show improved reasoning on molecular optimization tasks.
ChemCoTBench provides structured datasets and evaluation metrics.
Baseline evaluations demonstrate the framework's effectiveness.
Abstract
While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug design and reaction engineering, remains untapped. Current benchmarks focus on simple knowledge retrieval, neglecting step-by-step reasoning required for complex tasks such as molecular optimization and reaction prediction. To address this, we introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations, including addition, deletion, and substitution, to formalize chemical problem-solving into transparent, step-by-step workflows. By treating molecular transformations as modular "chemical operations", the framework enables slow-thinking reasoning, mirroring the logic of mathematical…
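To make the idea of modular "chemical operations" concrete, here is a minimal sketch of addition, deletion, and substitution applied one step at a time, with every intermediate state recorded. It is a toy illustration only: the molecule is modelled as a plain tuple of functional-group names rather than a real molecular graph, and the names `Operation`, `apply_operation`, and `run_workflow` are hypothetical, not part of ChemCoTBench.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Toy assumption: a "molecule" is just an ordered tuple of group names.
Molecule = Tuple[str, ...]


@dataclass(frozen=True)
class Operation:
    """One modular edit step (hypothetical structure, for illustration)."""
    kind: str            # "add", "delete", or "substitute"
    group: str           # group to add, delete, or replace
    new_group: str = ""  # replacement group; used only by "substitute"


def apply_operation(molecule: Molecule, op: Operation) -> Molecule:
    """Return a new molecule with a single operation applied."""
    groups = list(molecule)
    if op.kind == "add":
        groups.append(op.group)
    elif op.kind == "delete":
        groups.remove(op.group)          # ValueError if the group is absent
    elif op.kind == "substitute":
        index = groups.index(op.group)   # ValueError if the group is absent
        groups[index] = op.new_group
    else:
        raise ValueError(f"unknown operation kind: {op.kind}")
    return tuple(groups)


def run_workflow(start: Molecule, ops: Sequence[Operation]) -> List[Molecule]:
    """Apply operations in order, keeping every intermediate state."""
    trace = [tuple(start)]
    for op in ops:
        trace.append(apply_operation(trace[-1], op))
    return trace


if __name__ == "__main__":
    steps = [
        Operation("add", "methyl"),
        Operation("substitute", "hydroxyl", "amino"),
        Operation("delete", "methyl"),
    ]
    for state in run_workflow(("benzene", "hydroxyl"), steps):
        print(state)
```

Keeping the full trace, rather than only the final molecule, is what makes each step inspectable, which is the property the abstract describes as a transparent, step-by-step workflow. A realistic version would operate on molecular graphs or SMILES strings with a cheminformatics toolkit.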
Taxonomy
Topics: Software Engineering Research
Methods: Focus
