MolQuest: A Benchmark for Agentic Evaluation of Abductive Reasoning in Chemical Structure Elucidation

Taolin Han; Shuang Wu; Jinghang Wang; Yuhao Zhou; Renquan Lv; Bing Zhao; Wei Hu

arXiv:2603.25253 · cs.CL · March 27, 2026

Open Access

TL;DR

MolQuest introduces an agent-based benchmark for evaluating LLMs' abductive reasoning in chemical structure elucidation through multi-turn, experimental data-driven tasks, revealing significant performance gaps in current models.

Contribution

This work presents MolQuest, a novel interactive framework for assessing LLMs' scientific reasoning in chemistry, emphasizing multi-step experimental planning and hypothesis refinement.

Findings

01 State-of-the-art models achieve ~50% accuracy
02 Most models perform below 30% accuracy
03 Current models show limited strategic scientific reasoning

Abstract

Large language models (LLMs) hold considerable potential for advancing scientific discovery, yet systematic assessment of their dynamic reasoning in real-world research remains limited. Current scientific evaluation benchmarks predominantly rely on static, single-turn Question Answering (QA) formats, which are inadequate for measuring model performance in complex scientific tasks that require multi-step iteration and experimental interaction. To address this gap, we introduce MolQuest, a novel agent-based evaluation framework for molecular structure elucidation built upon authentic chemical experimental data. Unlike existing datasets, MolQuest formalizes molecular structure elucidation as a multi-turn interactive task, requiring models to proactively plan experimental steps, integrate heterogeneous spectral sources (e.g., NMR, MS), and iteratively refine structural hypotheses. This…
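
The multi-turn setup described in the abstract (request an experiment, read the data, revise the hypothesis, then submit a structure) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class `ToyElucidationEnv`, the function `scripted_agent`, the turn budget, and the toy spectral strings are hypothetical and are not taken from MolQuest. Exact SMILES string matching stands in for whatever structure comparison the benchmark actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class ToyElucidationEnv:
    """Hypothetical environment: hides a target structure, reveals data on request."""
    target: str                      # ground-truth answer, written as a SMILES string
    spectra: dict[str, str]          # experiment name -> toy recorded data
    max_turns: int = 5               # assumed turn budget, not from the paper
    turns_used: int = 0
    revealed: dict[str, str] = field(default_factory=dict)

    def request(self, experiment: str) -> str:
        """Spend one turn to reveal the data for one experiment."""
        if self.turns_used >= self.max_turns:
            raise RuntimeError("turn budget exhausted")
        self.turns_used += 1
        data = self.spectra.get(experiment, "not available")
        self.revealed[experiment] = data
        return data

    def submit(self, guess: str) -> bool:
        """Exact string match stands in for a real structure comparison."""
        return guess == self.target


def scripted_agent(env: ToyElucidationEnv) -> str:
    """Placeholder policy; in an agentic benchmark an LLM would choose each step."""
    ms = env.request("MS")
    # A molecular ion at m/z 46 fits C2H6O: ethanol (CCO) or dimethyl ether (COC).
    hypothesis = "CCO" if "46" in ms else "C"
    nmr = env.request("1H NMR")
    # Ethanol shows a CH3 triplet; dimethyl ether gives a single singlet.
    if "triplet" not in nmr:
        hypothesis = "COC"
    return hypothesis


# Toy data for ethanol, written by hand for this sketch.
env = ToyElucidationEnv(
    target="CCO",
    spectra={"MS": "molecular ion m/z 46",
             "1H NMR": "triplet, quartet, broad singlet"},
)
guess = scripted_agent(env)
print(env.submit(guess), env.turns_used)
```

The point of the sketch is the shape of the interaction: the agent pays for each piece of evidence and must decide what to ask for next, which is what a static single-turn QA format cannot measure.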

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Topic Modeling