ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams
Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li

TL;DR
ReactBench is a new benchmark that evaluates how well Multimodal Large Language Models (MLLMs) understand complex topological structures in chemical reaction diagrams, exposing a significant reasoning gap.
Contribution
The paper introduces ReactBench, a comprehensive benchmark with 1,618 QA pairs to assess structural reasoning in MLLMs on chemical diagrams, revealing a major performance gap and guiding future improvements.
Findings
MLLMs perform significantly worse on structural reasoning tasks than on semantic tasks.
A performance gap exceeding 30% was observed between different task types.
Controlled ablations show the bottleneck is in reasoning, not perception.
Abstract
Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities degrade sharply, even on tasks as basic as counting endpoints. Existing benchmarks fail to probe this gap, focusing on semantic comprehension rather than structural reasoning. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams. These real-world scientific diagrams offer an ideal testbed because they naturally span diverse structures from linear chains to cyclic graphs, while requiring both precise local recognition and coherent global reasoning. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical…
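The abstract separates linear chains from branching, converging, and cyclic structures, and names endpoint counting as a basic task. As a minimal sketch, assume a reaction diagram has already been parsed into a list of directed edges (species → product). The Python below assigns a coarse topology label and finds the endpoints. Several pieces are illustrative assumptions rather than ReactBench's annotation format or evaluation code: the edge-list representation, the function names (`build_graph`, `endpoints`, `has_cycle`, `classify`), and the reading of "endpoints" as starting materials (no incoming arrow) and final products (no outgoing arrow).

```python
from collections import defaultdict

# Hypothetical representation: a parsed reaction diagram as a list of
# directed edges (reactant or intermediate -> product). Not ReactBench's format.
Edge = tuple[str, str]


def build_graph(edges: list[Edge]) -> tuple[dict[str, set[str]], set[str]]:
    """Return an adjacency map (duplicate arrows collapsed) and the node set."""
    adj: dict[str, set[str]] = defaultdict(set)
    nodes: set[str] = set()
    for src, dst in edges:
        adj[src].add(dst)
        nodes.update((src, dst))
    return adj, nodes


def endpoints(edges: list[Edge]) -> tuple[set[str], set[str]]:
    """Starting materials (no incoming arrow) and final products (no outgoing arrow)."""
    adj, nodes = build_graph(edges)
    has_incoming = {dst for _, dst in edges}
    starts = nodes - has_incoming
    ends = {n for n in nodes if not adj.get(n)}
    return starts, ends


def has_cycle(edges: list[Edge]) -> bool:
    """Detect a directed cycle (e.g. a catalytic cycle) with three-color DFS."""
    adj, nodes = build_graph(edges)
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {n: WHITE for n in nodes}

    def visit(n: str) -> bool:
        color[n] = GRAY
        for m in adj.get(n, ()):
            if color[m] == GRAY:  # back edge: m is already on the current path
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def classify(edges: list[Edge]) -> str:
    """Coarse topology label; any cycle takes precedence over other features."""
    if has_cycle(edges):
        return "cyclic"
    adj, nodes = build_graph(edges)
    indegree: dict[str, int] = defaultdict(int)
    for n in nodes:
        for m in adj.get(n, ()):
            indegree[m] += 1
    branching = any(len(adj.get(n, ())) > 1 for n in nodes)  # one species -> several products
    converging = any(indegree[n] > 1 for n in nodes)  # several species -> one product
    if branching and converging:
        return "branching and converging"
    if branching:
        return "branching"
    if converging:
        return "converging"
    return "linear"


# Usage with made-up species names.
print(classify([("A", "B"), ("B", "C")]))  # linear
catalytic = [("S", "I1"), ("Cat", "I1"), ("I1", "I2"), ("I2", "Cat"), ("I2", "P")]
print(classify(catalytic))   # cyclic
print(endpoints(catalytic))  # ({'S'}, {'P'})
```

Reporting "cyclic" ahead of everything else is a deliberate simplification. The catalytic example above also branches (at `I2`) and converges (at `I1`), and a fuller analysis would report those features alongside the cycle.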
