ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

Mingyang Rao, Kehua Feng, Zhihui Zhu, Jiangzhen Fu, Hao Yu, Keyan Ding, and Huajun Chen

arXiv:2605.17214 · cs.AI · May 19, 2026

TL;DR

ChemVA introduces a framework that enhances large language models' ability to interpret chemical reaction diagrams by bridging visual and semantic gaps, significantly improving recognition and reasoning accuracy.

Contribution

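The paper presents ChemVA, a framework that combines visual anchoring with semantic alignment to improve LLM understanding of chemical reaction diagrams. It targets two bottlenecks: a visual deficit in resolving the connectivity of dense molecular graphs, and a semantic disconnect in which linear strings such as SMILES fail to activate the model's chemical reasoning.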

Findings

01

Achieves 92.0% structural recognition accuracy on OCRD-Bench.

02

Delivers a performance gain of approximately 20 percentage points across 9 LLMs.

03

Enables open-weight models to match proprietary state-of-the-art models in chemical reasoning.

Abstract

While Large Language Models (LLMs) have revolutionized scientific text processing, they exhibit a significant capability gap when interpreting chemical reaction diagrams. We identify two fundamental bottlenecks restricting current systems: a Visual Deficit, where generic vision encoders struggle to resolve the strict topological connectivity of dense molecular graphs, and a Semantic Disconnect, where standard linear strings, such as SMILES, fail to effectively activate the model's latent chemical reasoning. To bridge these gaps, we propose the Chemical Visual Activation (ChemVA) framework, which employs a Visual Anchor mechanism to ground functional groups via hybrid-granularity detection, followed by a semantic alignment approach that translates visual features into entity names to maximize knowledge activation in LLMs. We evaluate our approach on OCRD-Bench, a newly constructed…
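The semantic alignment idea described in the abstract, replacing linear structure strings with entity names before the text reaches the LLM, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Anchor` record, the `SMILES_TO_NAME` lookup table, and the `build_prompt` helper are all hypothetical names introduced here, and the sketch assumes the structures have already been recovered from the diagram by some upstream detector.

```python
from dataclasses import dataclass

# Hypothetical lookup from a SMILES string to a common entity name.
# Illustrative only; the paper's actual mapping procedure is not shown here.
SMILES_TO_NAME = {
    "CC(=O)O": "acetic acid",
    "CCO": "ethanol",
    "CCOC(C)=O": "ethyl acetate",
}


@dataclass
class Anchor:
    """One structure recovered from a reaction diagram (assumed given)."""
    role: str    # "reactant" or "product" (assumed labels)
    smiles: str  # linear structure string for the detected molecule


def to_entity_name(smiles: str) -> str:
    """Return the entity name if known, otherwise fall back to the raw SMILES."""
    return SMILES_TO_NAME.get(smiles, smiles)


def build_prompt(anchors: list[Anchor], question: str) -> str:
    """Assemble a text prompt that names entities instead of listing SMILES."""
    reactants = [to_entity_name(a.smiles) for a in anchors if a.role == "reactant"]
    products = [to_entity_name(a.smiles) for a in anchors if a.role == "product"]
    return "\n".join([
        "Reactants: " + ", ".join(reactants),
        "Products: " + ", ".join(products),
        "Question: " + question,
    ])


# Example: an esterification diagram with two reactants and one product.
anchors = [
    Anchor("reactant", "CC(=O)O"),
    Anchor("reactant", "CCO"),
    Anchor("product", "CCOC(C)=O"),
]
prompt = build_prompt(anchors, "What type of reaction is this?")
print(prompt)
```

The fallback in `to_entity_name` keeps the raw string when no name is known, so an unrecognized molecule still reaches the model rather than being dropped.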
