ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding
Mingyang Rao, Kehua Feng, Zhihui Zhu, Jiangzhen Fu, Hao Yu, Keyan Ding, and Huajun Chen

TL;DR
ChemVA introduces a framework that enhances large language models' ability to interpret chemical reaction diagrams by bridging visual and semantic gaps, significantly improving recognition and reasoning accuracy.
Contribution
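The paper presents ChemVA, a framework that combines visual anchoring and semantic alignment to improve LLM understanding of chemical reaction diagrams, targeting two bottlenecks: the visual deficit of generic vision encoders and the semantic disconnect of linear strings such as SMILES.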
Findings
Achieves 92.0% structural recognition accuracy on OCRD-Bench.
Delivers a performance gain of approximately 20 percentage points across 9 LLMs.
Enables open-weight models to match proprietary state-of-the-art (SOTA) models in chemical reasoning.
Abstract
While Large Language Models (LLMs) have revolutionized scientific text processing, they exhibit a significant capability gap when interpreting chemical reaction diagrams. We identify two fundamental bottlenecks restricting current systems: a Visual Deficit, where generic vision encoders struggle to resolve the strict topological connectivity of dense molecular graphs, and a Semantic Disconnect, where standard linear strings, such as SMILES, fail to effectively activate the model's latent chemical reasoning. To bridge these gaps, we propose the Chemical Visual Activation (ChemVA) framework, which employs a Visual Anchor mechanism to ground functional groups via hybrid-granularity detection, followed by a semantic alignment approach that translates visual features into entity names to maximize knowledge activation in LLMs. We evaluate our approach on OCRD-Bench, a newly constructed…
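The abstract describes the semantic alignment step only at a high level: visual features are translated into entity names so the LLM receives more than a bare SMILES string. The sketch below is a rough illustration of that idea, not the authors' implementation. It assumes a detector that emits labeled boxes at two granularities (whole molecule and functional group) and a hand-written table mapping detector labels to functional-group names; `Detection`, `ENTITY_NAMES`, `align_to_entities`, and `build_prompt` are all hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical detection record: label vocabulary, box format and
# granularity values are illustrative assumptions, not ChemVA's schema.
@dataclass(frozen=True)
class Detection:
    label: str                      # detector class, e.g. "COOH"
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixels
    granularity: str                # "molecule" or "group" (hybrid granularity)

# Hypothetical lookup from detector labels to chemical entity names.
ENTITY_NAMES = {
    "COOH": "carboxylic acid",
    "OH": "hydroxyl",
    "C=O": "carbonyl",
    "NH2": "primary amine",
    "Ph": "phenyl",
}

def align_to_entities(detections: list[Detection]) -> list[str]:
    """Translate group-level detections into entity names, keeping first-seen order."""
    names: list[str] = []
    for det in detections:
        if det.granularity != "group":
            continue  # molecule-level boxes only anchor location in this sketch
        name = ENTITY_NAMES.get(det.label, det.label)  # fall back to raw label
        if name not in names:
            names.append(name)
    return names

def build_prompt(smiles: str, detections: list[Detection]) -> str:
    """Pair the linear SMILES string with named functional groups for the LLM."""
    groups = ", ".join(align_to_entities(detections)) or "none detected"
    return f"Molecule SMILES: {smiles}\nFunctional groups: {groups}"

if __name__ == "__main__":
    dets = [
        Detection("mol", (0, 0, 200, 120), "molecule"),
        Detection("COOH", (140, 20, 190, 60), "group"),
        Detection("OH", (10, 80, 40, 110), "group"),
    ]
    # Salicylic acid, with its carboxylic acid and hydroxyl groups named explicitly.
    print(build_prompt("OC(=O)c1ccccc1O", dets))
```

The point of pairing the two representations is that the names ("carboxylic acid", "hydroxyl") are tokens an LLM has seen in chemical text, whereas the same information is only implicit in the SMILES string.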
