GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition
Jingchao Wang, Yifan He, Haote Yang, Jiang Wu, Lingli Ge, Xingjian Wei, Yinfan Wang, Linye Li, Huijie Ao, Chengjin Liu, Bin Wang, Lijun Wu, Conghui He

TL;DR
GTR-CoT introduces a graph-traversal visual chain-of-thought mechanism and a data-centric annotation principle to improve molecular structure recognition from images, especially for complex and hand-drawn molecules. It combines graph parsing with reinforcement learning.
Contribution
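The paper proposes GTR-VL, a vision-language model that pairs a graph-traversal reasoning mechanism with reinforcement learning for weakly supervised settings (hand-drawn datasets that provide only final SMILES and no graph annotations), with the aim of improving molecular structure recognition accuracy.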
Findings
GTR-VL outperforms existing models on printed and hand-drawn datasets.
The authors develop GTR-1.3M, a large-scale instruction-tuning dataset.
The authors introduce MolRec-Bench, a benchmark for graph-parsing accuracy in OCSR.
Abstract
Optical Chemical Structure Recognition (OCSR) is essential for converting molecular images into machine-readable formats. While recent vision-language models (VLMs) have shown promise, their image-captioning approach often struggles with complex molecular structures and inconsistent annotations. To address these issues, we introduce GTR-VL, featuring two key innovations: (1) the "Graph Traversal as Visual Chain of Thought" mechanism that emulates human reasoning by incrementally parsing molecular graphs through sequential atom-bond predictions, and (2) the data-centric "Faithfully Recognize What You've Seen" principle, which aligns abbreviated structures in images with their expanded annotations. For hand-drawn OCSR tasks, where datasets lack graph annotations and only provide final SMILES, we apply reinforcement learning using the GRPO method, introducing reward…
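To make "incrementally parsing molecular graphs through sequential atom-bond predictions" concrete, here is a minimal sketch of a depth-first walk over a small molecular graph that emits one atom or bond per step. The step format, the function name `traverse_molecule`, and the input representation (an element list plus a bond dictionary) are assumptions chosen for illustration; the abstract does not specify the traversal order or output format GTR-VL actually uses.

```python
from collections import defaultdict


def traverse_molecule(atoms, bonds, start=0):
    """Depth-first walk that emits atoms and bonds in visiting order.

    atoms: list of element symbols, indexed by atom id (hypothetical format).
    bonds: dict mapping (i, j) atom-id pairs to a bond order.
    Only the connected component containing `start` is visited.
    """
    adjacency = defaultdict(list)
    for (i, j), order in bonds.items():
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))
    # Sort neighbours so the emitted sequence is deterministic.
    for neighbours in adjacency.values():
        neighbours.sort()

    steps = []
    visited_atoms = set()
    seen_bonds = set()

    def visit(i):
        visited_atoms.add(i)
        steps.append(("atom", i, atoms[i]))
        for j, order in adjacency[i]:
            key = (min(i, j), max(i, j))
            if key in seen_bonds:
                continue
            seen_bonds.add(key)
            # A bond to an already visited atom closes a ring.
            steps.append(("bond", i, j, order))
            if j not in visited_atoms:
                visit(j)

    visit(start)
    return steps


# Heavy atoms of ethanol: C-C-O, all single bonds.
ethanol_steps = traverse_molecule(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})

# Cyclopropane: a three-membered carbon ring.
ring_steps = traverse_molecule(["C", "C", "C"], {(0, 1): 1, (1, 2): 1, (0, 2): 1})
```

For ethanol the walk alternates atom and bond steps along the chain. For the ring, the final bond step points back to an atom that was already emitted, which is how a ring closure shows up in a sequence of this kind.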
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Materials Science · Advanced Graph Neural Networks
