ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding

Xuanle Zhao; Xinyuan Cai; Xiang Cheng; Xiuyi Chen; Bo Xu

arXiv:2604.06685 · cs.CL · April 9, 2026

TL;DR

ChemVLR is a chemical vision-language model that reasons explicitly: it identifies fine-grained chemical descriptors, such as functional groups, in the visual input before answering, which yields interpretable solutions and state-of-the-art performance.

Contribution

Introduces ChemVLR, a chemical VLM that prioritizes reasoning through fine-grained analysis and a new training framework, outperforming existing models.

Findings

01 ChemVLR surpasses leading proprietary and open-source models in chemical visual reasoning tasks.

02 The model produces explicit, interpretable reasoning paths for complex chemical problems.

03 A large-scale dataset of 760k samples was curated for training and evaluation.

Abstract

While Vision-Language Models (VLMs) have demonstrated significant potential in chemical visual understanding, current models are predominantly optimized for direct visual question-answering tasks. This paradigm often results in "black-box" systems that fail to utilize the inherent capability of Large Language Models (LLMs) to infer underlying reaction mechanisms. In this work, we introduce ChemVLR, a chemical VLM designed to prioritize reasoning within the perception process. Unlike conventional chemical VLMs, ChemVLR analyzes visual inputs in a fine-grained manner by explicitly identifying granular chemical descriptors, such as functional groups, prior to generating answers. This approach ensures the production of explicit and interpretable reasoning paths for complex visual chemical problems. To facilitate this methodology, we implement a cross-modality reverse-engineering strategy,…
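
The descriptor-first ordering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation or data format: the `ReasoningTrace` class, its field names, and the example descriptors are hypothetical, chosen only to show the idea that granular descriptors (such as functional groups) are recorded before any answer is produced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReasoningTrace:
    """Hypothetical container for a descriptor-first reasoning path."""

    question: str
    descriptors: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)
    answer: Optional[str] = None

    def add_descriptor(self, descriptor: str) -> None:
        # Descriptors belong to the perception stage, so they may only be
        # added while no answer has been recorded yet.
        if self.answer is not None:
            raise ValueError("descriptors must be identified before answering")
        self.descriptors.append(descriptor)

    def add_step(self, step: str) -> None:
        # Intermediate reasoning that connects descriptors to the answer.
        self.steps.append(step)

    def conclude(self, answer: str) -> None:
        # Refuse a direct answer with no descriptors, i.e. the "black-box"
        # question-answering pattern the abstract argues against.
        if not self.descriptors:
            raise ValueError("no descriptors identified before answering")
        self.answer = answer

    def render(self) -> str:
        # Emit the trace in reading order: question, descriptors, steps, answer.
        lines = [f"Question: {self.question}"]
        lines += [f"Descriptor: {d}" for d in self.descriptors]
        lines += [f"Step: {s}" for s in self.steps]
        lines.append(f"Answer: {self.answer}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper's dataset.
    trace = ReasoningTrace(question="What product class forms in the depicted reaction?")
    trace.add_descriptor("carboxylic acid group on reactant 1")
    trace.add_descriptor("hydroxyl group on reactant 2")
    trace.add_step("Acid-catalyzed condensation of an acid and an alcohol")
    trace.conclude("ester")
    print(trace.render())
```

In this sketch, calling `conclude` before any descriptor is added raises an error, which is one simple way to enforce that the answer is always preceded by an inspectable perception step.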

Code & Models

Repositories

xxlllz/ChemVLR (GitHub)
