TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks

Xuanle Zhao; Shuxin Zeng; Xinyuan Cai; Xiang Cheng; Duzhen Zhang; Xiuyi Chen; Bo Xu

arXiv:2511.06283·cs.CV·November 27, 2025

TL;DR

TinyChemVL is a new chemical vision-language model that uses visual token reduction and reaction-level tasks to improve efficiency and reasoning in chemical image understanding, outperforming previous models with fewer parameters.

Contribution

The paper introduces TinyChemVL, a novel chemical VLM that employs visual token reduction and reaction-level tasks, significantly enhancing efficiency and reasoning over prior models.

Findings

01. TinyChemVL outperforms previous models on molecular and reaction tasks.

02. It outperforms ChemVLM while using only 1/16 of the visual tokens.

03. The model demonstrates faster inference and training speeds.

Abstract

While Vision Language Models (VLMs) have demonstrated remarkable capabilities in general visual understanding, their application in the chemical domain has been limited, with previous works predominantly focusing on text and thus overlooking critical visual information, such as molecular structures. Current approaches that directly adopt standard VLMs for chemical tasks suffer from two primary issues: (i) the computational inefficiency of processing entire chemical images with non-informative backgrounds, and (ii) a narrow focus on molecular-level tasks that restricts progress in chemical reasoning. In this work, we propose TinyChemVL, an efficient and powerful chemical VLM that leverages visual token reduction and reaction-level tasks to improve model efficiency and reasoning capacity. We also propose ChemRxn-V, a reaction-level benchmark for assessing vision-based reaction…
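To make the efficiency argument concrete, the following is a minimal sketch of why discarding non-informative background can shrink the visual token count for a patch-based image encoder. It is not the paper's method, which this page does not describe. The helper names `content_bbox` and `patch_tokens`, the 14-pixel patch size, the white-background assumption, and the toy image dimensions are all illustrative assumptions, and the resulting ratio is a property of the toy image, not a reproduction of any reported result.

```python
from math import ceil


def content_bbox(image, background=255):
    """Return (top, left, bottom, right) of the non-background region.

    `image` is a list of rows of grayscale pixel values. Bottom and right
    are exclusive. Returns None if every pixel equals `background`.
    """
    rows = [r for r, row in enumerate(image) if any(p != background for p in row)]
    if not rows:
        return None
    width = len(image[0])
    cols = [c for c in range(width) if any(row[c] != background for row in image)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def patch_tokens(height, width, patch=14):
    """Count patch tokens for an image of the given size (hypothetical patch size)."""
    return ceil(height / patch) * ceil(width / patch)


# Toy example: a 448x448 white canvas with a 112x112 dark "molecule" in the middle.
size = 448
image = [[255] * size for _ in range(size)]
for r in range(168, 280):
    for c in range(168, 280):
        image[r][c] = 0

top, left, bottom, right = content_bbox(image)
full = patch_tokens(size, size)                  # tokens for the whole canvas
cropped = patch_tokens(bottom - top, right - left)  # tokens for the content crop

print(f"full image: {full} tokens, cropped region: {cropped} tokens")
```

In this toy case most of the canvas is background, so encoding only the content region needs far fewer patches. Real chemical images and real token-reduction schemes (cropping, pooling, or learned token pruning) will behave differently.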

Taxonomy

Topics: Machine Learning in Materials Science · Multimodal Machine Learning Applications · Computational Drug Discovery Methods