START: Spatial and Textual Learning for Chart Understanding

Zhuoming Liu; Xiaofeng Gao; Feiyang Niu; Qiaozi Gao; Liu Liu; Robinson Piramuthu

arXiv:2512.07186 · cs.CV · December 9, 2025

TL;DR

START combines spatial and textual learning to improve chart understanding in multimodal large language models (MLLMs), and introduces a new dataset and benchmark for training and evaluation.

Contribution

The paper proposes START, a method integrating chart-element grounding and chart-to-code generation, along with a new dataset and benchmark for comprehensive chart understanding.

Findings

01 START achieves significant performance improvements over baseline models.

02 The START-Dataset enables effective training of spatial and textual chart understanding.

03 START surpasses previous state-of-the-art methods on benchmark evaluations.

Abstract

Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images, charts pair a structured visual layout (spatial property) with an underlying data representation (textual property) -- grasping both is essential for precise, fine-grained chart reasoning. Motivated by this observation, we propose START, the Spatial and Textual learning for chART understanding. Specifically, we introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. To facilitate spatial and textual learning, we propose the START-Dataset generated with a novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying…
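
The two training signals named in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the START data-generation pipeline, which uses an MLLM to translate real chart images into executable code; it only shows, for a simple bar chart whose data is already known, how a chart-to-code target and per-element grounding boxes could both be derived from the same underlying data. The names `BarChartSpec`, `to_matplotlib_code`, `recover_values`, and `bar_boxes` are illustrative, as are the box format (x0, y0, x1, y1) and the plot geometry (evenly spaced bars, linear y axis starting at zero).

```python
import ast
from dataclasses import dataclass


@dataclass
class BarChartSpec:
    """Hypothetical underlying data of a simple vertical bar chart."""
    title: str
    labels: list[str]
    values: list[float]


def to_matplotlib_code(spec: BarChartSpec) -> str:
    """Chart-to-code target: a matplotlib script that redraws the chart from its data."""
    return "\n".join([
        "import matplotlib.pyplot as plt",
        f"labels = {spec.labels!r}",
        f"values = {spec.values!r}",
        "fig, ax = plt.subplots()",
        "ax.bar(labels, values)",
        f"ax.set_title({spec.title!r})",
        "plt.show()",
    ])


def recover_values(code: str) -> dict:
    """Read the literal data assignments back out of the code without executing it."""
    data = {}
    for node in ast.parse(code).body:
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            try:
                data[node.targets[0].id] = ast.literal_eval(node.value)
            except ValueError:
                pass  # skip non-literal assignments such as function calls
    return data


def bar_boxes(spec: BarChartSpec, plot_box: tuple[float, float, float, float],
              bar_frac: float = 0.8) -> dict[str, tuple[float, float, float, float]]:
    """Grounding target: pixel box (x0, y0, x1, y1) for each bar.

    Assumes evenly spaced bars, non-negative values, and a linear y axis running
    from 0 at the bottom of plot_box to max(values) at its top (image y grows down).
    """
    left, top, right, bottom = plot_box
    slot = (right - left) / len(spec.values)  # horizontal space allotted per bar
    y_max = max(spec.values)
    boxes = {}
    for i, (label, value) in enumerate(zip(spec.labels, spec.values)):
        center_x = left + (i + 0.5) * slot
        half_width = bar_frac * slot / 2
        height = value / y_max * (bottom - top)
        boxes[label] = (center_x - half_width, bottom - height,
                        center_x + half_width, bottom)
    return boxes


spec = BarChartSpec("Quarterly sales", ["Q1", "Q2"], [2.0, 4.0])
code = to_matplotlib_code(spec)
print(recover_values(code))  # {'labels': ['Q1', 'Q2'], 'values': [2.0, 4.0]}
print(bar_boxes(spec, (0.0, 0.0, 200.0, 100.0)))
# {'Q1': (10.0, 50.0, 90.0, 100.0), 'Q2': (110.0, 0.0, 190.0, 100.0)}
```

Deriving both targets from one data record keeps the code and the boxes consistent with each other. The sketch skips the hard part that the abstract describes, recovering that data from real chart images with an MLLM, and simply assumes the data is given.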

Code & Models

Models

🤗 zhuomingliu/START (model)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Data Visualization and Analytics