ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Xuanle Zhao; Xianzhen Luo; Qi Shi; Chi Chen; Shuo Wang; Zhiyuan Liu; Maosong Sun

arXiv:2501.06598·cs.AI·July 3, 2025

1 Repo 1 Models 3 Datasets

TL;DR

ChartCoder is a novel multimodal large language model designed specifically for chart-to-code generation, addressing key challenges with new datasets and methods to improve code accuracy and completeness.

Contribution

It introduces ChartCoder, the first dedicated chart-to-code MLLM, along with a large-scale dataset and a step-by-step generation method to enhance performance.

Findings

1. Outperforms existing open-source MLLMs on chart-to-code benchmarks.
2. Achieves better chart restoration and code executability with only 7B parameters.
3. Demonstrates the effectiveness of the Snippet-of-Thought method.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense information embedded in charts. In contrast, parsing charts into code provides lossless representations that can effectively contain all critical details. Although existing open-source MLLMs have achieved success in chart understanding tasks, they still face two major challenges when applied to chart-to-code tasks: (1) low executability and poor restoration of chart details in the generated code and (2) lack of large-scale and diverse training data. To address these challenges, we propose ChartCoder, the first dedicated chart-to-code MLLM, which leverages Code LLMs as the language backbone to enhance the executability of the generated code.…
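
The abstract names low executability of generated code as a core challenge. As a rough illustration of what an executability check might look like, the sketch below runs each generated script in a separate Python process and reports the fraction that exit cleanly. This is not the paper's evaluation harness: the function names `is_executable` and `execution_rate`, the timeout value, and the use of exit status as the success criterion are all assumptions made for illustration.

```python
import subprocess
import sys


def is_executable(code: str, timeout: float = 10.0) -> bool:
    """Return True if `code` runs to completion with exit status 0.

    Hypothetical helper: runs the snippet in a fresh interpreter so that
    a crash or an infinite loop cannot affect the calling process.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A script that never finishes is counted as non-executable.
        return False
    return result.returncode == 0


def execution_rate(snippets: list[str]) -> float:
    """Fraction of snippets that execute successfully (0.0 for an empty list)."""
    if not snippets:
        return 0.0
    return sum(is_executable(s) for s in snippets) / len(snippets)
```

A check like this only measures whether the code runs; it says nothing about whether the rendered chart matches the input image, which the abstract treats as a separate concern (restoration of chart details).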

Code & Models

Repositories

thunlp/ChartCoder
PyTorch · Official

Models

xxxllz/ChartCoder (Hugging Face)
model · 29 downloads · 3 likes
