SynChart: Synthesizing Charts from Language Models

Mengchen Liu; Qixiu Li; Dongdong Chen; Dong Chen; Jianmin Bao; Yunsheng Li

arXiv:2409.16517·cs.AI·September 26, 2024

TL;DR

This paper introduces SynChart, a large-scale synthetic dataset for chart understanding, and shows that a 4.2B-parameter model trained on it achieves near-GPT-4O performance on the ChartQA task, surpassing GPT-4V.

Contribution

It presents a large-scale chart dataset generated using LLMs alone, and a 4.2B chart-expert model trained on it that surpasses GPT-4V on ChartQA.

Findings

01

The SynChart dataset contains approximately 4 million chart images with over 75 million dense annotations (data tables, code, descriptions, and question-answer sets).

02

The trained 4.2B model achieves near-GPT-4O performance on ChartQA, surpassing GPT-4V.

03

The approach demonstrates the potential of LLMs for multi-modality data generation.

Abstract

With the release of GPT-4V(O), its use in generating pseudo labels for multi-modality tasks has gained significant popularity. However, how to build such advanced models from their base large language models (LLMs) remains undisclosed. This work explores the potential of using LLMs alone for data generation and of developing competitive multi-modality models, focusing on chart understanding. We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations, including data tables, code, descriptions, and question-answer sets. We trained a 4.2B chart-expert model using this dataset and achieved near-GPT-4O performance on the ChartQA task, surpassing GPT-4V.
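To make the annotation types listed in the abstract concrete, the following is a minimal Python sketch of what a single synthesized chart record might look like. The class names, field names, and example values are all hypothetical and are not taken from the paper; only the four annotation types (data table, code, description, question-answer sets) and the approximate dataset totals come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAPair:
    """One question-answer annotation (hypothetical structure)."""
    question: str
    answer: str


@dataclass
class ChartSample:
    """Hypothetical record for one chart; field names are assumptions."""
    image_path: str                      # rendered chart image
    data_table: List[Dict[str, object]]  # underlying data, one dict per row
    plotting_code: str                   # code that renders the chart
    description: str                     # natural-language description
    qa_pairs: List[QAPair] = field(default_factory=list)


def annotations_per_chart(total_annotations: int, total_charts: int) -> float:
    """Average number of annotations per chart image."""
    if total_charts <= 0:
        raise ValueError("total_charts must be positive")
    return total_annotations / total_charts


# Illustrative sample; the values are made up for this sketch.
sample = ChartSample(
    image_path="charts/example.png",
    data_table=[{"year": 2022, "sales": 10}, {"year": 2023, "sales": 15}],
    plotting_code="plt.bar([2022, 2023], [10, 15])",
    description="Bar chart of sales by year.",
    qa_pairs=[QAPair("Which year had higher sales?", "2023")],
)

# Rounded totals from the abstract: about 4 million charts and
# over 75 million annotations.
ratio = annotations_per_chart(75_000_000, 4_000_000)
print(ratio)  # 18.75
```

Using the rounded totals from the abstract, this works out to roughly 19 annotations per chart image on average. Because the abstract gives "approximately" and "over" rather than exact counts, the figure is only indicative.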

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Balanced Selection