ChartLlama: A Multimodal LLM for Chart Understanding and Generation
Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

TL;DR
ChartLlama is a multimodal large language model trained on a high-quality, GPT-4-generated instruction-tuning dataset, significantly improving both the understanding and the generation of chart figures across a range of tasks.
Contribution
The paper introduces a flexible, multi-step data generation process for creating diverse instruction-tuning datasets and presents ChartLlama, a model that outperforms prior methods in chart understanding tasks.
Findings
ChartLlama surpasses previous models in ChartQA, Chart-to-text, and Chart-extraction benchmarks.
The data generation method enables diverse, high-quality training data with low resource costs.
ChartLlama shows significant improvements on a new, comprehensive chart dataset.
Abstract
Multi-modal large language models have demonstrated impressive performance on most vision-language tasks. However, these models generally lack the ability to understand domain-specific data, particularly when it comes to interpreting chart figures. This is mainly due to the lack of relevant multi-modal instruction-tuning datasets. In this article, we create a high-quality instruction-tuning dataset leveraging GPT-4. We develop a multi-step data generation process in which separate steps are responsible for generating tabular data, creating chart figures, and designing instruction-tuning data. Our method's flexibility enables us to generate diverse, high-quality instruction-tuning data consistently and efficiently while maintaining a low resource expenditure. Additionally, it allows us to incorporate a wider variety of chart and task types not yet featured in existing…
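The three-step process described in the abstract (tabular data, then a chart figure, then instruction-tuning data) can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper uses GPT-4 at each step, whereas here every step is replaced by a small deterministic stub so the example runs without any API access. All names (`ChartSample`, `generate_table`, `generate_plot_code`, `generate_instructions`, `build_sample`) are hypothetical, and the chart step only emits a matplotlib script as text rather than rendering an image.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class ChartSample:
    table: dict         # step 1 output: column name -> list of values
    plot_code: str      # step 2 output: script that would render the chart
    instructions: list  # step 3 output: {"question", "answer"} dicts


def generate_table(topic, n_rows, rng):
    """Step 1 (stub standing in for an LLM call): produce a small table."""
    years = [2015 + i for i in range(n_rows)]
    values = [rng.randint(10, 100) for _ in range(n_rows)]
    return {"year": years, topic: values}


def generate_plot_code(table, chart_type="bar"):
    """Step 2: emit a matplotlib script for the table (not executed here).

    chart_type is assumed to name a pyplot function such as "bar" or "plot".
    """
    x_name, y_name = list(table)
    return "\n".join([
        "import matplotlib.pyplot as plt",
        f"x = {table[x_name]!r}",
        f"y = {table[y_name]!r}",
        f"plt.{chart_type}(x, y)",
        f"plt.xlabel({x_name!r})",
        f"plt.ylabel({y_name!r})",
        "plt.savefig('chart.png')",
    ])


def generate_instructions(table):
    """Step 3: derive question-answer pairs whose answers come from the table."""
    x_name, y_name = list(table)
    xs, ys = table[x_name], table[y_name]
    best = max(range(len(ys)), key=ys.__getitem__)  # index of the largest value
    return [
        {"question": f"Which {x_name} has the highest {y_name}?",
         "answer": str(xs[best])},
        {"question": "Extract the underlying data table.",
         "answer": json.dumps(table)},
    ]


def build_sample(topic, n_rows=5, seed=0):
    """Run the three steps in order and bundle their outputs."""
    rng = random.Random(seed)
    table = generate_table(topic, n_rows, rng)
    return ChartSample(table, generate_plot_code(table), generate_instructions(table))


sample = build_sample("revenue")
print(sample.instructions[0]["question"])
```

Because each step consumes only the previous step's output, the answers in step 3 are grounded in the same table that drives the chart in step 2. That separation is what the sketch is meant to show; swapping a stub for a real model call would not change the structure.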
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings
