TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

TL;DR
TinyChart is a compact 3-billion-parameter multimodal model that understands charts by generating Python programs for numerical calculations and merging vision tokens, achieving state-of-the-art results with high inference throughput.
Contribution
The paper introduces TinyChart, a novel efficient multimodal large language model for chart understanding that combines Program-of-Thoughts learning and vision token merging to outperform larger models.
Findings
Achieves state-of-the-art performance on multiple chart understanding benchmarks.
Outperforms larger models like ChartLlama, ChartAst, and GPT-4V on ChartQA.
Demonstrates higher inference throughput due to smaller size and efficient vision encoding.
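To illustrate why merging vision tokens can raise throughput, here is a minimal sketch of similarity-based token merging. It is an illustration under stated assumptions, not TinyChart's actual module: the function names, the even/odd split, cosine similarity, and plain averaging are all choices made for this example. The idea is that each merge step shortens the token sequence, and since self-attention cost grows quadratically with sequence length, later layers have less work to do.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_tokens(tokens, r):
    """Merge up to r tokens into their most similar partner by averaging.

    Hypothetical helper for illustration only. Tokens at even positions are
    candidates to be absorbed; tokens at odd positions are merge targets.
    """
    if len(tokens) < 2 or r <= 0:
        return [list(t) for t in tokens]
    a_idx = range(0, len(tokens), 2)
    b_idx = range(1, len(tokens), 2)
    # For every candidate, find its best partner among the targets.
    matches = []
    for i in a_idx:
        j = max(b_idx, key=lambda k: cosine(tokens[i], tokens[k]))
        matches.append((cosine(tokens[i], tokens[j]), i, j))
    matches.sort(reverse=True)  # most similar pairs first
    absorbed, groups = set(), {}
    for _, i, j in matches[:r]:
        absorbed.add(i)
        groups.setdefault(j, []).append(i)
    # Rebuild the sequence: skip absorbed tokens, average each merged group.
    merged = []
    for idx, tok in enumerate(tokens):
        if idx in absorbed:
            continue
        group = [tok] + [tokens[i] for i in groups.get(idx, [])]
        merged.append([sum(col) / len(group) for col in zip(*group)])
    return merged

# Four toy 2-D "vision tokens" (made-up values): two near-duplicate pairs.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
shorter = merge_tokens(tokens, r=2)
print(len(tokens), "->", len(shorter))
```

In a real vision transformer a step like this would be applied to the patch features inside the encoder, so the sequence handed to the language model is much shorter than the original patch grid.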
Abstract
Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, which trains the model to generate Python programs for numerical calculations, and (2) reduce lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module, which gradually merges most…
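The Program-of-Thoughts idea described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's pipeline: the question, the chart values, the program text, and the `run_program` helper are all invented for this example. The point is that the model only has to write out the calculation as Python, and an interpreter does the arithmetic.

```python
# Hypothetical question about a bar chart: "What is the difference between
# the highest and the lowest value?"
# Below is the kind of program a PoT-trained model might emit. The values
# stand in for numbers read off the chart and are made up for this sketch.
generated_program = """
values = [42.0, 35.5, 58.0]
answer = max(values) - min(values)
"""

# Only a handful of built-ins are exposed to the generated code.
ALLOWED_BUILTINS = {"max": max, "min": min, "sum": sum, "len": len,
                    "abs": abs, "round": round}

def run_program(program: str):
    """Execute a generated program and return its `answer` variable.

    Assumption of this sketch: the program stores its result in `answer`.
    Restricting built-ins is not a real sandbox; untrusted model output
    should be run in an isolated process.
    """
    namespace = {"__builtins__": ALLOWED_BUILTINS}
    exec(program, namespace)
    return namespace["answer"]

result = run_program(generated_program)
print(result)
```

Delegating the computation this way means the model does not need to learn to do arithmetic in its weights, which is the burden the abstract says PoT learning reduces.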
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer
