ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
Donglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen, Yichen Xu, Wenxuan Wang, Qin Jin

TL;DR
This paper introduces ChartM$^3$, a benchmark and training dataset for multimodal chart editing, in which instructions combine natural language with visual cues. Evaluation on the benchmark reveals the limitations of current models, and multimodal training improves their editing capabilities.
Contribution
The paper presents a new benchmark, ChartM$^3$, for multimodal chart editing, and a large-scale training dataset to enhance model performance in interpreting combined visual and textual instructions.
Findings
Current multimodal models struggle to interpret visual indicators in editing instructions.
Fine-tuning on ChartM$^3$-Train improves editing accuracy.
Benchmark enables comprehensive evaluation of multimodal chart editing.
Abstract
Charts are a fundamental visualization format widely used in data analysis across research and industry. While enabling users to edit charts based on high-level intentions is of great practical value, existing methods primarily rely on natural language instructions, which are often too ambiguous to support fine-grained editing. In this work, we introduce a novel paradigm for multimodal chart editing, where user intent is expressed through a combination of natural language and visual indicators that explicitly highlight the elements to be modified. To support this paradigm, we present ChartM$^3$, a new benchmark for Multimodal chart editing with Multi-level complexity and Multi-perspective evaluation. ChartM$^3$ contains 1,000 samples spanning four levels of editing difficulty. Each sample is a triplet of the form (chart, code, multimodal instructions). To…
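To make the (chart, code, multimodal instructions) triplet concrete, here is a minimal Python sketch of how one sample might be represented. This is not the authors' data format: every class and field name is hypothetical, the bounding-box form of the visual indicator is an assumption, and so is numbering the four difficulty levels 1 to 4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VisualIndicator:
    """Hypothetical visual cue: a box drawn over the chart image.

    Assumption: the highlight is a pixel-space bounding box. The abstract
    only says indicators highlight the elements to be modified.
    """
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass(frozen=True)
class MultimodalInstruction:
    """Natural-language request paired with a visual indicator."""
    text: str
    indicator: VisualIndicator


@dataclass(frozen=True)
class ChartEditSample:
    """One (chart, code, multimodal instructions) triplet plus its level."""
    chart_path: str                     # rendered chart image (hypothetical field)
    code: str                           # plotting code that produced the chart
    instruction: MultimodalInstruction
    level: int                          # assumed to be numbered 1 through 4

    def __post_init__(self) -> None:
        # The abstract mentions four difficulty levels; the 1..4 numbering
        # is an assumption made for this sketch.
        if self.level not in (1, 2, 3, 4):
            raise ValueError(f"level must be 1-4, got {self.level}")


sample = ChartEditSample(
    chart_path="chart.png",
    code="plt.bar(['A', 'B'], [3, 5])",
    instruction=MultimodalInstruction(
        text="Change the highlighted bar to red.",
        indicator=VisualIndicator(box=(120, 40, 180, 300)),
    ),
    level=1,
)
```

Keeping the text and the indicator in one instruction object reflects the paradigm described above, in which the visual cue disambiguates which element the text refers to.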
