ChartREG++: Towards Benchmarking and Improving Chart Referring Expression Grounding under Diverse referring clues and Multi-Target Referring
Tianhao Niu, Ziyu Han, Qingfu Zhu, Wanxiang Che

TL;DR
This paper introduces a comprehensive benchmark for chart referring expression grounding, addressing limitations of prior datasets by supporting multiple localization forms, multiple targets, diverse cues, and chart types, and proposes a novel synthesis pipeline and segmentation model.
Contribution
It presents a new benchmark for chart grounding with diverse features, a code-driven synthesis pipeline for pixel-accurate masks, and a multimodal grounding framework that outperforms baselines.
Findings
Large models show a significant performance gap on the benchmark.
Synthesized masks improve instance segmentation accuracy.
The system generalizes well to real-chart grounding tasks.
Abstract
Referring expression grounding is a core problem in visual grounding and is widely used as a diagnostic of spatial grounding and reasoning in vision and language models, yet most prior work focuses on natural images. In contrast, existing benchmarks related to chart referring expression grounding remain limited: (1) they largely adopt bounding boxes, constraining localization precision for fine chart elements; (2) they mostly assume one or two referred target instances, failing to handle multi-instance target references; (3) the language expressions over-rely on textual cues or data-rank clues; (4) they cover only a narrow range of chart types. To address these issues, we introduce a chart referring expression grounding benchmark that systematically supports multiple localization forms, multiple referred targets, diverse grounding cues, and diverse chart types. Results across…
