Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu, Lutao Yan, Yizhang Zhu, Yenchi Tseng, Yinan Mei, Yong Wang, Jiannan Wang, Nan Tang, and Yuyu Luo

TL;DR
This paper introduces CRBench, a real-world benchmark for text-to-chart retrieval, and proposes ChartFinder, a model leveraging synthesized semantic insights to improve deep semantic understanding and retrieval accuracy.
Contribution
The paper presents CRBench, a large-scale real-world benchmark, and a semantic insights synthesis pipeline, enabling the development of ChartFinder with enhanced semantic reasoning capabilities.
Findings
ChartFinder outperforms existing methods on CRBench with up to 66.9% NDCG@10.
Synthesized semantic insights significantly improve retrieval performance.
The benchmark provides a realistic evaluation environment for text-to-chart retrieval models.
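The headline figure above is reported as NDCG@10. As a rough illustration of how that metric is computed for a single query, here is a minimal sketch. It assumes the common log2-discount form of DCG and that the relevance list covers every candidate for the query; the function names are hypothetical, and the paper's exact relevance grading is not stated in this summary.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for one query.

    ranked_relevances: relevance labels in the order the retriever returned
    the charts (assumed here to include all candidates for the query).
    """
    ideal = sorted(ranked_relevances, reverse=True)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant chart exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One target chart among distractors, retrieved at rank 2 (binary relevance assumed).
score = ndcg_at_k([0, 1, 0, 0], k=10)
```

A benchmark-level score would then be the mean of this per-query value over all queries.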
Abstract
Text-to-chart retrieval, enabling users to find relevant charts via natural language queries, has gained significant attention. However, evaluating models in real-world business intelligence (BI) scenarios is challenging, as current benchmarks fail to simulate realistic user queries or test for deep semantic understanding with static chart images. To address this gap, we introduce CRBench, the first real-world BI-sourced benchmark comprising 21,862 charts and 326 queries, utilizing a Target-and-Distractor paradigm to evaluate discriminative retrieval among highly similar candidates. Testing on CRBench reveals that existing methods, which rely primarily on visual features, perform poorly and fail to capture the rich analytical semantics of charts. To address this performance bottleneck, we propose a semantic insights synthesis pipeline that automatically generates three hierarchical…
Taxonomy
Topics: Natural Language Processing Techniques · Data Quality and Management · Semantic Web and Ontologies
