Dataset Discovery via Line Charts
Daomin Ji, Hui Luo, Zhifeng Bao, J. Shane Culpepper

TL;DR
This paper introduces a novel method for discovering datasets from large repositories by matching line charts to datasets using a fine-grained cross-modal relevance learning model, supported by a new benchmark and extensive evaluations.
Contribution
The paper proposes FCM, a new approach that accurately matches line charts to datasets, and creates the first benchmark for dataset discovery via line charts.
Findings
FCM achieves a relative improvement of 30.1% in prec@50 over the best baseline
FCM achieves a relative improvement of 41.0% in ndcg@50 over the best baseline
Effective for dataset discovery through line chart queries
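The two metrics named above, prec@50 and ndcg@50, are standard ranking measures. The following is a minimal sketch of how they are commonly computed, not the paper's evaluation code: the function names, the dataset identifiers, and the linear-gain form of DCG are assumptions made for illustration, and the paper's exact definitions may differ.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved datasets that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for dataset_id in top_k if dataset_id in relevant_ids)
    return hits / k


def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k with linear gains (one common formulation; an assumption here).

    `gains` maps a dataset id to its relevance grade; missing ids count as 0.
    """
    def dcg(values):
        # The result at rank i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(dataset_id, 0.0) for dataset_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical ranking returned for one line chart query.
ranked = ["d3", "d1", "d7", "d2"]
gains = {"d1": 1.0, "d2": 1.0}  # hypothetical ground-truth relevance

print(precision_at_k(ranked, set(gains), k=2))
print(ndcg_at_k(ranked, gains, k=4))
```

In the paper's setting k is 50 and the ranked list would hold the candidate datasets ordered by the relevance score the model assigns to each one for the query chart.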
Abstract
Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts, focusing on the use of line charts as queries to discover datasets within a large data repository that are capable of generating similar line charts. To solve this problem, we propose a novel approach called Fine-grained Cross-modal Relevance Learning Model (FCM), which aims to estimate the relevance between a line chart and a candidate dataset. To achieve this goal, FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Then, two novel segment-level encoders are adopted to learn representations for a line chart and a…
Taxonomy
Topics: Advanced Database Systems and Queries
