DSEBench: A Test Collection for Explainable Dataset Search with Examples
Qing Shi, Jing He, Qiaosheng Chen, Gong Cheng

TL;DR
This paper introduces DSEBench, a test collection for evaluating explainable dataset search in which each query combines keywords with example (target) datasets. It provides annotations and baseline evaluations.
Contribution
It presents the first test collection that supports both dataset-level and field-level evaluation of explainable dataset search, together with annotations and baseline methods.
Findings
DSEBench enables comprehensive evaluation of dataset search methods.
Large language models can generate useful training annotations.
Baseline experiments demonstrate the effectiveness of various retrieval and explanation techniques.
Abstract
Dataset search is a well-established task in the Semantic Web and information retrieval research. Current approaches retrieve datasets either based on keyword queries or by identifying datasets similar to a given target dataset. These paradigms fail when the information need involves both keywords and target datasets. To address this gap, we investigate a generalized task, Dataset Search with Examples (DSE), and extend it to Explainable DSE (ExDSE), which further requires identifying relevant fields of the retrieved datasets. We construct DSEBench, the first test collection that provides high-quality dataset-level and field-level annotations to support the evaluation of DSE and ExDSE, respectively. In addition, we employ a large language model to generate extensive annotations for training purposes. We establish comprehensive baselines on DSEBench by adapting and evaluating a variety of…
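To make the task shape concrete, the following is a minimal sketch of what a DSE/ExDSE system takes in and returns. It is not one of the paper's baselines: the `Dataset` type, the `dse_search` function, the Jaccard token overlap, and the `alpha` weight are all assumptions chosen for illustration. The input is a keyword query plus example datasets; the output is a ranked list of candidate datasets (DSE), each with the fields judged relevant (ExDSE).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical minimal dataset record: an id, a title, and field names.
    id: str
    title: str
    fields: list[str]


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; underscores count as separators."""
    return set(text.lower().replace("_", " ").split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dataset_tokens(d: Dataset) -> set[str]:
    """All tokens from a dataset's title and field names."""
    out = tokens(d.title)
    for f in d.fields:
        out |= tokens(f)
    return out


def dse_search(keywords, examples, candidates, alpha=0.5):
    """Rank candidates by a mix of keyword match and similarity to examples.

    Returns (dataset id, score, relevant fields) tuples, best first.
    The scoring is a toy stand-in, not the method evaluated in the paper.
    """
    q = tokens(keywords)
    ex_tokens = [dataset_tokens(e) for e in examples]
    evidence = q.union(*ex_tokens)  # tokens that can justify a field
    results = []
    for c in candidates:
        c_tok = dataset_tokens(c)
        kw_score = jaccard(q, c_tok)
        ex_score = max((jaccard(t, c_tok) for t in ex_tokens), default=0.0)
        score = alpha * kw_score + (1 - alpha) * ex_score
        # Field-level "explanation": fields sharing a token with the query
        # or with any example dataset.
        relevant = [f for f in c.fields if tokens(f) & evidence]
        results.append((c.id, score, relevant))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

Calling `dse_search("air quality pm25", [example], candidates)` with a handful of `Dataset` objects returns the candidates ordered by combined score, each paired with its matching field names. A real system would replace the token overlap with a learned or lexical retrieval model.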
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Semantic Web and Ontologies
