Rethinking Dataset Discovery with DataScout
Rachel Lin, Bhavya Chopra, Wenjing Lin, Shreya Shankar, Madelon Hulsebos, Aditya G. Parameswaran

TL;DR
DataScout is a novel system that enhances dataset discovery by providing AI-driven query reformulation, semantic filtering, and relevance indicators, thereby improving user exploration and understanding of datasets for data science tasks.
Contribution
The paper introduces DataScout, a new dataset search tool that combines AI-assisted query reformulation, semantic filtering, and relevance indicators to improve dataset discovery and user experience.
Findings
Users employ DataScout's features for structured exploration.
DataScout helps users understand the search space better.
Participants found DataScout improved their dataset search process.
Abstract
Dataset search -- the process of finding appropriate datasets for a given task -- remains a critical yet under-explored challenge in data science workflows. Assessing dataset suitability for a task (e.g., training a classification model) is a multi-pronged affair that involves understanding: data characteristics (e.g., granularity, attributes, size), semantics (e.g., data semantics, creation goals), and relevance to the task at hand. Present-day dataset search interfaces are restrictive -- users struggle to convey implicit preferences and lack visibility into the search space and result inclusion criteria -- making query iteration challenging. To bridge these gaps, we introduce DataScout to proactively steer users through the process of dataset discovery via: (i) AI-assisted query reformulations informed by the underlying search space, (ii) semantic search and filtering based on…
Taxonomy
Topics: Research Data Management Practices · Ethics and Social Impacts of AI · Information Retrieval and Search Behavior
