# Dataset search: a survey

**Authors:** Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez-Gonzalez, Emilia Kacprzak and Paul Groth

arXiv: 1901.00735 · 2022-11-10

## TL;DR

This survey reviews current research and commercial systems for dataset search, highlighting unique challenges, methods, and open problems in retrieving datasets across various online repositories.

## Contribution

The survey gives an overview of the state of the art in dataset search, identifying what makes it a research field in its own right, the methods it draws on from related areas, and directions for future research.

## Key findings

- Dataset search is an emerging research field with unique challenges.
- Existing approaches draw from information retrieval, databases, entity-centric search, and tabular search.
- Open problems include improving relevance and accessibility of dataset retrieval.

## Abstract

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search is drawing upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.00735/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1901.00735/full.md

## References

153 references — full list in the complete paper: https://tomesphere.com/paper/1901.00735/full.md

---
Source: https://tomesphere.com/paper/1901.00735