# A Framework for Evaluating Snippet Generation for Dataset Search

**Authors:** Xiaxia Wang, Jinchi Chen, Shuxin Li, Gong Cheng, Jeff Z. Pan, Evgeny Kharlamov, Yuzhong Qu

arXiv: 1907.01183 · 2019-07-03

## TL;DR

This paper introduces a framework for quantitatively evaluating the snippets that dataset search engines show for retrieved datasets. Its metrics assess how well a snippet matches the query intent and covers the main content of the dataset, and the framework is applied in an empirical evaluation and a user study.

## Contribution

The paper proposes an evaluation framework for dataset snippet quality and adapts four state-of-the-art methods from related fields to establish baseline performance.

## Key findings

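- The framework's metrics measure how well a snippet matches the query intent and covers the dataset's main content.
- The four adapted baseline methods show varying performance on real-world datasets and queries.
- A user study supports the findings of the empirical evaluation and the usefulness of the framework.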

## Abstract

Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.
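To make the two evaluation goals in the abstract concrete, here is a minimal sketch of what "matching the query intent" and "covering the main content" could look like as scores. It is an illustration only, not the paper's metric definitions: it assumes a dataset and a snippet are both lists of subject-property-value triples, and the function names `keyword_coverage` and `property_coverage` as well as the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Assumption: a dataset (and a snippet drawn from it) is a list of
# (subject, property, value) triples. This representation is illustrative.
Triple = Tuple[str, str, str]


def keyword_coverage(snippet: List[Triple], keywords: List[str]) -> float:
    """Fraction of query keywords found (case-insensitive substring match)
    somewhere in the snippet's triples."""
    if not keywords:
        return 0.0
    text = " ".join(" ".join(t) for t in snippet).lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    return hits / len(keywords)


def property_coverage(snippet: List[Triple], dataset: List[Triple]) -> float:
    """Frequency-weighted share of the dataset's properties that also
    appear in the snippet."""
    freq = Counter(p for _, p, _ in dataset)
    total = sum(freq.values())
    if total == 0:
        return 0.0
    covered = {p for _, p, _ in snippet}
    return sum(n for p, n in freq.items() if p in covered) / total


# Toy data, invented for this example.
dataset = [
    ("c1", "name", "Berlin"),
    ("c1", "population", "3600000"),
    ("c2", "name", "Paris"),
    ("c2", "population", "2100000"),
    ("c1", "country", "Germany"),
]
snippet = dataset[:2]

print(keyword_coverage(snippet, ["berlin", "germany"]))
print(property_coverage(snippet, dataset))
```

In this sketch, the first score rewards snippets that surface the user's query terms, while the second rewards snippets that reflect the properties used most often in the dataset. A snippet can score high on one and low on the other, which is why an evaluation along both axes is informative.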

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01183/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01183/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1907.01183/full.md

---
Source: https://tomesphere.com/paper/1907.01183