Everything you always wanted to know about a dataset: studies in data summarisation
Laura Koesten, Elena Simperl, Emilia Kacprzak, Tom Blount, Jeni Tennison

TL;DR
This paper investigates how textual summaries of datasets aid data understanding and discovery, through studies on user information needs and summary characteristics, leading to a template and guidelines for better data summarisation.
Contribution
It provides empirical insights into user data search behaviors and proposes a structured template for creating more meaningful dataset summaries.
Findings
Users seek key dataset attributes for effective data search
A template for dataset summaries improves understanding and retrieval
Guidelines enhance data-search experience
Abstract
Summarising data as text helps people make sense of it. It also improves data discovery, as search algorithms can match this text against keyword queries. In this paper, we explore the characteristics of text summaries of data in order to understand what meaningful summaries look like. We present two complementary studies: a data-search diary study with 69 students, which offers insight into the information needs of people searching for data; and a summarisation study, with a lab and a crowdsourcing component involving 80 data-literate participants overall, which produced summaries for 25 datasets. In each study we carried out a qualitative analysis to identify key themes and commonly mentioned dataset attributes that people consider when searching for and making sense of data. The results helped us design a template to create more meaningful textual representations of data, alongside…
