Evolution of the "long tail" concept for scientific data
Gretchen R. Stahlman, Inna Kouper

TL;DR
This paper reviews how the "long-tail" concept for scientific data has evolved since 2007, highlighting its implications for data management, curation, and the integration of small, heterogeneous datasets in research ecosystems.
Contribution
It provides a comprehensive analysis of the long-tail concept's evolution, terminology, and utility across scientific data management and information science.
Findings
Long-tail data are frequently mismanaged or overlooked because of inadequate data management practices and institutional support.
The concept has evolved to encompass diverse data management strategies.
Bridging discussions of data curation in Library & Information Science (LIS) with domain-specific data curation improves understanding of long-tail data.
Abstract
This review paper explores the evolution of discussions about "long-tail" scientific data in the scholarly literature. The "long-tail" concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. However, these datasets, often referred to as "long-tail data," are frequently mismanaged or overlooked due to inadequate data management practices and institutional support. This paper examines the changing landscape of discussions about long-tail data over time, situated within broader ecosystems of research data management and the natural interplay between "big" and "small" data. The review also bridges discussions on data curation in Library & Information Science (LIS) and domain-specific…
