Studying Retrievability of Publications and Datasets in an Integrated Retrieval System
Dwaipayan Roy, Zeljko Carevic, Philipp Mayr

TL;DR
This paper examines how easily datasets and publications can be retrieved from a real-world digital library, using retrievability metrics to identify biases and differences in access across document types.
Contribution
It introduces a system-oriented approach to measure retrievability of datasets and publications, including accessibility biases and usefulness metrics, in a real digital library setting.
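As a rough illustration of what a retrievability score counts, the sketch below uses the cumulative form common in the retrievability literature: a document's score is the number of queries for which it appears within a rank cutoff. The function name `retrievability_scores`, the toy queries, the document ids, and the cutoff are all hypothetical; the query set, cutoff, and scoring variant actually used in the paper are not stated in this summary.

```python
from collections import Counter


def retrievability_scores(rankings, cutoff=10):
    """Cumulative retrievability: count, per document, the queries that
    return it at rank <= cutoff.

    rankings: dict mapping a query string to its ranked list of document ids
              (best first, ids assumed unique within one list).
    Returns a Counter; documents never retrieved within the cutoff score 0.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


# Hypothetical toy data: "ds*" stand for datasets, "pub*" for publications.
rankings = {
    "survey data germany": ["ds1", "pub1", "pub2"],
    "election study": ["pub1", "ds1", "pub3"],
    "social inequality": ["pub1", "pub2", "ds2"],
}
scores = retrievability_scores(rankings, cutoff=2)
```

With a cutoff of 2, `pub1` is counted for all three queries, while `ds2` and `pub3` never make the cutoff and keep a score of 0. Uneven counts of this kind are what a retrievability analysis summarizes.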
Findings
Significant diversity in retrievability scores among document types
Use of Lorenz curves and Gini coefficients to visualize access disparities
Empirical evidence of accessibility biases in digital library items
Abstract
In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the ease with which the content in the collection can be accessed. Following this methodology, in our study, we propose a system-oriented approach for studying dataset and publication retrieval. A distinctive feature of this paper is its focus on measuring the accessibility biases of various types of DL items and its inclusion of a usefulness metric. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences between the two retrievable document types (specifically datasets and publications). Empirical results reported in the paper…
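The Lorenz curve and Gini coefficient mentioned in the abstract can be computed from a list of per-document retrievability scores. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function names and the sample score lists are assumptions made for illustration.

```python
def lorenz_curve(scores):
    """Return Lorenz curve points (fraction of documents, fraction of total
    retrievability), with documents sorted from least to most retrievable."""
    values = sorted(scores)
    total = sum(values)
    n = len(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one document with a non-zero score")
    points = [(0.0, 0.0)]
    running = 0
    for i, value in enumerate(values, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(scores):
    """Gini coefficient of the scores: 0 means every document is equally
    retrievable; values near 1 mean a few documents dominate."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted form: G = sum_i (2i - n - 1) * x_i / (n * sum(x))
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (n * total)


# Hypothetical score lists, one per document type.
equal_access = [1, 1, 1, 1]
skewed_access = [0, 0, 0, 4]
```

Here `gini(equal_access)` is 0.0 and `gini(skewed_access)` is 0.75. Computing the coefficient separately for datasets and for publications gives one way to compare how unevenly each type is exposed by the search system.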
