Studying Retrievability of Publications and Datasets in an Integrated Retrieval System

Dwaipayan Roy; Zeljko Carevic; Philipp Mayr

arXiv:2205.00937·cs.IR·July 22, 2022

TL;DR

This paper examines how easily datasets and publications can be retrieved from a real-world digital library, using retrievability metrics to identify biases and differences in access across document types.

Contribution

The paper introduces a system-oriented approach to measuring the retrievability of datasets and publications in a real digital library, covering accessibility biases across item types and a metric of usefulness.

Findings

01 Significant diversity in retrievability scores among document types

02 Use of Lorenz curves and Gini coefficients to visualize access disparities

03 Empirical evidence of accessibility biases in digital library items

Abstract

In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the ease with which the content in the collection can be accessed. Following this methodology, in our study, we propose a system-oriented approach for studying dataset and publication retrieval. A speciality of this paper is the focus on measuring the accessibility biases of various types of DL items and including a metric of usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences of the two retrievable document types (specifically datasets and publications). Empirical results reported in the paper…
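The metrics named in the abstract can be illustrated with a minimal sketch. It assumes the cumulative form of retrievability that is common in the retrievability literature, where a document scores one point for every query that returns it within a rank cutoff; the abstract does not say which variant the authors use. The function names, the `collection` parameter, and the toy rankings are hypothetical and are not taken from the paper.

```python
def retrievability(rankings, cutoff, collection):
    """Count, per document, the queries that rank it within the top `cutoff`.

    `rankings` maps a query to its ranked list of document ids. `collection`
    lists every document so that never-retrieved items keep a score of 0.
    (Assumed cumulative variant; all names here are illustrative.)
    """
    scores = {doc_id: 0 for doc_id in collection}
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return scores


def gini(values):
    """Gini coefficient: 0 means equal scores, values near 1 mean high inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def lorenz_points(values):
    """Points (share of documents, share of total score) of the Lorenz curve."""
    xs = sorted(values)
    total = sum(xs)
    points = [(0.0, 0.0)]
    if not xs or total == 0:
        return points
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


# Toy example: two queries over a four-item collection.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["a", "c", "b"]}
toy_scores = retrievability(toy_rankings, cutoff=2, collection=["a", "b", "c", "d"])
toy_gini = gini(toy_scores.values())
toy_curve = lorenz_points(toy_scores.values())
```

Computing the scores separately for datasets and for publications, and then comparing the two Gini values or plotting the two Lorenz curves, is one way to carry out the kind of per-type comparison the abstract describes.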
