Measuring the diversity of data and metadata in digital libraries

Rafael C. Carrasco; Gustavo Candela; Manuel Marco-Such

arXiv:2301.01193·cs.DL·January 4, 2023

Open Access

TL;DR

This paper explores the application of diversity indices to digital libraries for analyzing data and metadata variability, offering a robust way to identify trends and compare content across collections.

Contribution

It introduces the use of biodiversity-inspired diversity indices to measure and analyze the variability of data and metadata in digital libraries.

Findings

01. Diversity indices effectively capture variability in digital library content.

02. These indices can identify trends and differences in topics and metadata coverage.

03. The approach provides a robust alternative to abundance-based measures.

Abstract

Diversity indices have been traditionally used to capture the biodiversity of ecosystems by measuring the effective number of species or groups of species. In contrast to abundance, which is correlated with the amount of data available, diversity indices provide a more robust indicator of the variability of individuals. These types of indices can be employed in the context of digital libraries to identify trends in the distribution of topics, compare the lexica employed by different authors, or analyze the coverage of semantic metadata.
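To make the idea of an "effective number of species" concrete, the sketch below computes Hill numbers, a common way to formalize that quantity. The abstract does not say which indices the paper uses, so the choice of Hill numbers, the function name `effective_number`, and the topic counts are all illustrative assumptions rather than the authors' method.

```python
import math


def effective_number(counts, q=1.0):
    """Hill number of order q for a list of category counts.

    Illustrative assumption: the abstract does not specify an index.
    q = 0 gives the number of non-empty categories (richness),
    q = 1 gives the exponential of the Shannon entropy,
    q = 2 gives the inverse Simpson index.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative frequencies of the non-empty categories.
    props = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; use its limit.
        entropy = -sum(p * math.log(p) for p in props)
        return math.exp(entropy)
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical topic counts for two collections of records.
balanced = [50, 50, 50, 50]   # four topics, evenly represented
skewed = [197, 1, 1, 1]       # four topics, one dominant

balanced_diversity = effective_number(balanced)
skewed_diversity = effective_number(skewed)

# Scaling every count by the same factor leaves the index unchanged.
scaled_diversity = effective_number([c * 100 for c in balanced])
```

Both hypothetical collections contain four topics and 200 records, yet the evenly distributed one has an effective number of four topics while the skewed one scores close to one. Multiplying all counts by a constant does not change the result, which illustrates why such indices are less tied to the amount of data than raw abundance is.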

Taxonomy

Topics: Species Distribution and Climate Change · Semantic Web and Ontologies · Environmental DNA in Biodiversity Studies