Analyzing the Persistence of Referenced Web Resources with Memento

Robert Sanderson; Mark Phillips; Herbert Van de Sompel

arXiv:1105.3459 · cs.DL · May 18, 2011 · 24 cites

Open Access

TL;DR

This study analyzes the persistence and archival status of web resources cited in scholarly papers from arXiv and the UNT digital library. It finds that many of these resources are either lost or not yet archived, and it emphasizes the need for better archiving practices.

Contribution

It presents the largest automated analysis of referenced web resource persistence across two repositories, highlighting differences based on repository type and content.

Findings

01. 45% of arXiv URLs still exist but are not archived

02. 28% of UNT URLs have been lost

03. Automated processing of over 160,000 URLs

Abstract

In this paper we present the results of a study into the persistence and availability of web resources referenced from papers in scholarly repositories. Two repositories with different characteristics, arXiv and the UNT digital library, are studied to determine whether the nature of the repository, or of its content, has a bearing on the availability of the web resources cited by that content. Memento makes it possible to automate discovery of archived resources and to consider the time between the publication of the research and the archiving of the referenced URLs. This automation allows us to process more than 160,000 URLs, the largest known such study, and the repository metadata allows consideration of the results by discipline. The results are startling: 45% (66,096) of the URLs referenced from arXiv still exist, but are not preserved for future generations, and 28% of resources…
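As an illustration of the kind of automation the abstract describes, the following is a minimal sketch, not the authors' pipeline. It assumes a Memento TimeMap is available as link-format text (the serialization described in RFC 7089), extracts the archive datetimes of the listed mementos, and reports the gap in days between a paper's publication date and the closest archived copy. The function names `memento_datetimes` and `gap_days`, the `SAMPLE` TimeMap, and the `example.org` / `archive.example` URIs are all hypothetical.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# One link-format entry: <uri> followed by ;name="value" parameters.
# Assumption: all parameter values are double-quoted, as in typical TimeMaps.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def memento_datetimes(timemap_text):
    """Return the sorted archive datetimes of all mementos in a TimeMap."""
    found = []
    for _uri, params in LINK_RE.findall(timemap_text):
        attrs = dict(PARAM_RE.findall(params))
        # rel may hold several values, e.g. "first memento".
        is_memento = "memento" in attrs.get("rel", "").split()
        if is_memento and "datetime" in attrs:
            found.append(parsedate_to_datetime(attrs["datetime"]))
    return sorted(found)


def gap_days(published, mementos):
    """Signed gap in days from publication to the closest memento.

    Positive means the copy was archived after publication, negative
    means before. Returns None when the URL has no mementos at all.
    """
    if not mementos:
        return None
    closest = min(mementos, key=lambda m: abs(m - published))
    return (closest - published).total_seconds() / 86400


# Hypothetical TimeMap for a hypothetical referenced URL.
SAMPLE = (
    '<http://example.org/data>; rel="original",\n'
    '<http://archive.example/20110601000000/http://example.org/data>; '
    'rel="first memento"; datetime="Wed, 01 Jun 2011 00:00:00 GMT",\n'
    '<http://archive.example/20120101000000/http://example.org/data>; '
    'rel="last memento"; datetime="Sun, 01 Jan 2012 00:00:00 GMT"'
)

published = datetime(2011, 5, 18, tzinfo=timezone.utc)
gap = gap_days(published, memento_datetimes(SAMPLE))
```

A `None` result from `gap_days` corresponds to a URL with no archived copy. Deciding whether such a URL still exists on the live web would need a separate HTTP check, which this sketch leaves out.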

Taxonomy

Topics: Semantic Web and Ontologies · Scientific Computing and Data Management · Research Data Management Practices