Examining persistence of European open repository infrastructure and its diffusion in the scholarly record
George Macgregor, Joy Davidson

TL;DR
This study investigates the persistence of European open repositories and finds that over 20% are 'dead'. It examines how this impermanence affects the scholarly literature and considers potential policy solutions.
Contribution
It introduces a novel dataset combining repository registry data and web archives to analyze repository persistence and its impact on scholarly records.
Findings
Over 20% of repositories are 'dead'
Approximately 19,000 scholarly works cite dead repositories
Evidence of 'dead on arrival' referencing in scholarly literature
Abstract
This article seeks to determine the extent to which the principle of persistence is observed by repositories and the organizations that operate them. We also evaluate the impact that negative repository persistence levels may be having on the scholarly record. We do this by interrogating and combining data about European repositories from several repository registries and web-scraped sources, including the Internet Archive's Wayback Machine, thereby creating a unique dataset of historic repository locations and their OAI-PMH endpoints. We then use this data as the basis for text mining CORE, a vast corpus of scholarly outputs, to determine the extent to which impersistent European repository content has permeated the scholarly literature. Our findings indicate over a fifth of European repositories (> 20%) could be classified as 'dead', with an even greater proportion (> 40%) of the…
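As a rough illustration of the kind of check the abstract describes, the sketch below builds an OAI-PMH Identify request and a Wayback Machine availability lookup for a repository endpoint, then applies a simple liveness rule to a response. The function names and the liveness rule are assumptions made for this example only; the abstract does not state how the authors classify a repository as 'dead', and the sketch makes no network calls.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url: str) -> str:
    """Build an OAI-PMH Identify request for an endpoint.

    Assumes base_url carries no query string of its own.
    """
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def wayback_lookup_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine availability lookup for a historic URL."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20150101"
    return "https://archive.org/wayback/available?" + urlencode(params)


def looks_alive(status_code: int, body: str) -> bool:
    """Hypothetical rule, not the authors' definition: an endpoint looks
    alive if it returns HTTP 200 and a parseable Identify response that
    names the repository."""
    if status_code != 200:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.find("oai:Identify/oai:repositoryName", OAI_NS) is not None


# Hand-written sample response for illustration; not real repository data.
SAMPLE_IDENTIFY = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<Identify><repositoryName>Example Repository</repositoryName></Identify>"
    "</OAI-PMH>"
)
```

In practice the status code and body would come from an HTTP client of the reader's choice, and a stricter rule (for example, comparing the reported base URL against the registry entry) may be more appropriate.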
Taxonomy
Topics: Research Data Management Practices · scientometrics and bibliometrics research · Information Retrieval and Search Behavior
