Disappearing repositories -- taking an infrastructure perspective on the long-term availability of research data
Dorothea Strecker, Heinz Pampel, Rouven Schabinger, Nina Leonie Weisweiler

TL;DR
This paper investigates the long-term availability of research data from an infrastructure perspective. It analyzes 191 shut-down research data repositories identified in a registry (6.2% of those indexed), covering the risks that lead to shutdown, the strategies used to prevent data loss, and the limits of those strategies.
Contribution
It provides a registry-based analysis of research data repository shutdowns, a phenomenon the authors note has received limited research so far, covering causes, strategies to mitigate data loss, and implications for data permanence.
Findings
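6.2% of the repositories indexed in the registry were shut down; the median age at shutdown was 12 years
44% of the shut-down repositories in the sample migrated their data to another repository
12% maintain limited access to their data collection after shutdown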
Abstract
Currently, there is limited research investigating the phenomenon of research data repositories being shut down, and the impact this has on the long-term availability of data. This paper takes an infrastructure perspective on the preservation of research data by using a registry to identify 191 research data repositories that have been closed and presenting information on the shutdown process. The results show that 6.2% of research data repositories indexed in the registry were shut down. The risks resulting in repository shutdown are varied. The median age of a repository when shutting down is 12 years. Strategies to prevent data loss at the infrastructure level are pursued to varying extents. 44% of the repositories in the sample migrated data to another repository, and 12% maintain limited access to their data collection. However, neither strategy is a permanent solution.…
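The two headline figures in the abstract (the share of indexed repositories that shut down and the median age at shutdown) are simple summary statistics over registry records. The sketch below shows how such figures could be computed. It is only an illustration: the record layout, the function name `shutdown_summary`, and every row of data are hypothetical and are not taken from the paper or from any real registry.

```python
from statistics import median

# Hypothetical registry records: (name, start year, shutdown year or None).
# These rows are made-up toy data, NOT figures from the paper.
records = [
    ("repo-a", 2001, 2013),
    ("repo-b", 2005, None),   # still running
    ("repo-c", 2010, 2018),
    ("repo-d", 2008, None),   # still running
    ("repo-e", 1999, 2015),
]


def shutdown_summary(rows):
    """Return (share of repositories shut down, median age at shutdown in years).

    Assumes at least one record and at least one shut-down repository.
    """
    closed = [(start, end) for _, start, end in rows if end is not None]
    share = len(closed) / len(rows)
    ages = [end - start for start, end in closed]
    return share, median(ages)


share, median_age = shutdown_summary(records)
print(f"{share:.1%} shut down, median age at shutdown: {median_age} years")
```

With real registry data, the same calculation would depend on how start and shutdown dates are recorded, which the abstract does not specify.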
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management
