A Plan For Curating "Obsolete Data or Resources"
Michael L. Nelson

TL;DR
This paper discusses the challenges of web data preservation, emphasizing the risk of losing information whose importance is not yet recognized, and reviews potential solutions to improve web archiving practices.
Contribution
It highlights the importance of preserving web data, especially data whose value is not yet known, and reviews existing issues and proposed solutions for web archiving.
Findings
The web has become the primary medium for cultural discourse.
Current preservation technology lags behind publishing technology.
Proposed solutions can enhance web data archivability.
Abstract
Our cultural discourse is increasingly carried on the web. When the web first emerged many years ago, there was a period in which conventional media (e.g., music, movies, books, scholarly publications) were primary and the web was a supplementary channel. This has now changed: the web is often the primary channel, and other publishing mechanisms, if present at all, supplement the web. Unfortunately, the technology for publishing information on the web always outstrips our technology for preservation. My concern is less that we will lose data of known importance (e.g., scientific data, census data) than that we will lose data that we do not yet know is important. In this paper I review some of the issues and, where appropriate, proposed solutions for increasing the archivability of the web.
Taxonomy
Topics: Digital Humanities and Scholarship · Digital and Traditional Archives Management · Language and cultural evolution
