Where Did the Web Archive Go?
Mohamed Aturban, Michael L. Nelson, Michele C. Weigle

TL;DR
This 14-month longitudinal study of 17 public web archives found that four of them changed their base URIs without leaving a machine-readable way to find the new ones, making archived pages (mementos) hard to locate and verify.
Contribution
It provides a detailed longitudinal analysis of how stable web archives are over time, quantifying how base URI changes disrupt persistent access to archived content.
Findings
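Four of the 17 archives changed their base URIs without leaving a machine-readable pointer to the new location, so the new URIs had to be rediscovered manually.
Of the 1,981 mementos sampled from those four archives, 537 were impacted: 517 were rediscovered, but with a changed Memento-Datetime, HTTP status code, or original URI (URI-R).
The remaining 20 mementos could not be found at all.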
Abstract
To perform a longitudinal investigation of web archives and detect variations and changes in replaying individual archived pages, or mementos, we created a sample of 16,627 mementos from 17 public web archives. Over the course of our 14-month study (November 2017 - January 2019), we found that four web archives changed their base URIs and did not leave a machine-readable method of locating their new base URIs, necessitating manual rediscovery. Of the 1,981 mementos in our sample from these four web archives, 537 were impacted: 517 mementos were rediscovered but with changes in their time of archiving (or Memento-Datetime), HTTP status code, or the string comprising their original URI (or URI-R), and 20 of the mementos could not be found at all.
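The comparison the abstract describes, checking a rediscovered memento against its earlier replay, could be sketched roughly as follows. This is an illustrative sketch only, not the authors' tooling: the `Observation` record, the `changed_fields` helper, and the sample values are all hypothetical, and it assumes the three compared attributes have already been collected from each replay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One replay of a memento (hypothetical record layout, not from the paper)."""
    memento_datetime: str  # value of the Memento-Datetime response header
    status_code: int       # HTTP status code returned when replaying the memento
    uri_r: str             # original URI (URI-R) the memento is a capture of


def changed_fields(before: Observation, after: Optional[Observation]) -> List[str]:
    """List which attributes differ between two replays of the same memento.

    `after` is None when the memento could not be rediscovered at all.
    """
    if after is None:
        return ["not_found"]
    names = ("memento_datetime", "status_code", "uri_r")
    return [n for n in names if getattr(before, n) != getattr(after, n)]


# Made-up values for illustration; they are not data from the study.
old = Observation("Sat, 02 Dec 2017 10:00:00 GMT", 200, "http://example.com/")
new = Observation("Sat, 02 Dec 2017 10:00:05 GMT", 200, "http://example.com")

print(changed_fields(old, new))
print(changed_fields(old, None))
```

URI-R values are compared here as raw strings, matching the abstract's wording ("the string comprising their original URI"), so even a dropped trailing slash counts as a change.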
