Archiving the Relaxed Consistency Web
Zhiwu Xie, Herbert Van de Sompel, Jinyang Liu, Johann van Reenen, Ramiro Jordan

TL;DR
This paper examines how relaxed consistency web design affects the quality of crawler-driven web archives. It finds that inconsistencies can persist longer in the archive than at the data store, and it proposes remedies to mitigate the degradation.
Contribution
It introduces a simulation approach to quantify archival quality degradation caused by relaxed consistency web architectures.
Findings
A significant portion of relaxed consistency web archives contain observable inconsistencies.
Inconsistency windows in archives can be longer than those at the data store.
Potential remedies are proposed to improve archival quality.
Abstract
The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rates have established high-profile archiving initiatives to crawl and archive fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of relaxed consistency web design on crawler-driven web archiving. Relaxed consistency websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate…
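To make the simulation idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' simulator. The single-producer model, the function name `simulate`, and every parameter (`num_posts`, `max_lag`, `crawl_gap`, the crawl interval) are assumptions chosen only for illustration. A post appears on the producer's page immediately and on a follower's feed after a random propagation lag. A crawler captures the producer page first and the follower feed slightly later. The pair counts as inconsistent when the earlier capture shows a post that the later capture lacks.

```python
import random


def simulate(num_posts=200, max_lag=5.0, crawl_gap=1.0, seed=0):
    """Toy model (hypothetical): one producer, one asynchronously updated feed.

    Returns (inconsistent_capture_pairs, total_capture_pairs).
    """
    rng = random.Random(seed)

    # Synthetic workload: each post has a time at which it is visible on the
    # producer page and a later time at which it reaches the follower feed.
    t = 0.0
    posts = []  # list of (post_time, feed_visible_time)
    for _ in range(num_posts):
        t += rng.expovariate(1.0)  # assumed mean of 1 time unit between posts
        posts.append((t, t + rng.uniform(0.0, max_lag)))
    end = t

    inconsistent = 0
    captures = 0
    crawl_time = 0.0
    while crawl_time < end:
        page_t = crawl_time              # producer page is captured first
        feed_t = crawl_time + crawl_gap  # follower feed is captured later
        on_page = {i for i, (p, _) in enumerate(posts) if p <= page_t}
        on_feed = {i for i, (_, v) in enumerate(posts) if v <= feed_t}
        # Inconsistent pair: a post archived on the page is missing from the
        # feed even though the feed was captured afterwards.
        if on_page - on_feed:
            inconsistent += 1
        captures += 1
        crawl_time += rng.uniform(5.0, 15.0)  # assumed irregular revisit gap
    return inconsistent, captures


if __name__ == "__main__":
    bad, total = simulate()
    print(f"{bad} of {total} capture pairs are inconsistent")
```

In this sketch, setting `max_lag=0.0` models a strictly consistent site and yields no inconsistent pairs, while a larger `max_lag` relative to `crawl_gap` makes inconsistent pairs more likely. Because the crawler revisits only occasionally, an inconsistent pair stays in the archive until the next capture, which is one intuition for why the archived inconsistency can outlast the one at the data store.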
