The Memento Tracer Framework: Balancing Quality and Scalability for Web Archiving
Martin Klein, Harihar Shankar, Lyudmila Balakireva, Herbert Van de Sompel

TL;DR
The paper introduces the Memento Tracer framework, which balances high-quality web archiving with scalability, addressing challenges posed by dynamic web content and operational constraints.
Contribution
The paper presents the concept and architecture of the Memento Tracer framework and evaluates both its archival quality and its operation at scale.
Findings
Archival quality is on par with or better than that of established archiving frameworks.
Operating at scale comes with a manageable overhead.
The framework balances archival quality against scalability.
Abstract
Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the Internet Archive that are optimized for scale. Human-driven services such as the Webrecorder tool provide high-quality archival captures but are not optimized to operate at scale. We introduce the Memento Tracer framework, which aims to balance archival quality and scalability. We outline its concept and architecture and evaluate its archival quality and operation at scale. Our findings indicate that quality is on par with or better than that of established archiving frameworks and that operation at scale comes with a manageable overhead.
