Profiling Web Archival Voids for Memento Routing
Sawood Alam, Michele C. Weigle, Michael L. Nelson

TL;DR
This paper introduces the concept of Archival Voids to identify unarchived URI spaces, enhancing web archive profiling and improving Memento routing accuracy by reducing false positives.
Contribution
It defines and explores Archival Voids, demonstrating how they complement existing holdings profiles to improve web archive profiling accuracy.
Findings
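An Archival Voids profile could have avoided more than 8% of additional false positives on top of the 60% accuracy obtained from profiling Archival Holdings in prior work.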
Profiles based on access logs improve Memento aggregator accuracy.
Archival Voids profiles can work independently of Archival Holdings profiles or as a complement to them when routing Memento lookups.
Abstract
Prior work on web archive profiling has focused on Archival Holdings to describe what is present in an archive. This work defines and explores Archival Voids to establish a means to represent portions of URI spaces that are not present in a web archive. Archival Holdings and Archival Voids profiles can work independently or as complements to each other to maximize the Accuracy of Memento Aggregators. We discuss various sources of truth that can be used to create Archival Voids profiles. We use access logs from Arquivo.pt to create various Archival Voids profiles and analyze them against our MemGator access logs for evaluation. We find that we could have avoided more than 8% of additional False Positives on top of the 60% Accuracy we got from profiling Archival Holdings in our prior work, if Arquivo.pt were to provide an Archival Voids profile based on URIs that were requested hundreds…
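To make the complementary roles of the two profile types concrete, here is a minimal Python sketch of how an aggregator might combine them when deciding whether to forward a lookup to an archive. It is an illustration under stated assumptions, not the authors' implementation: the key format produced by `uri_key`, the prefix-matching rule, and the names `matches` and `should_query` are all hypothetical, and the sample profiles are invented.

```python
from typing import Iterable
from urllib.parse import urlsplit


def uri_key(uri: str) -> str:
    """Build a simplified SURT-like key: reversed host labels, then the path.

    Hypothetical canonicalization, for illustration only.
    """
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    reversed_host = ",".join(reversed(host.split(".")))
    return f"{reversed_host}){parts.path or '/'}"


def matches(key: str, prefixes: Iterable[str]) -> bool:
    """Return True if any profile entry is a prefix of the key."""
    return any(key.startswith(prefix) for prefix in prefixes)


def should_query(uri: str, holdings: Iterable[str], voids: Iterable[str]) -> bool:
    """Route a lookup only if holdings suggest presence and voids do not rule it out."""
    key = uri_key(uri)
    if matches(key, voids):
        # The archive has declared this URI space absent, so skip the request.
        return False
    return matches(key, holdings)


# Invented sample profiles (assumptions, not data from the paper).
holdings = {"pt,"}                 # archive claims coverage of the .pt TLD
voids = {"pt,example)/missing"}    # but declares this URI space unarchived

routed = should_query("http://example.pt/about", holdings, voids)
skipped = should_query("http://www.example.pt/missing/page", holdings, voids)
```

In this sketch the voids check runs first, so a lookup that a holdings-only profile would have forwarded (a potential false positive) is suppressed when it falls inside a declared void.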
