An Evaluation of Caching Policies for Memento TimeMaps
Justin F. Brunelle, Michael L. Nelson

TL;DR
This paper evaluates caching policies for Memento TimeMaps, analyzing their change patterns over time and proposing an algorithm that optimizes cache freshness and completeness, improving access to archive snapshots.
Contribution
It introduces a new caching algorithm for TimeMaps that exploits the fact that most TimeMaps do not shrink over time, and it empirically determines optimal TTL settings.
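The abstract is truncated before it describes the algorithm, so what follows is only a minimal Python sketch of a TTL cache in that spirit, not the authors' method. The class name TimeMapCache, the fetch callable that stands in for a request to a Memento aggregator, and the rule of keeping the larger cached TimeMap until a shrink is seen on two consecutive refetches are all hypothetical assumptions. The 15-day default TTL is taken from the Findings.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Default TTL taken from the reported 15-day finding; tune as needed.
DEFAULT_TTL_SECONDS = 15 * 24 * 60 * 60


@dataclass
class CachedTimeMap:
    mementos: List[str]           # memento URIs listed in the TimeMap
    fetched_at: float             # clock time when this copy was stored
    shrink_pending: bool = False  # a smaller TimeMap was seen once, awaiting confirmation


class TimeMapCache:
    """Hypothetical TTL cache for TimeMaps, keyed by original-resource URI."""

    def __init__(self, fetch: Callable[[str], List[str]],
                 ttl: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch  # hypothetical: asks the aggregator for a TimeMap
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, CachedTimeMap] = {}

    def get(self, uri_r: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(uri_r)
        if entry is not None and now - entry.fetched_at < self._ttl:
            return entry.mementos  # fresh hit: the archives are not contacted
        fresh = self._fetch(uri_r)
        if (entry is not None and len(fresh) < len(entry.mementos)
                and not entry.shrink_pending):
            # Assumed heuristic: TimeMaps rarely shrink, so a first smaller
            # response is treated as a possible transient error. Keep serving
            # the larger copy, leave it stale so the next request refetches,
            # and accept the smaller TimeMap only if it is seen again.
            entry.shrink_pending = True
            return entry.mementos
        self._entries[uri_r] = CachedTimeMap(fresh, now)
        return fresh
```

The two-refetch confirmation is one assumed way to tolerate transient archive errors while still letting genuine redactions through eventually; a time-based or majority-vote confirmation would be equally plausible designs.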
Findings
TimeMap cardinality is constant or monotonically increasing for 80.2% of observed TimeMap downloads
A TTL of 15 days minimizes missed mementos
The proposed caching algorithm reduces archive load
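A statistic like the 80.2% figure can be computed by comparing each download of a TimeMap with the previous download of the same TimeMap. The Python sketch below assumes exactly that counting unit, which the paper may define differently; the function names and sample counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def classify_changes(cardinalities: List[int]) -> Counter:
    """Label each download after the first by comparing it with the one before."""
    labels: Counter = Counter()
    for prev, curr in zip(cardinalities, cardinalities[1:]):
        if curr == prev:
            labels["constant"] += 1
        elif curr > prev:
            labels["increasing"] += 1
        else:
            labels["decreasing"] += 1
    return labels


def non_decreasing_share(observations: Dict[str, List[int]]) -> float:
    """Fraction of compared downloads whose cardinality did not drop."""
    totals: Counter = Counter()
    for series in observations.values():
        totals.update(classify_changes(series))
    compared = sum(totals.values())
    if compared == 0:
        return 0.0
    return (totals["constant"] + totals["increasing"]) / compared


# Hypothetical memento counts from successive downloads of two TimeMaps.
observed = {"a": [10, 10, 12], "b": [5, 4, 6]}
print(non_decreasing_share(observed))  # → 0.75
```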
Abstract
As defined by the Memento Framework, TimeMaps are machine-readable lists of time-specific copies, called "mementos", of an archived original resource. In theory, as an archive acquires additional mementos over time, a TimeMap should be monotonically increasing. However, there are reasons why the number of mementos in a TimeMap would decrease, for example: archival redaction of some or all of the mementos, archival restructuring, and transient errors on the part of one or more archives. We study TimeMaps for 4,000 original resources over a three-month period, note their change patterns, and develop a caching algorithm for TimeMaps suitable for a reverse proxy in front of a Memento aggregator. We show that TimeMap cardinality is constant or monotonically increasing for 80.2% of all TimeMap downloads observed in the observation period. The goal of the caching algorithm is to exploit…
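Memento (RFC 7089) serializes TimeMaps in application/link-format, where each memento is a link whose rel value contains the token memento (for example "first memento" or "last memento"). The Python sketch below counts a TimeMap's cardinality with two regular expressions; the sample TimeMap and its example.org/example.net URIs are made up, and a production parser should handle the full link-format grammar.

```python
import re
from typing import List

# A link is "<target>" followed by its ;-separated parameters, up to the next "<".
# Splitting on commas is unsafe because datetime values contain commas.
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")
REL_RE = re.compile(r'rel\s*=\s*"([^"]*)"')


def memento_uris(timemap: str) -> List[str]:
    """Return the targets of all links whose rel includes the token 'memento'."""
    uris = []
    for target, params in LINK_RE.findall(timemap):
        rel = REL_RE.search(params)
        if rel and "memento" in rel.group(1).split():
            uris.append(target)
    return uris


# Hypothetical TimeMap in application/link-format.
SAMPLE = """<http://a.example.org/>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org/>;rel="self";type="application/link-format",
<http://arxiv.example.net/web/20000620180259/http://a.example.org/>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org/>;rel="memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT",
<http://arxiv.example.net/web/20091128001231/http://a.example.org/>;rel="last memento";datetime="Sat, 28 Nov 2009 00:12:31 GMT"
"""

print(len(memento_uris(SAMPLE)))  # → 3
```

The original, self, and timemap links are skipped, so the count reflects only the archived copies that a cache would compare across downloads.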
Taxonomy
Topics: Caching and Content Delivery · Advanced Data Storage Technologies · Web Data Mining and Analysis
