Routing Memento Requests Using Binary Classifiers

Nicolas J. Bornand; Lyudmila Balakireva; Herbert Van de Sompel

arXiv:1606.09136·cs.DL·June 30, 2016

TL;DR

This paper proposes archive-specific binary classifiers, trained on the content cached by a Memento Aggregator, to decide which web archives to query for a given URI. Compared with querying every archive, this routing sharply reduces the number of requests and the response time while keeping recall high.

Contribution

The paper introduces archive-specific binary classifiers for query routing in Memento Aggregators, improving efficiency over ad-hoc heuristic methods.

Findings

01. Classifiers reduce requests by 77% compared to brute force.

02. Response times decrease by 42%.

03. Recall remains high at 0.847.

Abstract

The Memento protocol provides a uniform approach to query individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced that supports querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly challenging to deliver aggregate responses while keeping response times and computational costs under control. Ad-hoc heuristic approaches have been introduced to address this challenge and research has been conducted aimed at optimizing query routing based on archive profiles. In this paper, we explore the use of binary, archive-specific classifiers generated on the basis of the content cached by an Aggregator, to determine whether or not to query an archive for a given URI. Our…
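The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the naive Bayes model, the URI tokenization, the archive names, and the example URIs are all assumptions chosen for illustration. The sketch only shows the general shape, in which one binary classifier per archive is trained from cached (URI, held-or-not) observations and the Aggregator queries only the archives whose classifier says yes.

```python
import math
import re
from collections import Counter


def uri_tokens(uri):
    """Split a URI into lowercase alphanumeric tokens (hypothetical feature choice)."""
    return [t for t in re.split(r"[^a-z0-9]+", uri.lower()) if t]


class ArchiveClassifier:
    """Toy binary naive Bayes model: is this archive likely to hold the URI?"""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.label_counts = Counter()

    def train(self, cached):
        # cached: iterable of (uri, held) pairs, standing in for Aggregator cache entries
        for uri, held in cached:
            self.label_counts[held] += 1
            self.token_counts[held].update(uri_tokens(uri))

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.token_counts[True]) | set(self.token_counts[False])) + 1
        # Log prior and log likelihoods, both with add-one smoothing
        score = math.log((self.label_counts[label] + 1) / (sum(self.label_counts.values()) + 2))
        for t in tokens:
            score += math.log((counts[t] + 1) / (total + vocab))
        return score

    def should_query(self, uri):
        tokens = uri_tokens(uri)
        return self._log_score(tokens, True) >= self._log_score(tokens, False)


def route(uri, classifiers):
    """Return the names of the archives whose classifier predicts a hit."""
    return [name for name, clf in classifiers.items() if clf.should_query(uri)]


# Hypothetical cache contents: which URIs each archive was seen to hold.
uk_held = ["http://alpha.co.uk/news", "http://beta.co.uk/sport"]
jp_held = ["http://gamma.jp/page", "http://delta.jp/about"]

classifiers = {"uk_archive": ArchiveClassifier(), "jp_archive": ArchiveClassifier()}
classifiers["uk_archive"].train(
    [(u, True) for u in uk_held] + [(u, False) for u in jp_held]
)
classifiers["jp_archive"].train(
    [(u, True) for u in jp_held] + [(u, False) for u in uk_held]
)

targets = route("http://shop.co.uk/items", classifiers)  # ['uk_archive']
```

In this toy setup only one of the two archives is queried for the example URI, which is the source of the request savings. A false negative from a classifier would cost recall, which is the trade-off the reported figures describe.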
