Caching HTTP 404 Responses Eliminates Unnecessary Archival Replay Requests
Kritika Garg, Himarsha R. Jayanetti, Sawood Alam, Michele C. Weigle, Michael L. Nelson

TL;DR
This paper addresses unnecessary web archive traffic caused by recurring HTTP requests from JavaScript on archived pages, proposing a caching solution for 404 responses to reduce resource consumption during replay.
Contribution
It introduces a method to cache HTTP 404 responses during archival replay, significantly reducing unnecessary traffic and resource usage.
Findings
Caching 404 responses prevents repeated requests to the archive.
The implementation reduces network and storage resource usage during replay.
The approach mitigates the high traffic generated by dynamic archived pages.
Abstract
Upon replay, JavaScript on archived web pages can generate recurring HTTP requests that lead to unnecessary traffic to the web archive. In one example, an archived page averaged more than 1000 requests per minute. These requests are not visible to the user, so if a user leaves such an archived page open in a browser tab, they would be unaware that their browser is continuing to generate traffic to the web archive. We found that web pages that require regular updates (e.g., radio playlists, updates for sports scores, image carousels) are more likely to make such recurring requests. If the resources requested by the web page are not archived, some web archives may attempt to patch the archive by requesting the resources from the live web. If the requested resources are unavailable on the live web, the resources cannot be archived, and the responses remain HTTP 404. Some archived pages…
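The idea of caching 404 responses can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an in-memory cache with a fixed time-to-live, and the names `NotFoundCache`, `replay_fetch`, and the `fetch` callable are hypothetical. It only shows how a repeated request for a missing resource could be answered locally instead of reaching the archive again.

```python
import time
from typing import Callable, Dict


class NotFoundCache:
    """Remembers URLs that recently returned HTTP 404 (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds  # assumed fixed lifetime for a cached 404
        self.clock = clock              # injectable so the cache can be tested
        self._expires_at: Dict[str, float] = {}

    def is_cached_404(self, url: str) -> bool:
        expires = self._expires_at.get(url)
        if expires is None:
            return False
        if self.clock() >= expires:
            # Entry is stale: forget it so the next request goes to the archive.
            del self._expires_at[url]
            return False
        return True

    def remember_404(self, url: str) -> None:
        self._expires_at[url] = self.clock() + self.ttl_seconds


def replay_fetch(url: str, fetch: Callable[[str], int],
                 cache: NotFoundCache) -> int:
    """Return an HTTP status code, contacting the archive only when needed."""
    if cache.is_cached_404(url):
        return 404  # answered from the cache, no request is sent
    status = fetch(url)
    if status == 404:
        cache.remember_404(url)
    return status
```

Under these assumptions, a page script that polls the same missing URL many times per minute would trigger one archive request per time-to-live window rather than one per poll. Non-404 responses pass through unchanged.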
Taxonomy
Topics: Web Data Mining and Analysis · Caching and Content Delivery · Web Application Security Vulnerabilities
