Archiving and Replaying Current Web Advertisements: Challenges and Opportunities

Travis Reid; Alex H. Poole; Hyung Wook Choi; Christopher Rauch; Mat Kelly; Michael L. Nelson; Michele C. Weigle

arXiv:2502.01525 · cs.DL · September 24, 2025

TL;DR

This paper investigates the challenges of archiving and replaying web advertisements, identifying key technical issues and proposing solutions to improve digital preservation of ad content.

Contribution

It systematically analyzes archiving and replay challenges for web ads and offers practical solutions to enhance fidelity and completeness of archived ads.

Findings

01. Identified five key problems in archiving and replaying ads.

02. Proposed updates to fuzzy matching for better ad resource replay.

03. Demonstrated that the proposed solutions improve ad replay success rates.

Abstract

Although web advertisements represent an inimitable part of digital cultural heritage, serious archiving and replay challenges persist. To explore these challenges, we created a dataset of 279 archived ads. We encountered five problems in archiving and replaying them. First, prior to August 2023, the Internet Archive's Save Page Now service excluded not only ads from well-known ad services but also URLs with ad-related file and directory names. After August 2023, Save Page Now still blocked the archiving of ads loaded on a web page, but it permitted the archiving of an ad's resources if the user directly archived the URL(s) associated with the ad. Second, Brozzler's incompatibility with Chrome prevented ads from being archived. Third, during crawling and replay sessions, Google's and Amazon's ad scripts generated URLs with different random values. This prevented the archived ads from being replayed.…
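The random-value problem and the fuzzy-matching idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation or any replay system's actual rule set. It assumes the random values live in query parameters, and the parameter names (`rand`, `cachebuster`, `nonce`), the function names, and the example URLs are all hypothetical. The sketch drops the volatile parameters so that a URL generated at replay time maps to the same lookup key as the URL captured at crawl time.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical names of query parameters that carry per-session random values.
VOLATILE_PARAMS = {"rand", "cachebuster", "nonce"}


def fuzzy_key(url: str) -> str:
    """Build a lookup key: drop volatile query parameters and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in VOLATILE_PARAMS
    )
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def build_index(archived_urls):
    """Map each fuzzy key to the first archived URL that produced it."""
    index = {}
    for url in archived_urls:
        index.setdefault(fuzzy_key(url), url)
    return index


def resolve(requested_url, index):
    """Return the archived URL matching the request, or None if nothing matches."""
    return index.get(fuzzy_key(requested_url))


# Usage: the crawl captured one random value, replay requests another.
archived = ["https://ads.example.com/creative?slot=top&rand=111"]
index = build_index(archived)
match = resolve("https://ads.example.com/creative?rand=999&slot=top", index)
```

Here `match` is the archived URL, because the two requests differ only in the ignored parameter. A request for a different `slot` value would not match, which keeps the rule from replaying the wrong ad.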

Taxonomy

Topics: Web Data Mining and Analysis