Archiving and Replaying Current Web Advertisements: Challenges and Opportunities
Travis Reid, Alex H. Poole, Hyung Wook Choi, Christopher Rauch, Mat Kelly, Michael L. Nelson, Michele C. Weigle

TL;DR
This paper investigates the challenges of archiving and replaying web advertisements, identifying key technical issues and proposing solutions to improve digital preservation of ad content.
Contribution
It systematically analyzes the challenges of archiving and replaying web ads and offers practical solutions to improve the fidelity and completeness of archived ads.
Findings
Identified five key problems in archiving and replaying ads.
Proposed updates to fuzzy matching for better ad resource replay.
Demonstrated that the proposed solutions improve ad replay success rates.
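The fuzzy-matching idea can be illustrated with a small sketch. When an ad script appends a fresh random value to a URL on every load, the URL requested at replay time never exactly matches the URL captured at crawl time. A fuzzy matcher can instead compare URLs after dropping the random-valued parameters. The parameter names in `RANDOM_PARAMS`, the in-memory `archive` dictionary, and the helper names `fuzzy_key` and `lookup` are hypothetical and chosen for illustration. This is not the paper's proposed rule set and not the rule format of any particular replay tool.

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed to carry per-request random values.
# These names are illustrative assumptions, not a list taken from the paper.
RANDOM_PARAMS = {"correlator", "rnd", "cb"}


def fuzzy_key(url: str) -> str:
    """Return a lookup key for `url` with the assumed random-valued parameters removed."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in RANDOM_PARAMS
    ]
    kept.sort()  # make the order of the remaining parameters irrelevant
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def lookup(archive: Dict[str, bytes], requested_url: str) -> Optional[bytes]:
    """Try an exact match first, then fall back to comparing fuzzy keys."""
    if requested_url in archive:
        return archive[requested_url]
    target = fuzzy_key(requested_url)
    for archived_url, body in archive.items():
        if fuzzy_key(archived_url) == target:
            return body
    return None
```

For example, with `archive = {"https://ads.example.com/ad?slot=top&correlator=111": b"<ad>"}`, a replay request for `https://ads.example.com/ad?slot=top&correlator=999` still resolves to `b"<ad>"`. A request whose non-random parameters differ, such as `slot=side`, does not. The main design risk is choosing which parameters to ignore. If too many are stripped, the matcher can return the wrong ad.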
Abstract
Although web advertisements represent an inimitable part of digital cultural heritage, serious archiving and replay challenges persist. To explore these challenges, we created a dataset of 279 archived ads. We encountered five problems in archiving and replaying them. First, prior to August 2023, Internet Archive's Save Page Now service excluded not only ads from well-known ad services but also URLs with ad-related file and directory names. After August 2023, Save Page Now still blocked the archiving of ads loaded on a web page, but it permitted the archiving of an ad's resources if the user directly archived the URL(s) associated with the ad. Second, Brozzler's incompatibility with Chrome prevented ads from being archived. Third, during crawling and replay sessions, Google's and Amazon's ad scripts generated URLs with different random values. This prevented the archived ads from being replayed.…
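The pre-August 2023 behavior described above excluded URLs by ad-related file and directory names, not only by known ad-service hosts. The following hypothetical sketch shows what such a name-based exclusion could look like. The token list `AD_TOKENS`, the host list `AD_HOSTS`, and the function `is_excluded` are illustrative assumptions. They are not Save Page Now's actual exclusion rules, which are not given here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical ad-related path tokens (directory or file-name parts).
AD_TOKENS = {"ad", "ads", "advert", "adserver", "banner"}
# Hypothetical hosts standing in for "well-known ad services".
AD_HOSTS = {"ads.example.net"}


def is_excluded(url: str) -> bool:
    """Return True if the URL's host or any path token looks ad-related."""
    parts = urlsplit(url)
    if parts.hostname in AD_HOSTS:
        return True
    # Split the path on separators, e.g. "/js/ads.min.js" -> ["", "js", "ads", "min", "js"].
    tokens = re.split(r"[/._-]+", parts.path.lower())
    return any(token in AD_TOKENS for token in tokens)
```

Matching whole tokens rather than substrings avoids flagging names like `adobe.html`. Even so, a rule of this kind also blocks legitimate ad resources that a researcher may want to preserve, which is the archiving gap the abstract describes.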
Taxonomy
Topics: Web Data Mining and Analysis
