Analyzing Web Archives Through Topic and Event Focused Sub-collections
Gerhard Gossen, Elena Demidova, Thomas Risse

TL;DR
This paper proposes a methodology for extracting and analyzing topic- and event-focused sub-collections from large web archives to facilitate societal and historical research.
Contribution
It introduces a framework for creating focused sub-collections from web archives, addressing challenges posed by their size and temporal nature.
Findings
Framework for sub-collection creation
Discussion of opportunities and challenges
Enhanced study of societal developments
Abstract
Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.
