Big Data Science Over the Past Web
Miguel Costa, Julien Masanès

TL;DR
This paper discusses how big data science tools and machine learning techniques are increasingly used to analyze web archives, enabling more efficient and insightful longitudinal studies of historical web data.
Contribution
It provides an overview of various big data and machine learning tools applied to web archives for enhanced analysis and research capabilities.
Findings
Big data tools improve scalability of web archive analysis
Machine learning enables deeper insights into historical web data
Web archives benefit from advanced computational techniques for research
Abstract
Web archives preserve unique and historically valuable information. They hold a record of past events and memories published by all kinds of people, such as journalists, politicians and ordinary people who have shared their testimony and opinion on multiple subjects. As a result, researchers such as historians and sociologists have used web archives as a source of information to understand the recent past since the early days of the World Wide Web. The typical way to extract knowledge from a web archive is by using its search functionalities to find and analyse historical content. This can be a slow and superficial process when analysing complex topics, due to the huge amount of data that web archives have been preserving over time. Big data science tools can cope with this order of magnitude, enabling researchers to automatically extract meaningful knowledge from the archived data.…
