Big Data Science Over the Past Web

Miguel Costa; Julien Masanès

arXiv:2108.01605·cs.DL·August 4, 2021

TL;DR

This paper discusses how big data science tools and machine learning techniques are increasingly used to analyze web archives, enabling more efficient and insightful longitudinal studies of historical web data.

Contribution

The paper gives an overview of big data and machine learning tools that have been applied to web archives to support their analysis and use in research.

Findings

01. Big data tools improve the scalability of web archive analysis.

02. Machine learning enables deeper insights into historical web data.

03. Web archives benefit from advanced computational techniques for research.

Abstract

Web archives preserve unique and historically valuable information. They hold a record of past events and memories published by all kinds of people, such as journalists, politicians and ordinary people who have shared their testimony and opinion on multiple subjects. As a result, researchers such as historians and sociologists have used web archives as a source of information to understand the recent past since the early days of the World Wide Web. The typical way to extract knowledge from a web archive is by using its search functionalities to find and analyse historical content. This can be a slow and superficial process when analysing complex topics, due to the huge amount of data that web archives have been preserving over time. Big data science tools can cope with this order of magnitude, enabling researchers to automatically extract meaningful knowledge from the archived data.…
