Cobwebs from the Past and Present: Extracting Large Social Networks using Internet Archive Data
Miroslav Shaltev, Jan-Hendrik Zab, Philipp Kemkes, Stefan Siersdorfer, Sergej Zerr

TL;DR
This paper introduces SocGraph, a system for extracting and analyzing large-scale social networks from 2 billion web pages collected over 17 years, enabling the study of how these networks evolve over time.
Contribution
It presents novel methods for constructing large social graphs from web data and an interface for exploring their temporal changes.
Findings
Constructed a social graph from 2 billion web pages.
Enabled analysis of social network evolution over 17 years.
Provided tools for large-scale social relation extraction.
Abstract
Social graph construction from various sources has been of interest to researchers due to its application potential and the broad range of technical challenges involved. The World Wide Web provides a huge amount of continuously updated data and information on a wide range of topics, created by a variety of content providers, which makes the study of extracted people networks and their temporal evolution valuable for social as well as computer scientists. In this paper we present SocGraph, an extraction and exploration system for social relations from the content of around 2 billion web pages collected by the Internet Archive over the 17-year period between 1996 and 2013. We describe methods for constructing large social graphs from extracted relations and introduce an interface to study their temporal evolution.
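The abstract does not spell out how extracted relations become a graph or how temporal evolution is exposed. As a minimal sketch, assuming the extraction stage emits hypothetical co-mention triples (person_a, person_b, year), one could aggregate them into weighted undirected edges and take cumulative yearly snapshots. The triple format and the function names `build_graph` and `yearly_snapshots` are illustrative assumptions, not SocGraph's actual method or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical input format (assumption): one (person_a, person_b, year)
# triple per co-mention found in an archived page crawled in that year.
Relation = Tuple[str, str, int]
Edge = Tuple[str, str]


def build_graph(relations: Iterable[Relation], start: int, end: int) -> Dict[Edge, int]:
    """Aggregate relations observed in [start, end] into weighted undirected edges."""
    edges: Dict[Edge, int] = defaultdict(int)
    for a, b, year in relations:
        # Skip self-relations and triples outside the requested time window.
        if a == b or not (start <= year <= end):
            continue
        # Store each undirected edge under a canonical (sorted) key.
        key = (a, b) if a < b else (b, a)
        edges[key] += 1  # edge weight = number of supporting co-mentions
    return dict(edges)


def yearly_snapshots(relations: Sequence[Relation], first: int, last: int) -> Dict[int, Dict[Edge, int]]:
    """Cumulative graph up to each year, so network growth can be compared over time."""
    return {year: build_graph(relations, first, year) for year in range(first, last + 1)}


if __name__ == "__main__":
    # Toy data with made-up names, for illustration only.
    relations = [
        ("Alice", "Bob", 1998),
        ("Bob", "Alice", 2001),
        ("Bob", "Carol", 2005),
        ("Carol", "Carol", 2005),
    ]
    print(build_graph(relations, 1996, 2013))
    snaps = yearly_snapshots(relations, 1996, 2013)
    print(snaps[2000], snaps[2005])
```

Cumulative snapshots are one simple way to look at evolution; a sliding window (for example, `build_graph(relations, y - 2, y)`) would instead highlight relations that are active in a given period.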
Taxonomy
Topics: Web Data Mining and Analysis · Complex Network Analysis Techniques · Data Quality and Management
