# WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

**Authors:** Cristian Consonni, David Laniado, Alberto Montresor

arXiv: 1902.04298 · 2019-04-05

## TL;DR

This paper introduces a complete, longitudinal dataset of Wikipedia's internal link networks for the nine largest language editions. It provides yearly snapshots spanning 17 years (2001 to March 1st, 2018) and keeps only the links that editors intentionally added to the article text.

## Contribution

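The authors provide a cleaned dataset of Wikipedia's internal links spanning 17 years. Whereas previous work has mostly focused on the complete hyperlink graph, this dataset keeps only links that editors placed in the main text and excludes links generated automatically by templates. The paper also documents how the Wikipedia dumps were processed, including the handling of special pages such as redirects.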

## Key findings

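- The dataset covers the 9 largest language editions of Wikipedia, with yearly snapshots spanning 17 years, from 2001 to March 1st, 2018.
- Only links that editors added to the main text are kept; template-generated links are excluded, which discards more than half of the links in the complete hyperlink graph.
- The paper presents descriptive statistics for several snapshots of the network and proposes research opportunities that the dataset enables.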

## Abstract

Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the $9$ largest language editions. The dataset contains yearly snapshots of the network and spans $17$ years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph which includes also links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.
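The two processing ideas named in the abstract, keeping only links written in the main text and mapping redirects to their target titles, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`strip_templates`, `extract_wikilinks`, `resolve_redirect`), the regular expression, and the `redirects` dictionary are all assumptions made for illustration, and real wikitext has cases this sketch ignores (namespaces such as `File:` or `Category:`, `<nowiki>` blocks, HTML comments).

```python
import re

# Hypothetical pattern: matches [[Target]], [[Target|label]] and
# [[Target#Section|label]], capturing only the target title.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext: str) -> str:
    """Drop {{...}} template calls, including nested ones."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every template.
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def extract_wikilinks(wikitext: str) -> list[str]:
    """Return link targets that appear outside templates, in order."""
    body = strip_templates(wikitext)
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(body)]


def resolve_redirect(title: str, redirects: dict[str, str]) -> str:
    """Follow a redirect chain to its final title, stopping on cycles."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


# Usage with made-up wikitext and a made-up redirect table.
text = (
    "Rome is the capital of [[Italy]]. "
    "{{Infobox|capital=[[Paris]]}} "
    "See [[Roman Empire#History|the empire]] and [[USA|the US]]."
)
redirects = {"USA": "United States"}
links = [resolve_redirect(t, redirects) for t in extract_wikilinks(text)]
print(links)
```

In this sketch the link to `Paris` is dropped because it sits inside a template call, while `USA` is replaced by the title it redirects to. Applying the same extraction to the article revision in effect at each snapshot date would give one edge list per snapshot.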

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04298/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04298/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1902.04298/full.md

---
Source: https://tomesphere.com/paper/1902.04298