Predicting Links on Wikipedia with Anchor Text Information
Robin Brochier, Frédéric Béchet

TL;DR
This paper studies transductive and inductive link prediction on several subsets of the English Wikipedia using anchor text information, identifies key challenges behind automatic linking, and compares several algorithms on the task.
Contribution
It proposes an evaluation sampling methodology suited to the task, compares several link prediction algorithms, and provides baseline models for anchor-text-based link prediction on Wikipedia.
Findings
Evaluation sampling methodology for link prediction tasks
Comparison of several link prediction algorithms
Baseline models estimating task difficulty
Abstract
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline…
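The abstract distinguishes transductive from inductive link prediction without describing how either setting is built. The sketch below illustrates only the generic distinction, not the authors' sampling methodology, which this summary does not detail. The function names, the `test_ratio` parameter, and the toy article graph are all hypothetical.

```python
import random


def transductive_split(edges, test_ratio=0.2, seed=0):
    """Hold out a fraction of links; every article may still appear in training."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # Returns (train_edges, test_edges).
    return shuffled[n_test:], shuffled[:n_test]


def inductive_split(edges, test_ratio=0.2, seed=0):
    """Hold out a fraction of articles; any link touching one becomes a test link."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    rng.shuffle(nodes)
    n_test = int(len(nodes) * test_ratio)
    held_out = set(nodes[:n_test])
    # Training links never touch a held-out article.
    train = [e for e in edges if e[0] not in held_out and e[1] not in held_out]
    test = [e for e in edges if e[0] in held_out or e[1] in held_out]
    return train, test, held_out


# Toy (source article, target article) pairs, purely illustrative.
edges = [
    ("Paris", "France"),
    ("France", "Europe"),
    ("Paris", "Seine"),
    ("Seine", "France"),
    ("Europe", "Eurasia"),
]

train_t, test_t = transductive_split(edges)
train_i, test_i, unseen = inductive_split(edges)
print("transductive:", len(train_t), "train /", len(test_t), "test links")
print("inductive:", len(train_i), "train /", len(test_i), "test links")
```

In the transductive split a model sees every article during training and must recover missing links among them. In the inductive split it must predict links for articles that were absent from training, so it can rely only on their content, such as anchor text.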
