Predicting Links on Wikipedia with Anchor Text Information

Robin Brochier; Frédéric Béchet

arXiv:2105.11734·cs.IR·May 26, 2021

TL;DR

This paper studies transductive and inductive link prediction on several subsets of the English Wikipedia using anchor text information, identifies key challenges behind automatic linking, and compares several algorithms on the task.

Contribution

It proposes an evaluation sampling methodology suited to the task, compares several link prediction algorithms, and proposes baseline models for link prediction based on anchor text in Wikipedia.

Findings

01 Evaluation sampling methodology for link prediction tasks

02 Comparison of several link prediction algorithms

03 Baseline models estimating task difficulty

Abstract

Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline…
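
To make the distinction between the transductive and the inductive tasks concrete, here is a minimal sketch of how the two kinds of train/test split are commonly built for link prediction. It is a generic illustration and not the evaluation sampling methodology proposed in the paper; the function names, the toy article titles, and the `test_ratio` parameter are all assumptions made for this example.

```python
import random

def transductive_split(edges, test_ratio=0.4, seed=0):
    """Hold out a fraction of links; every article can still appear in training."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # Returns (train_edges, test_edges).
    return shuffled[n_test:], shuffled[:n_test]

def inductive_split(edges, nodes, test_ratio=0.4, seed=0):
    """Hold out a fraction of articles; links touching them are never seen in training."""
    rng = random.Random(seed)
    shuffled = sorted(nodes)  # sort first so the split is reproducible
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    unseen = set(shuffled[:n_test])
    train = [(s, t) for s, t in edges if s not in unseen and t not in unseen]
    test = [(s, t) for s, t in edges if s in unseen or t in unseen]
    return train, test, unseen

# Hypothetical toy graph: (source article, target article) pairs.
toy_edges = [
    ("Paris", "France"),
    ("France", "Europe"),
    ("Paris", "Seine"),
    ("Seine", "France"),
    ("Europe", "Earth"),
]
toy_nodes = {n for edge in toy_edges for n in edge}

trans_train, trans_test = transductive_split(toy_edges)
ind_train, ind_test, ind_unseen = inductive_split(toy_edges, toy_nodes)
```

In the transductive split a model is asked to recover missing links between articles it has already seen, whereas in the inductive split it must predict links for articles that were entirely absent from training, so it can rely only on their content (for example, their text) rather than on previously observed links.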
