Anchor Prediction: A Topic Modeling Approach

Jean Dupuy; Adrien Guille; Julien Jacques

arXiv:2205.14631·cs.CL·June 2, 2022

TL;DR

This paper introduces CRTM, a novel topic modeling approach for automatically predicting hyperlinks in documents by modeling local context and content, improving hyperlink annotation without external resources.

Contribution

The paper presents CRTM, a new relational topic model specifically designed for anchor prediction, addressing a unique task distinct from traditional link prediction.

Findings

1. CRTM effectively predicts anchors in Wikipedia articles across multiple languages.

2. The model outperforms baseline methods in anchor prediction accuracy.

3. Experiments demonstrate practical usefulness in real-world document networks.

Abstract

Networks of documents connected by hyperlinks, such as Wikipedia, are ubiquitous. Authors insert hyperlinks to enrich the text and to ease navigation through the network. However, they tend to insert only a fraction of the relevant hyperlinks, mainly because doing so is time-consuming. In this paper we address an annotation task, which we refer to as anchor prediction. Although it is conceptually close to link prediction and entity linking, it is a distinct task that requires a dedicated method. Given a source document and a target document, the task consists in automatically identifying anchors in the source document, i.e., words or terms that should carry a hyperlink pointing to the target document. We propose a contextualized relational topic model, CRTM, that models directed links between documents as a function of the local context…
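
To make the input and output of the task concrete, the sketch below ranks the words of a source document as candidate anchors for a given target document. It is not CRTM and is not taken from the paper: it is a toy lexical-overlap heuristic, and the names `tokenize`, `score_anchor_candidates`, and the `window` parameter are hypothetical. Each source word is scored by how often the words in its local context window occur in the target document.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_anchor_candidates(source_text, target_text, window=2):
    """Rank source tokens as candidate anchors for the target document.

    Toy heuristic (an assumption for illustration, not the CRTM model):
    the score of a token is the average relative frequency, in the
    target document, of the tokens in its local context window.
    """
    source = tokenize(source_text)
    target_counts = Counter(tokenize(target_text))
    total = sum(target_counts.values()) or 1  # avoid division by zero

    scores = {}
    for i, token in enumerate(source):
        lo = max(0, i - window)
        hi = min(len(source), i + window + 1)
        context = source[lo:hi]  # the window includes the token itself
        score = sum(target_counts[w] for w in context) / (total * len(context))
        # A token may occur several times; keep its best-scoring occurrence.
        scores[token] = max(scores.get(token, 0.0), score)

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    source = "cats sleep while topic models infer latent topics"
    target = "latent topics topic models"
    for token, score in score_anchor_candidates(source, target, window=1):
        print(f"{token}\t{score:.3f}")
```

The heuristic ignores stop words, multi-word terms, and link direction. It only illustrates the shape of the problem: one source document and one target document go in, and scored anchor candidates come out.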
