LlamaFur: Learning Latent Category Matrix to Find Unexpected Relations in Wikipedia
Paolo Boldi, Corrado Monti

TL;DR
LlamaFur is a novel method that learns a latent category matrix to identify surprising links in Wikipedia. By modeling hyperlinks through the categories of the pages they connect, it achieves better accuracy than most existing text-based techniques, with higher efficiency.
Contribution
It introduces a latent category matrix learned via an online Passive-Aggressive algorithm to discover unexpected links efficiently in large hyperlinked datasets.
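A minimal sketch of one such online update follows. It assumes each page is represented by its set of categories, that the score of an arc x → y is the bilinear form c_x^T W c_y (the sum of W[a, b] over all category pairs of tail and head), and that the standard PA-I rule with hinge loss is applied, with random non-arcs as negative examples. The function names, the aggressiveness parameter C and the negative-sampling scheme are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
import random


def pa_update(W, cats_x, cats_y, label, C=1.0):
    """One PA-I step on the bilinear score s(x, y) = c_x^T W c_y.

    W      : defaultdict(float) mapping (category_a, category_b) -> weight
    cats_x : set of categories of the tail page x
    cats_y : set of categories of the head page y
    label  : +1 for an existing arc, -1 for a sampled non-arc
    C      : aggressiveness cap (illustrative default, an assumption)
    Returns the hinge loss suffered before the update.
    """
    score = sum(W[(a, b)] for a in cats_x for b in cats_y)
    loss = max(0.0, 1.0 - label * score)
    # Squared norm of the outer-product feature c_x c_y^T with 0/1 entries.
    norm_sq = len(cats_x) * len(cats_y)
    if loss == 0.0 or norm_sq == 0:
        return loss  # passive step: margin met, or no categories to update
    tau = min(C, loss / norm_sq)  # aggressive step size (PA-I)
    for a in cats_x:
        for b in cats_y:
            W[(a, b)] += tau * label
    return loss


def train(arcs, categories, nodes, epochs=1, C=1.0, seed=0):
    """Online training over the arc list with one random negative per arc.

    Uniform negative sampling is a hypothetical choice made for illustration.
    """
    rng = random.Random(seed)
    W = defaultdict(float)
    arc_set = set(arcs)
    for _ in range(epochs):
        for x, y in arcs:
            pa_update(W, categories[x], categories[y], +1, C)
            u, v = rng.choice(nodes), rng.choice(nodes)
            if (u, v) not in arc_set:
                pa_update(W, categories[u], categories[v], -1, C)
    return W
```

Each step reads and writes only |C(x)|·|C(y)| entries of a sparse matrix, so one pass costs time linear in the number of arcs; this is one reason an online scheme of this kind can scale to very large graphs.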
Findings
Outperforms most existing text-based link prediction methods in accuracy.
Processes large graphs with 10^8 links in less than 10 minutes.
Provides higher precision at low recall levels compared to standard link prediction.
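Precision at a given recall level can be computed with a short helper. This is a generic sketch that assumes the method outputs a ranked list of candidate links and that a ground-truth set of links judged unexpected is available; the function name and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def precision_at_recall(ranked, relevant, recall_level):
    """Precision at the first cut-off of `ranked` whose recall reaches recall_level.

    ranked       : list of predicted items, most confident first
    relevant     : non-empty set of ground-truth items
    recall_level : target recall in (0, 1]
    Returns None if the ranking never reaches the requested recall.
    """
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    hits = 0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / i  # precision among the top-i predictions
    return None


# Toy ranking (hypothetical data): relevant items at positions 1, 3 and 5.
print(precision_at_recall(["a", "x", "b", "y", "c"], {"a", "b", "c"}, 0.3))  # 1.0
```

A method that places its first few correct predictions at the very top of the ranking scores high at low recall, even if its precision drops later.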
Abstract
Besides finding trends and unveiling typical patterns, modern information retrieval is increasingly interested in the discovery of surprising information in textual datasets. In this work we focus on finding "unexpected links" in hyperlinked document corpora when documents are assigned to categories. To achieve this goal, we model the hyperlink graph through node categories: the presence of an arc is fostered or discouraged by the categories of the head and the tail of the arc. Specifically, we determine a latent category matrix that explains common links. The matrix is built using a margin-based online learning algorithm (Passive-Aggressive), which makes us able to process graphs with 10^8 links in less than 10 minutes. We show that our method provides better accuracy than most existing text-based techniques, with higher efficiency and relying on a much smaller amount of…
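Given a learned latent category matrix W, stored sparsely as a dict from (category_a, category_b) pairs to weights, one natural way to surface unexpected links, assumed here for illustration, is to score every existing arc with the bilinear category form and report the arcs the matrix explains worst, that is, those with the lowest score. The helper names, the weights and the toy pages below are hypothetical.

```python
def arc_score(W, cats_x, cats_y):
    """Bilinear category score c_x^T W c_y; .get() leaves W unmodified."""
    return sum(W.get((a, b), 0.0) for a in cats_x for b in cats_y)


def most_unexpected(arcs, categories, W, k=10):
    """Return the k existing arcs with the lowest category-based score."""
    scored = [(arc_score(W, categories[x], categories[y]), (x, y)) for x, y in arcs]
    scored.sort(key=lambda t: t[0])  # lowest score = least explained by categories
    return [arc for _, arc in scored[:k]]


# Hypothetical toy data: links between physicists are common, physicist -> composer is not.
W = {("Physicists", "Physicists"): 2.0, ("Physicists", "Composers"): -1.0}
categories = {"Einstein": {"Physicists"}, "Bohr": {"Physicists"}, "Mozart": {"Composers"}}
arcs = [("Einstein", "Bohr"), ("Einstein", "Mozart")]

print(most_unexpected(arcs, categories, W, k=1))  # [('Einstein', 'Mozart')]
```

Only existing arcs are ranked here, so the output is a list of real links ordered by how poorly the category matrix accounts for them, rather than a list of predicted new links.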
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Wikis in Education and Collaboration
