TL;DR
This paper introduces a language-agnostic method for classifying Wikipedia articles into topics using article links, enabling broad, cross-language analysis of Wikipedia content without language-specific adjustments.
Contribution
The authors propose a link-based, language-agnostic classification approach that covers far more articles and languages than previous methods while maintaining comparable accuracy.
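To make the idea of link-based, language-agnostic classification concrete, here is a minimal sketch. It is not the authors' model: this summary does not specify the features or classifier, so the sketch assumes that each article's links have already been resolved to language-independent identifiers (the `Q…` values below are made-up placeholders) and uses a simple vote-counting scheme. The names `TRAIN`, `train_link_votes`, and `predict_topic` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each article is represented only by the
# language-independent IDs of the pages it links to (illustrative values),
# paired with an assumed topic label.
TRAIN = [
    ({"Q1", "Q2", "Q3"}, "Science"),
    ({"Q2", "Q3", "Q4"}, "Science"),
    ({"Q7", "Q8", "Q9"}, "Sports"),
    ({"Q8", "Q9", "Q10"}, "Sports"),
]


def train_link_votes(examples):
    """Count, for each link ID, how often it appears under each topic."""
    votes = defaultdict(Counter)
    for links, topic in examples:
        for link in links:
            votes[link][topic] += 1
    return votes


def predict_topic(votes, links):
    """Sum per-link topic counts; return the top topic, or None if no link is known."""
    totals = Counter()
    for link in links:
        totals.update(votes.get(link, {}))
    if not totals:
        return None
    # Sort first so ties are broken alphabetically and the result is deterministic.
    return max(sorted(totals), key=totals.__getitem__)


votes = train_link_votes(TRAIN)
prediction = predict_topic(votes, {"Q2", "Q3", "Q99"})
```

Because the features are link identifiers rather than words, the same trained counts could in principle be applied to an article from any language edition, provided its links resolve to the same identifiers. An article whose links are all unseen gets no prediction in this sketch.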
Findings
Matches the performance of language-dependent methods
Provides much greater coverage across Wikipedia articles and languages
Applicable to almost any Wikipedia article regardless of language
Abstract
A major challenge for many analyses of Wikipedia dynamics -- e.g., imbalances in content quality, geographic differences in what content is popular, what types of articles attract more editor discussion -- is grouping the very diverse range of Wikipedia articles into coherent, consistent topics. This problem has been addressed using various approaches based on Wikipedia's category network, WikiProjects, and external taxonomies. However, these approaches have always been limited in their coverage: typically, only a small subset of articles can be classified, or the method cannot be applied across (the more than 300) languages on Wikipedia. In this paper, we propose a language-agnostic approach based on the links in an article for classifying articles into a taxonomy of topics that can be easily applied to (almost) any language and article on Wikipedia. We show that it matches the…
