Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors
Todd Holloway, Miran Bozicevic, Katy Börner

TL;DR
This paper analyzes the semantic structure, category age, and author content coverage of English Wikipedia using novel measures. It finds that category co-occurrences within individual articles follow a power-law distribution, that the semantic organization is clustered, and that author roles are diverse.
Contribution
It introduces new analytical and visualization methods to explore Wikipedia's semantic categories and author contributions, providing insights into its structure and content coverage.
Findings
Category co-occurrences follow a power-law distribution
Semantic structure is highly clustered
Authors have diverse roles and coverage
Abstract
This paper presents a novel analysis and visualization of English Wikipedia data. Our specific interest is the analysis of basic statistics, the identification of the semantic structure and age of the categories in this free online encyclopedia, and the content coverage of its highly productive authors. The paper starts with an introduction of Wikipedia and a review of related work. We then introduce a suite of measures and approaches to analyze and map the semantic structure of Wikipedia. The results show that co-occurrences of categories within individual articles have a power-law distribution, and when mapped reveal the nicely clustered semantic structure of Wikipedia. The results also reveal the content coverage of the article's authors, although the roles these authors play are as varied as the authors themselves. We conclude with a discussion of major results and planned future…
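The co-occurrence measure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input layout (a mapping from article title to a list of category names), and the toy data are all assumptions made for illustration. The sketch counts how often each pair of categories appears in the same article, then tabulates how many pairs share each count, which is the distribution one would inspect on a log-log plot for power-law behavior.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(article_categories):
    """Count how often each unordered category pair appears in the same article.

    article_categories: dict mapping article title -> list of category names
    (a hypothetical input layout, not taken from the paper).
    """
    pair_counts = Counter()
    for categories in article_categories.values():
        # De-duplicate and sort so each pair is counted once per article
        # and always appears in the same canonical order.
        for pair in combinations(sorted(set(categories)), 2):
            pair_counts[pair] += 1
    return pair_counts


def count_distribution(pair_counts):
    """Map each co-occurrence count k to the number of pairs with that count."""
    return Counter(pair_counts.values())


# Toy data, invented for illustration only.
articles = {
    "A": ["Physics", "Science"],
    "B": ["Physics", "Science", "History"],
    "C": ["History", "Science"],
}

pairs = cooccurrence_counts(articles)
dist = count_distribution(pairs)
```

On real data, plotting the keys of `dist` against its values on logarithmic axes would give a first visual check of whether the distribution is heavy-tailed. Three articles are far too few to show anything of the kind.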
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Software Engineering Research
