Collaborative thesaurus tagging the Wikipedia way
Jakob Voss

TL;DR
This paper analyzes Wikipedia's category system, revealing it as a hybrid thesaurus combining collaborative tagging and hierarchical classification, with unique structural and statistical properties that distinguish it from other systems.
Contribution
It provides a detailed comparison and analysis of Wikipedia's category system, highlighting its hybrid nature and structural characteristics.
Findings
Wikipedia's category system is a hybrid thesaurus.
It combines collaborative tagging with hierarchical indexing.
Analysis of descriptors per record, records per descriptor, and descriptor levels supports this characterization.
Abstract
This paper explores the system of categories that is used to classify articles in Wikipedia. It is compared to collaborative tagging systems like del.icio.us and to hierarchical classification like the Dewey Decimal Classification (DDC). Specifics and commonalities of these systems of subject indexing are exposed. Analysis of structural and statistical properties (descriptors per record, records per descriptor, descriptor levels) shows that the category system of Wikipedia is a thesaurus that combines collaborative tagging and hierarchical subject indexing in a special way.
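The three statistics named in the abstract can be illustrated with a minimal sketch. The data below is invented toy data, not taken from the paper, and the function names are hypothetical. The sketch assumes that a "record" is an article, a "descriptor" is a category, and a descriptor's level is its shortest distance from a chosen root category; the paper may define these differently.

```python
from collections import defaultdict, deque

# Hypothetical toy data: article -> categories assigned to it.
article_categories = {
    "Thesaurus": ["Information retrieval", "Lexicography"],
    "Folksonomy": ["Information retrieval"],
    "Dictionary": ["Lexicography"],
}

# Hypothetical toy data: category -> its parent categories.
category_parents = {
    "Information retrieval": ["Information science"],
    "Lexicography": ["Linguistics"],
    "Information science": ["Root"],
    "Linguistics": ["Root"],
    "Root": [],
}


def descriptors_per_record(assignments):
    """Number of distinct categories assigned to each article."""
    return {article: len(set(cats)) for article, cats in assignments.items()}


def records_per_descriptor(assignments):
    """Number of articles that carry each category."""
    counts = defaultdict(int)
    for cats in assignments.values():
        for cat in set(cats):
            counts[cat] += 1
    return dict(counts)


def descriptor_levels(parents, root):
    """Shortest distance of each category from the root (assumed definition)."""
    # Invert the parent links so the graph can be walked top-down.
    children = defaultdict(list)
    for cat, cat_parents in parents.items():
        for parent in cat_parents:
            children[parent].append(cat)

    # Breadth-first search: the first visit to a category gives its level.
    # Categories with several parents keep the shallowest level found.
    levels = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in levels:
                levels[child] = levels[current] + 1
                queue.append(child)
    return levels


per_record = descriptors_per_record(article_categories)
per_descriptor = records_per_descriptor(article_categories)
levels = descriptor_levels(category_parents, "Root")
```

Breadth-first search is used because a category may have more than one parent, so the category graph is not necessarily a tree. The visited check also keeps the traversal from looping if the graph contains a cycle.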
Taxonomy
Topics: Wikis in Education and Collaboration · Semantic Web and Ontologies · Natural Language Processing Techniques
