Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

A. Capocci; F. Rao; G. Caldarelli

arXiv:0710.3058·physics.soc-ph·November 13, 2009

TL;DR

This paper compares imposed classifications and algorithmically detected communities in Wikipedia, revealing similar statistical distributions but different article groupings, highlighting the complexity of clustering in scale-free networks.

Contribution

It demonstrates the statistical similarity between top-down and bottom-up clustering methods in Wikipedia, emphasizing the limitations of power-law distributions as benchmarks.

Findings

01. Community size distributions are statistically similar across methods.

02. Different clustering results suggest power laws are not sufficient for evaluating clustering quality.

03. Power-law behavior is a general feature, not a definitive indicator of clustering validity.

Abstract

In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both in the top-down approach of the category division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.
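
The central point of the abstract, that two partitions can share a size distribution while grouping the articles differently, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the six article labels, the two partitions, and the pair-based Jaccard overlap used to compare them. The abstract does not say which comparison measure the authors used, so this stands in for it rather than reproducing their method.

```python
from collections import Counter
from itertools import combinations


def size_distribution(partition):
    """Return a Counter mapping cluster size -> number of clusters of that size."""
    cluster_sizes = Counter(partition.values())   # cluster label -> size
    return Counter(cluster_sizes.values())        # size -> how many clusters


def co_clustered_pairs(partition):
    """Return the set of unordered item pairs that share a cluster."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(partition), 2)
        if partition[a] == partition[b]
    }


def pair_jaccard(p1, p2):
    """Jaccard overlap of co-clustered pairs: 1.0 means identical groupings."""
    pairs1, pairs2 = co_clustered_pairs(p1), co_clustered_pairs(p2)
    union = pairs1 | pairs2
    if not union:                                  # both partitions are all singletons
        return 1.0
    return len(pairs1 & pairs2) / len(union)


# Hypothetical toy data: article -> cluster label.
categories = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 2}   # imposed (top-down)
communities = {"A": 0, "D": 0, "F": 0, "B": 1, "C": 1, "E": 2}  # detected (bottom-up)

same_sizes = size_distribution(categories) == size_distribution(communities)
overlap = pair_jaccard(categories, communities)
print(same_sizes, round(overlap, 3))
```

Both toy partitions have one cluster each of sizes 3, 2 and 1, so their size distributions match exactly, yet only one of the seven co-clustered pairs is shared. A size-distribution check alone therefore cannot tell the two groupings apart.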
