Wikipedia information flow analysis reveals the scale-free architecture of the Semantic Space
A.P. Masucci, A. Kalampokis, V.M. Eguíluz, E. Hernández-García

TL;DR
This study analyzes Wikipedia's semantic network, revealing a scale-free, small-world architecture with complex topological properties, and introduces a stochastic model that captures these features and linguistic laws like Zipf's law.
Contribution
It provides the first detailed topological analysis of Wikipedia's semantic space and proposes a novel stochastic model explaining its scale-free and linguistic properties.
Findings
Semantic space exhibits scale-free and small-world properties.
Cluster size distribution follows a scale-free pattern.
The proposed model captures key statistical features including Zipf's law.
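One minimal way to check a Zipf-like rank-frequency law is a least-squares fit on log-log axes. This assumes you already have a list of occurrence counts, for example how often each entry is linked. The function name `zipf_exponent` and the synthetic counts are hypothetical, and this is not the authors' procedure. Maximum-likelihood estimators are generally more reliable than a log-log regression.

```python
import math


def zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ r^(-alpha) by least squares on log-log axes.

    Ranks are assigned in decreasing order of frequency (rank 1 = most
    frequent); zero or negative counts are discarded. A crude estimator,
    used here only for illustration.
    """
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # Zipf's law is a straight line of slope -alpha in log-log coordinates.
    return -cov / var


if __name__ == "__main__":
    # Hypothetical counts that follow f(r) = 1000 / r exactly.
    counts = [1000 / r for r in range(1, 101)]
    print(round(zipf_exponent(counts), 3))  # 1.0
```

On real data the fitted slope depends strongly on which ranks are included, so the tail is usually trimmed or a dedicated power-law fitting method is used instead.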
Abstract
In this paper we extract the topology of the semantic space in its encyclopedic sense by measuring the semantic flow between the entries of the largest modern encyclopedia, Wikipedia, thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, which relates it to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the sizes of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After…
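The abstract does not spell out the pipeline. The general idea of thresholding a weighted directed network and then measuring cluster sizes and connectivity can be sketched as follows. The sketch assumes that semantic flows are available as hypothetical `(source, target, weight)` triples and that a link is kept when its weight reaches a chosen threshold. It also assumes that clusters are weakly connected components. The function names and toy edges are illustrative and are not taken from the paper.

```python
from collections import Counter

# Hypothetical input format: each edge is (source_entry, target_entry, flow_weight).


def cluster_sizes(edges, threshold):
    """Sizes of the weakly connected clusters formed by edges whose weight
    is at least `threshold`, largest first. Entries whose links all fall
    below the threshold do not appear in any cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= threshold:
            # Edge direction is ignored here: clusters are weak components.
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    roots = Counter(find(x) for x in list(parent))
    return sorted(roots.values(), reverse=True)


def in_degree_distribution(edges, threshold):
    """Map each in-degree k to the number of entries with that in-degree,
    counting only edges with weight >= threshold."""
    indeg = Counter()
    nodes = set()
    for u, v, w in edges:
        if w >= threshold:
            indeg[v] += 1
            nodes.update((u, v))
    # Counter returns 0 for entries with no incoming kept edge.
    return dict(Counter(indeg[n] for n in nodes))


if __name__ == "__main__":
    toy_edges = [  # hypothetical flows, not data from the paper
        ("A", "B", 5), ("B", "C", 3), ("C", "A", 1),
        ("D", "E", 4), ("E", "F", 0.5), ("G", "H", 2),
    ]
    print(cluster_sizes(toy_edges, threshold=2))  # [3, 2, 2]
    print(sorted(in_degree_distribution(toy_edges, threshold=2).items()))  # [(0, 3), (1, 4)]
```

You can sweep `threshold` and look for where the cluster size distribution becomes broad. This is one generic way to locate a percolation-like transition, although the paper's own definition of the threshold may differ.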
