Complex Networks Measures for Differentiation between Normal and Shuffled Croatian Texts
Domagoj Margan, Ana Meštrović, Sanda Martinčić-Ipšić

TL;DR
This study investigates Croatian text properties using complex networks, comparing original and shuffled texts, and finds that node selectivity measures can distinguish between the two, despite similar degree and strength distributions.
Contribution
The paper demonstrates that node selectivity in complex networks effectively differentiates original Croatian texts from shuffled versions, highlighting its potential for linguistic analysis.
Findings
Degree rank distributions show no substantial deviation between original and shuffled texts.
Strength rank distributions are preserved because shuffling leaves word frequencies unchanged.
Node selectivity values are lower in shuffled texts, enabling differentiation.
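The measures named in these findings can be illustrated with a minimal sketch. It assumes a directed, weighted co-occurrence network in which an edge (u, v) counts how often word v directly follows word u within a sentence, and it assumes selectivity is the ratio of a node's strength to its degree, computed separately for incoming and outgoing links. The summary above does not spell out either definition, so both are assumptions, and the function names `build_network` and `node_measures` are hypothetical.

```python
from collections import defaultdict


def build_network(sentences):
    """Return {(u, v): weight} for a directed co-occurrence network.

    Assumption: two words are linked when they are adjacent within a
    sentence; the weight is the number of such adjacent occurrences.
    """
    weights = defaultdict(int)
    for sentence in sentences:
        for u, v in zip(sentence, sentence[1:]):
            weights[(u, v)] += 1
    return dict(weights)


def node_measures(weights):
    """Compute in/out degree, strength, and selectivity for every node.

    Assumption: selectivity = strength / degree (0.0 when degree is 0).
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_str, out_str = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_deg[u] += 1   # one more distinct successor of u
        out_str[u] += w   # total weight leaving u
        in_deg[v] += 1    # one more distinct predecessor of v
        in_str[v] += w    # total weight entering v

    nodes = set(in_deg) | set(out_deg)
    measures = {}
    for n in nodes:
        kin, kout = in_deg.get(n, 0), out_deg.get(n, 0)
        sin, sout = in_str.get(n, 0), out_str.get(n, 0)
        measures[n] = {
            "in_degree": kin,
            "out_degree": kout,
            "in_strength": sin,
            "out_strength": sout,
            "in_selectivity": sin / kin if kin else 0.0,
            "out_selectivity": sout / kout if kout else 0.0,
        }
    return measures


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b", "d"]]
    for word, m in sorted(node_measures(build_network(toy)).items()):
        print(word, m["in_selectivity"], m["out_selectivity"])
```

Under these assumptions, a word that repeatedly co-occurs with the same few neighbours has high selectivity, while a word whose links are spread thinly over many neighbours has selectivity close to 1. Shuffling tends to spread links this way, which is consistent with the lower selectivity values reported for shuffled texts.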
Abstract
This paper studies the properties of Croatian texts via complex networks. We present network properties of normal and shuffled Croatian texts for two shuffling principles: on the sentence level and on the text level. In both experiments we preserved the vocabulary size and the word and sentence frequency distributions. Additionally, in the first shuffling approach we preserved the sentence structure of the text and the number of words per sentence. The results showed that degree rank distributions exhibit no substantial deviation in shuffled networks, and that strength rank distributions are preserved because the word frequencies are the same. Therefore, the standard approach to studying the structure of linguistic co-occurrence networks showed no clear difference between the topologies of normal and shuffled texts. Finally, we showed that the in- and out-selectivity values from shuffled texts are…
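The two shuffling principles described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: sentence-level shuffling is taken to mean permuting the words inside each sentence, which keeps sentence boundaries and the number of words per sentence, and text-level shuffling is taken to mean permuting all words of the text as one sequence. The abstract does not say how sentence boundaries are handled in the text-level case, so this sketch simply returns one flat word sequence. The function names are hypothetical.

```python
import random
from collections import Counter


def shuffle_within_sentences(sentences, seed=None):
    """Sentence-level shuffling (assumed): permute words inside each sentence.

    Sentence count and words per sentence stay the same.
    """
    rng = random.Random(seed)
    shuffled = []
    for sentence in sentences:
        words = list(sentence)  # copy so the input is not modified
        rng.shuffle(words)
        shuffled.append(words)
    return shuffled


def shuffle_across_text(sentences, seed=None):
    """Text-level shuffling (assumed): permute all words of the whole text.

    Returns a single flat word sequence; sentence boundaries are dropped
    because the abstract does not state how they are redrawn.
    """
    rng = random.Random(seed)
    words = [w for sentence in sentences for w in sentence]
    rng.shuffle(words)
    return words


def word_counts(sentences):
    """Word frequency table, used to check that shuffling preserves it."""
    return Counter(w for sentence in sentences for w in sentence)


if __name__ == "__main__":
    text = [["ovo", "je", "prva", "rečenica"], ["ovo", "je", "druga"]]
    by_sentence = shuffle_within_sentences(text, seed=1)
    by_text = shuffle_across_text(text, seed=1)
    # Both variants keep the vocabulary and the word frequencies.
    print(word_counts(by_sentence) == word_counts(text))
    print(Counter(by_text) == word_counts(text))
```

Because both variants only reorder words, every word keeps its frequency. That is why strength distributions survive shuffling, while measures that depend on which neighbours a word has can change.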
