Average shortest-path length in word-adjacency networks: Chinese versus English
Jakub Dec, Michał Dolina, Stanisław Drożdż, Jarosław Kwapień, Jin Liu, Tomasz Stanisz

TL;DR
This study compares the topology of word-adjacency networks in Chinese and English literary texts, including punctuation, revealing language-specific differences in network properties and proposing a model to explain these patterns.
Contribution
It introduces a novel analysis of punctuation in word-adjacency networks and compares Chinese and English literary works, providing insights into language-specific network behaviors.
Findings
Including punctuation makes the average shortest path length similar in both languages.
Neglecting punctuation results in larger path lengths for Chinese texts.
A growing network model accurately approximates empirical results.
Abstract
Complex networks provide powerful tools for analyzing and understanding the intricate structures present in various systems, including natural language. Here, we analyze the topology of growing word-adjacency networks constructed from Chinese and English literary works written in different periods. Unconventionally, instead of considering dictionary words only, we also include punctuation marks as if they were ordinary words. Our approach is based on two arguments: (1) punctuation carries genuine information related to emotional state, allows for logical grouping of content, provides a pause in reading, and facilitates understanding by avoiding ambiguity, and (2) our previous works have shown that punctuation marks behave like words in a Zipfian analysis and, if considered together with regular words, can improve authorship attribution in stylometric studies. We focus on a functional…
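To make the quantities in the abstract concrete, the following is a minimal sketch of how a word-adjacency network and its average shortest-path length could be computed, with punctuation marks optionally treated as ordinary words. It is an illustration under stated assumptions, not the authors' implementation: the network is taken to be undirected and unweighted, the tokenizer is a simple regular expression, and the function names (`tokenize`, `build_network`, `average_shortest_path_length`) are hypothetical. Chinese text would first need word segmentation, which this sketch does not perform.

```python
import re
from collections import deque


def tokenize(text, keep_punctuation=True):
    """Split text into lowercase tokens; optionally keep each punctuation mark as a token.

    Assumption: words are whitespace-delimited (Chinese would need prior segmentation).
    """
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text.lower())


def build_network(tokens):
    """Link every pair of tokens that appear next to each other in the text.

    Assumption: edges are undirected and unweighted; self-loops are ignored.
    """
    adjacency = {}
    for left, right in zip(tokens, tokens[1:]):
        adjacency.setdefault(left, set())
        adjacency.setdefault(right, set())
        if left != right:
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency


def average_shortest_path_length(adjacency):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total_distance = 0
    pair_count = 0
    for source in adjacency:
        # Breadth-first search gives shortest hop counts from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total_distance += sum(distance.values())
        pair_count += len(distance) - 1
    return total_distance / pair_count if pair_count else 0.0


if __name__ == "__main__":
    sample = "The cat sat, and the dog sat. The cat ran!"
    for keep in (True, False):
        network = build_network(tokenize(sample, keep_punctuation=keep))
        print(keep, len(network), average_shortest_path_length(network))
```

Calling the functions with `keep_punctuation=True` and then `False` on the same text lets the two variants of the network be compared. Since a word-adjacency network grows as the text is read, the same functions can be applied to successively longer prefixes of the token list to follow how the path length changes with network size.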
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Advanced Graph Neural Networks
