Complex networks analysis of language complexity
Diego R. Amancio, Sandra M. Aluisio, Osvaldo N. Oliveira Jr., and Luciano da F. Costa

TL;DR
This study uses complex network analysis from statistical physics to quantify and distinguish levels of textual complexity, showing that simpler texts have more regular and interconnected network structures.
Contribution
It introduces a novel application of complex network metrics combined with pattern recognition to classify text complexity levels, including the generation of simplified text versions.
Findings
Topological regularity correlates negatively with textual complexity
Simpler texts have decreased distances between concepts in the network
Pattern recognition effectively distinguishes original from simplified texts
Abstract
Methods from statistical physics, such as those involving complex networks, have been increasingly used in quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex network metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with an increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest…
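The abstract mentions co-occurrence networks, node strength, and shortest paths. A minimal sketch of these ideas follows. It is an illustration under stated assumptions, not the authors' pipeline: it links words that appear next to each other, skips any preprocessing (the excerpt does not say whether stopword removal or lemmatization was used), and measures path length in hops, ignoring edge weights. The function names are hypothetical.

```python
from collections import defaultdict, deque


def build_cooccurrence_network(tokens):
    """Undirected weighted graph: adjacent tokens are linked,
    and the edge weight counts how often the pair is adjacent."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue  # assumption: ignore self-loops from repeated words
        graph[a][b] += 1
        graph[b][a] += 1
    return graph


def node_strength(graph, node):
    """Strength of a node: sum of the weights of its edges."""
    return sum(graph[node].values())


def mean_shortest_path(graph):
    """Mean hop count over all reachable ordered pairs of distinct nodes,
    computed with breadth-first search (edge weights are ignored)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy input, invented for illustration only.
tokens = "the cat saw the dog and the dog saw the cat".split()
network = build_cooccurrence_network(tokens)
strengths = {word: node_strength(network, word) for word in network}
average_distance = mean_shortest_path(network)
```

Per-text values such as these could then be collected into a feature vector and passed to a classifier, which is the role the abstract assigns to multivariate pattern recognition.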
