Probing the topological properties of complex networks modeling short written texts
Diego R. Amancio

TL;DR
This study investigates whether small text segments retain meaningful topological properties by analyzing subtexts sampled from novels. It finds that short texts can be characterized effectively with complex network methods, and that in authorship recognition they often match or outperform full texts.
Contribution
The paper demonstrates that topological properties of texts are stable in short segments and that short texts can be effectively analyzed with complex network techniques, enhancing authorship recognition.
Findings
Topological measurements are stable in short subtexts.
Short texts can achieve similar or better authorship recognition accuracy.
Complex network analysis of short texts can be extended to time-varying networks.
Abstract
In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well -- many informative discoveries have been made this way -- but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship…
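As a rough illustration of the word adjacency model applied to subtexts, the sketch below links each pair of consecutive words and computes one simple topological measurement (average degree) per segment. It is a minimal sketch under stated assumptions, not the paper's pipeline: the helper names (`word_adjacency_network`, `average_degree`, `subtexts`), the undirected edges, the whitespace tokenization, the non-overlapping fixed-size sampling, and the toy sentence are all illustrative choices, and no preprocessing is applied.

```python
from collections import defaultdict


def word_adjacency_network(tokens):
    """Build an undirected graph linking each pair of consecutive words.

    Assumption: edges are undirected and unweighted; self-loops created
    by an immediately repeated word are skipped.
    """
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_degree(graph):
    """Mean number of distinct neighbours per word (0.0 for an empty graph)."""
    if not graph:
        return 0.0
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)


def subtexts(tokens, size):
    """Split tokens into consecutive, non-overlapping segments of `size` tokens.

    A trailing segment shorter than `size` is dropped.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


# Toy input; a real study would use full novels and much longer segments.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

for segment in subtexts(tokens, 6):
    network = word_adjacency_network(segment)
    print(len(network), "nodes, average degree", average_degree(network))
```

Comparing such per-segment values with the value obtained from the full text is one simple way to check whether a measurement stays stable as the segment length shrinks.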
