Representation of texts as complex networks: a mesoscopic approach
Henrique F. de Arruda, Filipi N. Silva, Vanessa Q. Marinho, Diego R. Amancio, Luciano da F. Costa

TL;DR
This paper introduces a mesoscopic network model for text analysis that captures semantic and topical structure at a scale between individual words and entire documents.
Contribution
The authors propose a multi-scale network model in which each node represents a limited number of adjacent paragraphs. This allows the analysis of semantic content, such as the topical structure of a text, that traditional word co-occurrence methods cannot capture.
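To make the idea of paragraph-window nodes concrete, here is a minimal sketch, not the authors' implementation. It assumes that each node is a sliding window of `window` consecutive paragraphs represented as a bag of words, and that two nodes are linked when the cosine similarity of their bags reaches a hypothetical `threshold`. The stopword list, the tokenizer, and the names `tokenize`, `cosine`, and `mesoscopic_network` are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller list or tf-idf weights.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed (naive, for illustration)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mesoscopic_network(paragraphs, window=2, threshold=0.3):
    """Link windows of adjacent paragraphs whose word content is similar.

    Node i is the bag of words of paragraphs i .. i+window-1, so consecutive
    nodes overlap. An undirected edge (i, j) is added when the cosine
    similarity of the two bags is at least `threshold` (assumed rule).
    Returns (list of node vectors, set of edges).
    """
    if not 1 <= window <= len(paragraphs):
        raise ValueError("window must be between 1 and len(paragraphs)")
    nodes = [
        Counter(tok for p in paragraphs[i:i + window] for tok in tokenize(p))
        for i in range(len(paragraphs) - window + 1)
    ]
    edges = {
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if cosine(nodes[i], nodes[j]) >= threshold
    }
    return nodes, edges


paragraphs = [
    "Alice follows the white rabbit down the hole.",
    "The rabbit hole is deep and Alice falls slowly.",
    "At the tea party the Hatter pours tea.",
    "The March Hare and the Hatter argue at the tea party.",
]
nodes, edges = mesoscopic_network(paragraphs, window=1)
print(sorted(edges))  # → [(0, 1), (2, 3)]
```

With `window=1` each paragraph is its own node and the two "rabbit" paragraphs and the two "tea party" paragraphs form separate links, which is the kind of topical grouping the model targets. With larger windows, consecutive nodes share paragraphs and tend to be linked trivially, so one might ignore pairs with `j - i < window`; the threshold mainly controls how dense the network becomes.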
Findings
Network model reveals semantic traits of texts.
Model distinguishes real texts from randomized instances.
Application to 'Alice in Wonderland' demonstrates effectiveness.
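The summary does not state which network measurements or randomization procedure the authors use to separate real texts from randomized ones. The sketch below assumes a paragraph-shuffling null model and a hypothetical measure, `mean_edge_span`, the average distance in the text between linked windows. The intuition is that topically coherent texts link nearby passages, while shuffling destroys that locality. All names, the stopword list, and the similarity threshold are illustrative assumptions.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list for a simple bag-of-words representation.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "the", "to"}


def bag(text):
    """Bag of lowercase words with stopwords removed."""
    return Counter(t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two Counters treated as sparse vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_edge_span(paragraphs, window=1, threshold=0.3):
    """Mean text distance j - i between linked windows; NaN if there are no edges."""
    nodes = [bag(" ".join(paragraphs[i:i + window]))
             for i in range(len(paragraphs) - window + 1)]
    spans = [j - i
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if cosine(nodes[i], nodes[j]) >= threshold]
    return sum(spans) / len(spans) if spans else float("nan")


def shuffled_spans(paragraphs, trials=100, seed=0, **kwargs):
    """Null model (assumed): recompute the span after randomly permuting paragraphs."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        order = list(paragraphs)
        rng.shuffle(order)
        results.append(mean_edge_span(order, **kwargs))
    return results


paragraphs = [
    "Alice follows the white rabbit down the hole.",
    "The rabbit hole is deep and Alice falls slowly.",
    "At the tea party the Hatter pours tea.",
    "The March Hare and the Hatter argue at the tea party.",
]
real = mean_edge_span(paragraphs)
null = shuffled_spans(paragraphs, trials=50)
print(real)  # → 1.0
```

In practice one would compare the real value against the distribution of shuffled values, for example with a z-score, and would use a long text such as a full novel rather than four toy paragraphs.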
Abstract
Statistical techniques that analyze texts, referred to as text analytics, have departed from the use of simple word count statistics towards a new paradigm. Text mining now hinges on a more sophisticated set of methods, including representations in terms of complex networks. While well-established word-adjacency (co-occurrence) methods successfully grasp syntactical features of written texts, they are unable to represent important aspects of textual data, such as its topical structure, i.e. the sequence of subjects developing at a mesoscopic level along the text. Such aspects are often overlooked by current methodologies. In order to grasp the mesoscopic characteristics of semantic content in written texts, we devised a network model which is able to analyze documents in a multi-scale fashion. In the proposed model, a limited number of adjacent paragraphs are represented as nodes,…
