Topic segmentation via community detection in complex networks
Henrique F. de Arruda, Luciano da F. Costa, Diego R. Amancio

TL;DR
This paper introduces a network-based method for topic segmentation that detects communities of semantically related words, and reports better topic identification than a bag-of-words approach.
Contribution
The paper proposes a novel semantic network representation that captures word relationships to improve topic detection in texts, demonstrated on Wikipedia articles.
Findings
Semantic networks reveal communities of related words
Method outperforms bag-of-words in topic segmentation
High-level semantic representation enhances text analysis
Abstract
Many real systems have been modelled in terms of network concepts, and written texts are a particular example of information networks. In recent years, the use of network methods to analyze language has led to several interesting findings, including the proposition of novel models to explain the emergence of fundamental universal patterns. While syntactical networks, one of the most prevalent networked models of written texts, display both scale-free and small-world properties, such a representation fails to capture other textual features, such as the organization in topics or subjects. In this context, we propose a novel network representation whose main purpose is to capture the semantic relationships of words in a simple way. To do so, we link all words co-occurring in the same semantic context, which is defined in a threefold way. We show that the proposed…
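The idea of linking words that co-occur in the same context and then looking for communities can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: a "semantic context" is taken to be one paragraph (the abstract excerpt does not spell out its threefold definition), a simple deterministic label propagation stands in for whatever community detection method the paper uses, and the function names (build_cooccurrence_network, label_propagation, assign_topics) and the toy word lists are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(contexts):
    """Link every pair of distinct words that share a context.

    Assumption: each context is one paragraph, given as a list of words.
    The edge weight counts how many contexts a pair of words shares.
    """
    weights = defaultdict(Counter)
    for words in contexts:
        for u, v in combinations(sorted(set(words)), 2):
            weights[u][v] += 1
            weights[v][u] += 1
    return weights


def label_propagation(weights, max_rounds=100):
    """Stand-in community detection: weighted label propagation.

    Nodes are visited in sorted order; ties keep the current label when
    possible, otherwise take the alphabetically smallest candidate.
    """
    labels = {node: node for node in weights}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(weights):
            scores = Counter()
            for neighbor, weight in weights[node].items():
                scores[labels[neighbor]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def assign_topics(contexts, labels):
    """Give each context the community label most common among its words."""
    topics = []
    for words in contexts:
        votes = Counter(labels[w] for w in set(words) if w in labels)
        topics.append(votes.most_common(1)[0][0] if votes else None)
    return topics


# Toy input (hypothetical): two paragraphs about graphs, two about grammar.
contexts = [
    ["network", "node", "edge"],
    ["node", "edge", "graph"],
    ["verb", "noun", "syntax"],
    ["noun", "syntax", "grammar"],
]
network = build_cooccurrence_network(contexts)
labels = label_propagation(network)
topics = assign_topics(contexts, labels)
print(topics)
```

In this toy input the two groups of paragraphs share no vocabulary, so the network splits into two disconnected parts and each paragraph is assigned to the community of its own group. Real texts share many words across topics, which is where the choice of context definition and of community detection method starts to matter.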
