Topic segmentation via community detection in complex networks

Henrique F. de Arruda; Luciano da F. Costa; Diego R. Amancio

arXiv:1512.01384 · cs.CL · June 28, 2016

TL;DR

This paper introduces a network-based method for topic segmentation that detects communities of semantically related words; on Wikipedia articles it outperforms bag-of-words representations at identifying relevant topics.

Contribution

The paper proposes a semantic network representation that links words co-occurring in the same semantic context and uses it to improve topic detection in texts, demonstrated on Wikipedia articles.

Findings

01. Semantic networks reveal communities of related words

02. Method outperforms bag-of-words in topic segmentation

03. High-level semantic representation enhances text analysis

Abstract

Many real systems have been modelled in terms of network concepts, and written texts are a particular example of information networks. In recent years, the use of network methods to analyze language has allowed the discovery of several interesting findings, including the proposition of novel models to explain the emergence of fundamental universal patterns. While syntactical networks, one of the most prevalent networked models of written texts, display both scale-free and small-world properties, such representation fails in capturing other textual features, such as the organization in topics or subjects. In this context, we propose a novel network representation whose main purpose is to capture the semantical relationships of words in a simple way. To do so, we link all words co-occurring in the same semantic context, which is defined in a threefold way. We show that the proposed…
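The core idea, linking words that share a context and then grouping them into communities, can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: each context is a short whitespace-tokenised passage, edge weights count shared contexts, and a simple deterministic label-propagation pass stands in for community detection. The function names `build_cooccurrence_network` and `label_propagation` and the toy contexts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(contexts):
    """Link every pair of distinct words that share a context.

    Assumption: a context is one whitespace-tokenised string. The edge
    weight counts how many contexts the two words share.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for context in contexts:
        words = sorted(set(context.lower().split()))
        for u, v in combinations(words, 2):
            weights[u][v] += 1
            weights[v][u] += 1
    return weights


def label_propagation(weights, max_passes=100):
    """Stand-in community detection (not necessarily the paper's algorithm).

    Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the alphabetically smallest
    label so the result is deterministic.
    """
    labels = {node: node for node in weights}
    for _ in range(max_passes):
        changed = False
        for node in sorted(weights):
            scores = defaultdict(int)
            for neighbour, weight in weights[node].items():
                scores[labels[neighbour]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(group) for group in groups.values())


# Toy contexts covering two unrelated subjects (illustrative data only).
contexts = [
    "network node edge",
    "node edge graph",
    "poem verse rhyme",
    "verse rhyme stanza",
]
communities = label_propagation(build_cooccurrence_network(contexts))
print(communities)
```

Each resulting community is a candidate topic. In real text, stopword removal and a stronger community-detection method would likely be needed; both are left out here to keep the sketch short.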
