A network approach to topic models

Martin Gerlach; Tiago P. Peixoto; Eduardo G. Altmann

arXiv:1708.01677 · stat.ML · July 20, 2018

TL;DR

This paper represents text corpora as bipartite networks of documents and words and applies community detection methods to them. Compared with traditional topic models such as LDA, the approach performs better in model selection, detects the number of topics automatically, and clusters both words and documents hierarchically.
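The bipartite representation mentioned above can be illustrated with a minimal sketch. It assumes that each document and each word becomes a node, and that every occurrence of a word in a document adds one edge between the two. The toy corpus and the helper name `build_bipartite_edges` are hypothetical and are not taken from the paper or its repository. The community detection step itself is not shown.

```python
from collections import Counter

# Hypothetical toy corpus; documents and tokens are illustrative only.
corpus = {
    "doc_a": "network community network detection".split(),
    "doc_b": "topic model topic inference".split(),
    "doc_c": "community detection topic model".split(),
}

def build_bipartite_edges(docs):
    """Return a Counter mapping (document, word) -> number of occurrences.

    Each key is an edge between a document node and a word node; the count
    is the edge multiplicity (how often the word occurs in that document).
    """
    edges = Counter()
    for doc_id, tokens in docs.items():
        for token in tokens:
            edges[(doc_id, token)] += 1
    return edges

edges = build_bipartite_edges(corpus)

# The two node sets of the bipartite network.
doc_nodes = sorted({d for d, _ in edges})
word_nodes = sorted({w for _, w in edges})

# Degrees: a document's degree is its length in tokens,
# a word's degree is its total frequency in the corpus.
doc_degree = Counter()
word_degree = Counter()
for (d, w), k in edges.items():
    doc_degree[d] += k
    word_degree[w] += k
```

In this sketch the edge counter is just a sparse form of the usual document-word count matrix, so the network view uses the same input data as a conventional topic model. A community detection method would then group the document nodes and the word nodes of this network.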

Contribution

It proposes a framework that links community detection in networks with topic modeling, enabling automatic detection of the number of topics and hierarchical clustering of words and documents.

Findings

01 SBM approach outperforms LDA in model selection

02 Automatically detects the number of topics

03 Provides hierarchical clustering of words and documents

Abstract

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are a popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success, in particular that of the most widely used variant, Latent Dirichlet Allocation (LDA), and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g., a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as…

Code & Models

Repositories

martingerlach/hSBM_Topicmodel (official)

Taxonomy

Topics: Complex Network Analysis Techniques · Topic Modeling · Computational and Text Analysis Methods

Methods: Latent Dirichlet Allocation