Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Yu Meng; Yunyi Zhang; Jiaxin Huang; Yu Zhang; Jiawei Han

arXiv:2202.04582 · cs.CL · February 10, 2022

TL;DR

This paper introduces a novel framework that leverages pretrained language model embeddings for more coherent and diverse topic discovery, overcoming limitations of traditional topic models by jointly modeling topics and documents in a latent space.

Contribution

The paper proposes a joint latent space learning and clustering approach based on PLM embeddings, providing a simpler yet more effective alternative to traditional topic models.
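The clustering step can be illustrated with a minimal sketch. It assumes embeddings have already been computed: the toy 2-D vectors below are hypothetical stand-ins for PLM embeddings, and no language model is loaded. It also swaps in plain spherical k-means, which groups vectors by cosine similarity, in place of the paper's joint latent space learning. The helper names (`spherical_kmeans`, `top_words`, `init_centroids`) are illustrative and are not taken from the topclus repository. A topic is then read off by ranking words by their similarity to each cluster centroid.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that cosine similarity equals the dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def init_centroids(points, k):
    # Deterministic farthest-point seeding: start from the first point, then keep
    # adding the point whose best similarity to the chosen centroids is lowest.
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = min(points, key=lambda p: max(dot(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def spherical_kmeans(embeddings, k, iters=20):
    """Cluster embeddings on the unit sphere; returns (centroids, assignments)."""
    points = [normalize(e) for e in embeddings]
    centroids = init_centroids(points, k)
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the centroid with the highest cosine similarity.
        assignments = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Move each centroid to the normalized mean direction of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return centroids, assignments

def top_words(word_vectors, centroid, n=3):
    """Return the n words whose normalized vectors are most similar to a topic centroid."""
    ranked = sorted(word_vectors, key=lambda w: dot(normalize(word_vectors[w]), centroid), reverse=True)
    return ranked[:n]

# Hypothetical toy "embeddings" for two topics (sports vs. politics).
word_vectors = {
    "goal": [1.0, 0.1], "match": [0.9, 0.2], "team": [0.95, 0.0],
    "vote": [0.1, 1.0], "election": [0.0, 0.9], "senate": [0.2, 0.95],
}
centroids, assignments = spherical_kmeans(list(word_vectors.values()), k=2)
for j, c in enumerate(centroids):
    print(j, top_words(word_vectors, c))
```

Farthest-point seeding is used only so the toy example is deterministic; on real PLM embeddings, randomized seeding with several restarts would be the more usual choice.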

Findings

1. Generated topics are more coherent and diverse.
2. The model outperforms traditional topic models on benchmark datasets.
3. It provides better topic-wise document representations.

Abstract

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations, including the inability to model word ordering information in documents, the difficulty of incorporating external linguistic knowledge, and the lack of both accurate and efficient inference methods for approximating the intractable posterior. Recently, pretrained language models (PLMs) have brought astonishing performance improvements to a wide variety of tasks due to their superior representations of text. Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models. In this paper, we begin by analyzing the challenges of using PLM representations for topic discovery, and then propose a joint latent space learning and clustering framework built upon PLM…
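The text here does not spell out how the latent space and the clusters are learned together. One well-known way to couple the two, shown purely as a swapped-in illustration and not as the paper's objective, is the self-training loss of Deep Embedded Clustering (DEC). Latent points receive soft cluster assignments q[i][j] from a Student's t kernel, a sharpened target p[i][j] proportional to q[i][j]**2 / f[j] (with f[j] = sum over i of q[i][j]) is derived from them, and KL(P || Q) is minimized so that confident assignments pull the representation toward cleaner clusters. The function names and the latent vectors below are hypothetical toy values.

```python
import math

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of latent point i to cluster j (DEC-style)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """Sharpened target: p[i][j] proportional to q[i][j]**2 / f[j], where f[j] = sum_i q[i][j]."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all points: the clustering loss DEC minimizes."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )

# Toy 2-D latent points near two hypothetical cluster centers.
z = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
centroids = [[0.0, 0.0], [2.0, 2.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
print(q[0], p[0], kl_divergence(p, q))
```

In an actual training loop, the KL term would be back-propagated through the encoder that produces the latent vectors (for example with PyTorch), sometimes alongside a reconstruction loss that keeps the latent space from collapsing; this pure-Python version only computes the quantities involved.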

Code & Models

Repositories

yumeng5/topclus (PyTorch, official implementation)