TL;DR
This paper introduces a framework that leverages pretrained language model (PLM) embeddings for more coherent and diverse topic discovery. It overcomes limitations of traditional topic models by jointly modeling topics and documents in a latent space.
Contribution
The paper proposes a joint latent space learning and clustering approach based on PLM embeddings, providing a simpler yet more effective alternative to traditional topic models.
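The sketch below illustrates the basic idea of treating clusters of embedding vectors as topics. It is a deliberately simplified, hypothetical baseline and not the paper's method. It runs plain k-means directly on L2-normalized word vectors and describes each cluster by the words nearest its centroid, with no joint latent space learning. The toy 3-dimensional vectors are made up to stand in for PLM embeddings. The names `kmeans` and `topics_from_embeddings` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Plain k-means with deterministic farthest-first initialization."""
    # Farthest-first init: start from the first point, then repeatedly add
    # the point that is farthest from every centroid chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(dists)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster happens to be empty.
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def topics_from_embeddings(words, emb, k, top_n=3):
    """Cluster L2-normalized word vectors and describe each cluster (topic)
    by the words whose vectors lie closest to its centroid."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    labels, centroids = kmeans(x, k)
    topics = []
    for j in range(k):
        members = np.where(labels == j)[0]
        # Rank cluster members by distance to the centroid, nearest first.
        order = members[np.argsort(np.linalg.norm(x[members] - centroids[j], axis=1))]
        topics.append([words[i] for i in order[:top_n]])
    return topics


# Hypothetical toy "embeddings": sports words point roughly along the first
# axis and politics words along the second.
words = ["goal", "match", "league", "vote", "senate", "election"]
emb = np.array([
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.1],
    [0.2, 0.8, 0.1],
])
print(topics_from_embeddings(words, emb, k=2))
# → [['goal', 'match', 'league'], ['vote', 'senate', 'election']]
```

The abstract states that the authors first analyze the challenges of using PLM representations for topic discovery before proposing joint latent space learning. Clustering in the raw embedding space, as done here, is therefore only a point of comparison for intuition and does not reproduce the paper's approach.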
Findings
Generated topics are more coherent and diverse.
The model outperforms traditional topic models on benchmark datasets.
It provides better topic-wise document representations.
Abstract
Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations including the inability of modeling word ordering information in documents, the difficulty of incorporating external linguistic knowledge, and the lack of both accurate and efficient inference methods for approximating the intractable posterior. Recently, pretrained language models (PLMs) have brought astonishing performance improvements to a wide variety of tasks due to their superior representations of text. Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models. In this paper, we begin by analyzing the challenges of using PLM representations for topic discovery, and then propose a joint latent space learning and clustering framework built upon PLM…
