Probabilistic Topic Modelling with Transformer Representations

Arik Reuter; Anton Thielmann; Christoph Weisser; Benjamin Säfken; Thomas Kneib

arXiv:2403.03737 · cs.LG · March 7, 2024 · 2 citations

Open Access · 1 Repo

TL;DR

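This paper introduces the Transformer-Representation Neural Topic Model (TNTM), a probabilistic topic model that represents topics in a transformer-based embedding space and is trained within the variational autoencoder (VAE) framework to obtain coherent and diverse topics.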

Contribution

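TNTM unifies the notion of topics as clusters in transformer-based embedding spaces with fully probabilistic modelling, as in Latent Dirichlet Allocation (LDA), and uses the VAE framework for improved inference speed and modelling flexibility.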

Findings

01. Achieves state-of-the-art embedding coherence
02. Maintains high topic diversity
03. Offers improved inference speed

Abstract

Topic modelling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in Natural Language Processing, however, several successful models that rely on straightforward clustering approaches in transformer-based embedding spaces have emerged and consolidated the notion of topics as clusters of embedding vectors. We propose the Transformer-Representation Neural Topic Model (TNTM), which combines the benefits of topic representations in transformer-based embedding spaces and probabilistic modelling. Therefore, this approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling, as in models such as Latent Dirichlet Allocation (LDA). We utilize the variational autoencoder (VAE) framework for improved inference speed and modelling flexibility. Experimental results show that our…
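
To make the idea of "topics as clusters of embedding vectors" combined with LDA-style probabilistic modelling more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each topic is an isotropic Gaussian in the word-embedding space and that a document's word distribution is a mixture of topic-word distributions. The toy 2-D embeddings, the topic means, and the function names `topic_word_distribution` and `document_word_distribution` are all hypothetical.

```python
import math


def topic_word_distribution(embeddings, mean, variance=1.0):
    """Turn an isotropic Gaussian in embedding space into a distribution over words.

    Assumption for illustration: a word's probability under a topic is
    proportional to the Gaussian density evaluated at the word's embedding.
    """
    # Log-density up to an additive constant; the constant cancels when normalising.
    log_scores = {
        word: -sum((x - m) ** 2 for x, m in zip(vec, mean)) / (2.0 * variance)
        for word, vec in embeddings.items()
    }
    # Softmax over the vocabulary, subtracting the maximum for numerical stability.
    top = max(log_scores.values())
    weights = {word: math.exp(score - top) for word, score in log_scores.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def document_word_distribution(topic_proportions, topic_word):
    """Mix topic-word distributions with document-level topic proportions (as in LDA)."""
    vocabulary = topic_word[0].keys()
    return {
        word: sum(p * beta[word] for p, beta in zip(topic_proportions, topic_word))
        for word in vocabulary
    }


# Made-up 2-D "embeddings"; real transformer embeddings have hundreds of dimensions.
embeddings = {
    "goal": (1.0, 0.9),
    "match": (0.9, 1.1),
    "vote": (-1.0, -0.8),
    "senate": (-1.1, -1.0),
}

# Two hypothetical topics, each centred on one cluster of words.
topic_means = [(1.0, 1.0), (-1.0, -1.0)]
topic_word = [
    topic_word_distribution(embeddings, mean, variance=0.5) for mean in topic_means
]

# A document that is 80% topic 0 and 20% topic 1.
doc_dist = document_word_distribution([0.8, 0.2], topic_word)
```

In this sketch, words whose embeddings lie close to a topic's mean receive most of that topic's probability mass, so a cluster in embedding space directly induces a topic-word distribution. In a VAE-based model the topic proportions would be produced by an encoder network rather than fixed by hand as they are here.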

Code & Models

Repositories

arikreuter/tntm (PyTorch, official)

Taxonomy

Topics: Geographic Information Systems Studies · Data Quality and Management

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings