Probabilistic Topic Modelling with Transformer Representations
Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin Säfken, Thomas Kneib

TL;DR
This paper introduces TNTM, a novel probabilistic topic model that leverages transformer embeddings and variational autoencoders to improve topic coherence and diversity.
Contribution
It unifies transformer-based embedding clustering with probabilistic modeling, enhancing inference speed and flexibility in topic modeling.
Findings
Achieves state-of-the-art embedding coherence
Maintains high topic diversity
Offers improved inference speed
Abstract
Topic modelling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in Natural Language Processing, however, several successful models that rely on straightforward clustering approaches in transformer-based embedding spaces have emerged and consolidated the notion of topics as clusters of embedding vectors. We propose the Transformer-Representation Neural Topic Model (TNTM), which combines the benefits of topic representations in transformer-based embedding spaces and probabilistic modelling. Therefore, this approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling, as in models such as Latent Dirichlet Allocation (LDA). We utilize the variational autoencoder (VAE) framework for improved inference speed and modelling flexibility. Experimental results show that our…
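To make the idea of a topic as a probabilistic object in an embedding space more concrete, here is a minimal sketch. It rests on assumptions not stated in the abstract: each topic is modelled as a multivariate Gaussian over word embeddings, and the topic's word distribution is obtained by normalizing the Gaussian log-densities over the vocabulary. The function name `topic_word_distribution`, the toy vocabulary, and the 2-D embeddings are all hypothetical. No transformer or VAE is involved, so this illustrates the general idea and not the authors' implementation.

```python
import numpy as np


def topic_word_distribution(word_embeddings, mean, cov):
    """Turn Gaussian log-densities at each word embedding into word probabilities.

    Assumption: a topic is a multivariate Gaussian in the embedding space.
    """
    d = word_embeddings.shape[1]
    diff = word_embeddings - mean
    # Solve a linear system instead of inverting the covariance explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.einsum("ij,ij->i", diff, solved)
    _, logdet = np.linalg.slogdet(cov)
    log_density = -0.5 * (mahalanobis + logdet + d * np.log(2 * np.pi))
    # Softmax over the vocabulary, shifted by the maximum for numerical stability.
    log_density = log_density - log_density.max()
    probs = np.exp(log_density)
    return probs / probs.sum()


# Toy stand-in for transformer embeddings: two well-separated clusters in 2-D.
rng = np.random.default_rng(0)
vocab = ["goal", "match", "team", "vote", "senate", "law"]
embeddings = np.vstack([
    rng.normal(loc=[2.0, 0.0], scale=0.3, size=(3, 2)),   # first three words
    rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(3, 2)),  # last three words
])

# Hypothetical topic centred on the first cluster.
sports_topic = topic_word_distribution(
    embeddings, mean=np.array([2.0, 0.0]), cov=0.5 * np.eye(2)
)
top_words = [vocab[i] for i in np.argsort(sports_topic)[::-1][:3]]
print(top_words)
```

Under these assumptions, words whose embeddings lie close to the topic mean receive most of the probability mass, which is how a cluster in embedding space can be read as a distribution over words.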
