Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences
Johannes Schneider

TL;DR
This paper introduces a novel topic modeling method that uses pre-trained sentence embeddings and a bag of sentences approach, achieving state-of-the-art results with efficient inference and greater flexibility than previous embedding-based models.
Contribution
The author proposes a new topic modeling algorithm combining sentence embeddings with generative models and clustering, offering improved accuracy and flexibility over existing methods.
Findings
Achieves state-of-the-art topic modeling performance.
Provides a fast inference algorithm based on EM and annealing.
Offers greater flexibility for customizing topic-document distributions.
Abstract
Pre-trained language models have led to a new state-of-the-art in many NLP tasks. However, for topic modeling, statistical generative models such as LDA are still prevalent, which do not easily allow incorporating contextual word vectors. They might yield topics that do not align well with human judgment. In this work, we propose a novel topic modeling and inference algorithm. We suggest a bag of sentences (BoS) approach using sentences as the unit of analysis. We leverage pre-trained sentence embeddings by combining generative process models and clustering. We derive a fast inference algorithm based on expectation maximization, hard assignments, and an annealing process. The evaluation shows that our method yields state-of-the-art results with relatively low computational demands. Our method is also more flexible compared to prior works leveraging word embeddings, since it provides…
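To make the inference idea in the abstract more concrete, the following is a minimal sketch of hard-assignment EM with annealing over sentence embeddings. It is not the paper's algorithm: the function name `hard_em_topics`, all parameters, the Gumbel-noise form of annealing, and the smoothing constant `alpha` are assumptions chosen for illustration. The embeddings are random stand-ins; in practice they would come from a pre-trained sentence encoder.

```python
import numpy as np


def hard_em_topics(emb, doc_ids, n_topics, n_iter=30, t_start=1.0, t_end=0.01,
                   alpha=0.1, seed=0):
    """Toy hard-assignment EM over sentence embeddings (illustrative only).

    emb:     (n_sentences, dim) array of sentence embeddings
    doc_ids: (n_sentences,) integer array giving each sentence's document
    """
    rng = np.random.default_rng(seed)
    # Work with unit vectors so dot products are cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    n_docs = int(doc_ids.max()) + 1
    # Assumption: initialise topic vectors from randomly chosen sentences.
    centers = emb[rng.choice(len(emb), size=n_topics, replace=False)].copy()
    doc_topic = np.full((n_docs, n_topics), 1.0 / n_topics)
    assign = np.zeros(len(emb), dtype=int)

    for it in range(n_iter):
        # Geometric annealing schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (it / max(n_iter - 1, 1))

        # E-step (hard): combine sentence-topic similarity with the
        # document's current topic distribution, then pick one topic.
        # Gumbel noise scaled by the temperature makes early assignments
        # exploratory and late assignments nearly deterministic.
        score = emb @ centers.T + np.log(doc_topic[doc_ids])
        assign = np.argmax(score + temp * rng.gumbel(size=score.shape), axis=1)

        # M-step: topic vector = normalised mean of its assigned sentences.
        for k in range(n_topics):
            members = emb[assign == k]
            if len(members) > 0:  # keep the old vector if a topic is empty
                mean = members.mean(axis=0)
                centers[k] = mean / max(np.linalg.norm(mean), 1e-12)

        # M-step: smoothed per-document topic proportions from hard counts.
        counts = np.full((n_docs, n_topics), alpha)
        np.add.at(counts, (doc_ids, assign), 1.0)
        doc_topic = counts / counts.sum(axis=1, keepdims=True)

    return assign, centers, doc_topic


# Stand-in data: 6 documents with 10 sentences each, 16-dimensional embeddings.
rng = np.random.default_rng(1)
emb = rng.normal(size=(60, 16))
doc_ids = np.repeat(np.arange(6), 10)
assign, centers, doc_topic = hard_em_topics(emb, doc_ids, n_topics=3)
```

Each sentence ends up with exactly one topic, and each row of `doc_topic` is a distribution over topics for one document. The cosine score and the log prior are simply added here without any scaling, which is a simplification a real model would need to treat more carefully.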
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
Methods: Latent Dirichlet Allocation · ALIGN
