Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms
Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita

TL;DR
This paper presents a novel semantic-driven topic modeling approach that leverages transformer-based embeddings and clustering to extract more coherent and meaningful topics from document collections.
Contribution
It introduces an end-to-end method combining transformer embeddings with clustering, improving topic coherence over traditional models.
Findings
Outperforms traditional topic models in coherence.
Uses transformer embeddings for better semantic capture.
Produces more meaningful topics than ChatGPT-based methods.
Abstract
Topic modeling is a powerful technique to discover hidden topics and patterns within a collection of documents without prior knowledge. Traditional topic modeling and clustering-based techniques encounter challenges in capturing contextual semantic information. This study introduces an innovative end-to-end semantic-driven topic modeling technique for the topic extraction process, utilizing advanced word and document embeddings combined with a powerful clustering algorithm. This semantic-driven approach represents a significant advancement in topic modeling methodologies. It leverages contextual semantic information to extract coherent and meaningful topics. Specifically, our model generates document embeddings using pre-trained transformer-based language models, reduces the dimensions of the embeddings, clusters the embeddings based on semantic similarity, and generates coherent topics…
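The four stages named in the abstract (embed, reduce, cluster, extract topics) can be sketched in a few lines. This is only a minimal illustration under loud assumptions, not the authors' implementation: normalized word-count vectors stand in for transformer embeddings, a Gaussian random projection stands in for the unspecified dimensionality-reduction step, a small k-means stands in for the unspecified clustering algorithm, and the topic-word score is a hypothetical frequency ratio. All function names, the stop-word list, and the toy documents are invented for the example.

```python
import math
import random
from collections import Counter

STOPWORDS = {"the", "in", "a", "of"}  # tiny illustrative list (assumption)

def tokenize(doc):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in doc.lower().split() if w not in STOPWORDS]

def embed(docs, vocab):
    """Stand-in for transformer embeddings: L2-normalized word counts."""
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def reduce_dims(vectors, out_dim, seed=0):
    """Stand-in reduction step: Gaussian random projection to out_dim."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(out_dim) for _ in range(in_dim)]
            for _ in range(out_dim)]
    return [[sum(p * x for p, x in zip(row, v)) for row in proj]
            for v in vectors]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Stand-in clustering: k-means with deterministic farthest-first init."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def topic_words(docs, labels, k, top_n=3):
    """Rank words per cluster by (count in cluster)^2 / (count in corpus)."""
    total = Counter(w for d in docs for w in tokenize(d))
    topics = []
    for j in range(k):
        in_cluster = Counter(
            w for d, lab in zip(docs, labels) if lab == j for w in tokenize(d))
        ranked = sorted(in_cluster,
                        key=lambda w: (-(in_cluster[w] ** 2) / total[w], w))
        topics.append(ranked[:top_n])
    return topics

docs = [
    "the team won the football match",
    "the striker scored in the football final",
    "the bank raised interest rates",
    "interest rates affect bank loans",
]
vocab = sorted({w for d in docs for w in tokenize(d)})
labels = kmeans(reduce_dims(embed(docs, vocab), out_dim=5), k=2)
topics = topic_words(docs, labels, k=2)
for j, words in enumerate(topics):
    print(f"topic {j}: {words}")
```

In a realistic setting, each stand-in would be replaced by the corresponding component: a pre-trained transformer encoder for `embed`, and whichever reduction and clustering methods the full paper specifies for `reduce_dims` and `kmeans`. The structure of the pipeline, with each stage consuming the previous stage's output, is the part this sketch is meant to convey.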
Taxonomy
Topics: Computational and Text Analysis Methods
