Top2Vec: Distributed Representations of Topics
Dimo Angelov

TL;DR
Top2Vec is a novel topic modeling method that automatically discovers meaningful topics from joint semantic embeddings of documents and words. It avoids preprocessing steps that traditional models require, such as stop-word removal, stemming, and lemmatization.
Contribution
It introduces an approach that uses distributed semantic embeddings to find topics without requiring a predefined number of topics or a stop-word list.
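To make "finding topics in a joint embedding" concrete, here is a minimal sketch. It assumes that documents and words already live in one vector space and that a dense group of documents has already been identified. Following the Top2Vec paper's description, the topic vector is the centroid of that group of document vectors, and the topic words are the word vectors closest to it. The 2-D vectors and the helper names (`centroid`, `cosine`, `topic_words`) are hypothetical illustrations, not the Top2Vec library's API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_words(topic_vec: Vector, word_vecs: Dict[str, Vector], k: int) -> List[str]:
    """The k words whose vectors are most similar (by cosine) to the topic vector."""
    ranked = sorted(word_vecs, key=lambda w: cosine(topic_vec, word_vecs[w]), reverse=True)
    return ranked[:k]


# Hypothetical 2-D joint embedding: document and word vectors share one space.
# Assume these three documents were already found to form one dense cluster.
cluster_docs = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]
word_vecs = {
    "goal": [0.9, 0.12],
    "match": [0.7, 0.3],
    "election": [0.1, 0.9],
    "vote": [0.0, 1.0],
}

topic = centroid(cluster_docs)             # topic vector = centroid of the cluster
print(topic_words(topic, word_vecs, k=2))  # ['goal', 'match']
```

Because words and documents share the same space, describing a topic requires no separate word-topic probability table: the nearest words are read off geometrically.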
Findings
Top2Vec produces more informative and representative topics than traditional probabilistic models such as LDA and PLSA.
The method automatically determines the number of topics.
It simplifies topic modeling by removing the need for stop-word lists, stemming, and lemmatization.
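The number of topics can be determined automatically because topics come from density-based clustering: the method looks for dense areas of document vectors, and each dense area yields one topic. The Top2Vec paper uses UMAP followed by HDBSCAN for this step. The sketch below swaps in plain DBSCAN, a simpler density-based algorithm, written in pure Python so it runs without dependencies. The toy document vectors and the `eps` and `min_pts` values are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

UNVISITED, NOISE = -2, -1


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN. Returns one label per point: a cluster id (0, 1, ...) or NOISE.

    The number of clusters is discovered from the data, never passed in.
    """
    labels = [UNVISITED] * len(points)

    def region(i: int) -> List[int]:
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also a core point: keep growing
                queue.extend(neighbours)
        cluster += 1
    return labels


# Hypothetical 2-D document vectors.
docs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # one dense area
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # another dense area
        [10.0, 0.0]]                          # isolated outlier
labels = dbscan(docs, eps=0.5, min_pts=3)
num_topics = len({label for label in labels if label != NOISE})
print(labels)      # [0, 0, 0, 1, 1, 1, -1]
print(num_topics)  # 2
```

The number of topics is read off the clustering result rather than supplied up front, which is the property the finding above refers to. Outlying documents are labelled as noise instead of being forced into a topic.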
Abstract
Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis. Despite their popularity, they have several weaknesses. To achieve optimal results, they often require the number of topics to be known, custom stop-word lists, stemming, and lemmatization. Additionally, these methods rely on bag-of-words representations of documents, which ignore the ordering and semantics of words. Distributed representations of documents and words have gained popularity due to their ability to capture the semantics of words and documents. We present Top2Vec, which leverages joint document and word semantic embedding to find topic vectors. This model does not require stop-word lists, stemming or lemmatization,…
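The abstract's point that bag-of-words representations ignore word order and semantics can be shown in a few lines. In this minimal sketch, whitespace tokenization and the example sentences are illustrative assumptions. Two sentences with opposite meanings receive identical count vectors, while two paraphrases that share no tokens receive a cosine similarity of zero.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace-token counts; word order is discarded."""
    return Counter(text.lower().split())


def bow_cosine(x: Counter, y: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(count * y[word] for word, count in x.items())  # missing words count as 0
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    return dot / (norm_x * norm_y)


# Ordering is lost: opposite meanings, identical representation.
print(bag_of_words("Dog bites man") == bag_of_words("Man bites dog"))  # True

# Semantics is lost: a paraphrase with no shared tokens scores zero.
print(bow_cosine(bag_of_words("buy a car"), bag_of_words("purchase an automobile")))  # 0.0
```

Distributed representations address the second case: related words such as "car" and "automobile" receive nearby vectors, so similar documents can be close even without shared vocabulary.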
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Linear Layer · Multi-Head Attention · Dense Connections · Softmax · Attention Dropout · WordPiece · Dropout · Layer Normalization · Attention Is All You Need · Adam
