Nested Variational Autoencoder for Topic Modeling on Microtexts with Word Vectors
Trung Trinh, Tho Quan, Trung Mai

TL;DR
This paper introduces a nested variational autoencoder that leverages word embeddings and neural networks to improve topic modeling on microtexts, achieving better accuracy and faster runtime than traditional models.
Contribution
The paper presents a novel nested variational autoencoder model that incorporates word vectors for enhanced microtext topic modeling with scalable, efficient inference.
Findings
Improved topic coherence on microtexts.
Faster inference compared to traditional LDA.
Effective use of word embeddings in neural variational models.
Abstract
Most of the information on the Internet is represented in the form of microtexts, which are short text snippets such as news headlines or tweets. These sources of information are abundant, and mining these data could uncover meaningful insights. Topic modeling is one of the popular methods to extract knowledge from a collection of documents; however, conventional topic models such as latent Dirichlet allocation (LDA) are unable to perform well on short documents, mostly due to the scarcity of word co-occurrence statistics embedded in the data. The objective of our research is to create a topic model that can achieve great performances on microtexts while requiring a small runtime for scalability to large datasets. To solve the lack of information of microtexts, we allow our method to take advantage of word embeddings for additional knowledge of relationships between words. For speed and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Computational and Text Analysis Methods · Text and Document Classification Technologies
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Linear Discriminant Analysis
