Discovering topics with neural topic models built from PLSA assumptions
Sileye O. Ba

TL;DR
This paper introduces neural topic models based on PLSA assumptions that effectively discover topics in large text corpora, outperforming traditional LDA models in perplexity and scalability.
Contribution
The paper extends PLSA-based neural models with auto-encoder document embeddings for improved scalability and demonstrates superior performance over LDA on six datasets.
Findings
Neural topic models outperform LDA in perplexity.
Auto-encoder embeddings improve scalability for large corpora.
Models effectively capture relevant topics across diverse datasets.
Abstract
In this paper we present a model for unsupervised topic discovery in text corpora. The proposed model uses document, word, and topic lookup-table embeddings as neural network parameters to build probabilities of words given topics and probabilities of topics given documents. These probabilities are then marginalized over topics to recover probabilities of words given documents. For very large corpora, where the number of documents can be on the order of billions, a neural auto-encoder based document embedding is more scalable than the classical lookup-table embedding. We thus extended the lookup-based document embedding model to a continuous auto-encoder based model. Our models are trained using probabilistic latent semantic analysis (PLSA) assumptions. We evaluated our models on six datasets with a rich variety of contents. Conducted experiments demonstrate that…
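The marginalization described in the abstract can be illustrated with a minimal NumPy sketch. The parameterization here is an assumption, not taken from the paper: probabilities are formed as softmaxes over dot products between lookup-table embeddings, and all sizes, names (doc_emb, word_emb, topic_emb), and the random count matrix are hypothetical. It shows only how p(w|d) follows from p(w|t) and p(t|d) under the PLSA assumption, and how a count-weighted log-likelihood could serve as a training objective.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_docs, n_words, n_topics, dim = 4, 10, 3, 5  # toy sizes (hypothetical)

# Lookup-table embeddings; in a real model these would be learned parameters.
doc_emb = rng.normal(size=(n_docs, dim))
word_emb = rng.normal(size=(n_words, dim))
topic_emb = rng.normal(size=(n_topics, dim))

# p(w|t): one distribution over the vocabulary per topic (assumed softmax form).
p_w_given_t = softmax(topic_emb @ word_emb.T, axis=1)   # (n_topics, n_words)

# p(t|d): one distribution over topics per document (assumed softmax form).
p_t_given_d = softmax(doc_emb @ topic_emb.T, axis=1)    # (n_docs, n_topics)

# PLSA marginalization: p(w|d) = sum_t p(t|d) * p(w|t).
p_w_given_d = p_t_given_d @ p_w_given_t                 # (n_docs, n_words)

# Negative log-likelihood of a toy document-word count matrix.
counts = rng.integers(0, 5, size=(n_docs, n_words))
nll = -(counts * np.log(p_w_given_d)).sum()
```

Each row of p_w_given_d sums to one because it is a convex combination of the per-topic word distributions. In the auto-encoder variant the abstract mentions, doc_emb would presumably be produced by an encoder network rather than stored per document, which is not shown here.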
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
