n-stage Latent Dirichlet Allocation: A Novel Approach for LDA
Zekeriya Anil Guven, Banu Diri, Tolgahan Cakaloglu

TL;DR
This paper introduces n-stage LDA, an enhancement to traditional LDA that reduces dictionary size and improves effectiveness, demonstrated through multilingual studies and available as open-source code.
Contribution
The paper presents a novel n-stage LDA method that improves LDA's efficiency and language independence by reducing dictionary size, with demonstrated effectiveness on English and Turkish datasets.
Findings
Improved topic modeling performance on English and Turkish datasets
Reduced dictionary size enhances LDA efficiency
Open-source implementation available for broader use
Abstract
As the volume of data keeps growing, analyzing it has become increasingly difficult. For textual data, natural language processing offers many models and methods to address this problem, and topic modeling is one of them. Topic modeling helps uncover the semantic structure of a text document. Latent Dirichlet Allocation (LDA) is the most widely used topic modeling method. This article explains in detail the proposed n-stage LDA method, which enables LDA to be used more effectively. Studies on English and Turkish data demonstrate the positive effect of the method. Because the method focuses on reducing the number of words in the dictionary, it can be applied independently of language. The open-source code of the method and an example are available at: https://github.com/anil1055/n-stage_LDA
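The abstract says only that the method reduces the number of words in the dictionary; it does not state the filtering rule. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it assumes that after each stage a word is kept if its weight reaches the mean word weight in at least one topic, and that the topic model is then refit on the reduced documents. The names `reduce_vocabulary`, `n_stage_reduce`, and `toy_fit` are hypothetical, and `toy_fit` is a frequency-based stand-in for a real LDA fit so the example runs with the standard library alone.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Doc = List[str]
TopicWeights = List[Dict[str, float]]  # one word -> weight mapping per topic


def reduce_vocabulary(topics: TopicWeights) -> Set[str]:
    """Keep words whose weight is at least the mean weight of some topic.

    ASSUMPTION: the mean-weight threshold is an illustrative rule,
    not taken from the abstract.
    """
    kept: Set[str] = set()
    for weights in topics:
        if not weights:
            continue
        threshold = sum(weights.values()) / len(weights)
        kept.update(w for w, p in weights.items() if p >= threshold)
    return kept


def n_stage_reduce(
    docs: List[Doc],
    fit_topics: Callable[[List[Doc]], TopicWeights],
    n_stages: int,
) -> List[Doc]:
    """Fit topics, shrink the vocabulary, filter the documents; repeat n times."""
    for _ in range(n_stages):
        topics = fit_topics(docs)
        vocab = reduce_vocabulary(topics)
        docs = [[w for w in doc if w in vocab] for doc in docs]
    return docs


def toy_fit(docs: List[Doc]) -> TopicWeights:
    """Hypothetical stand-in for an LDA fit: one 'topic' of relative frequencies."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return [{w: c / total for w, c in counts.items()}] if total else [{}]


if __name__ == "__main__":
    corpus = [["a", "a", "a", "b"], ["a", "c"]]
    print(n_stage_reduce(corpus, toy_fit, n_stages=2))
```

In practice `fit_topics` would wrap an actual LDA implementation that returns per-topic word weights; the repository linked in the abstract is the place to check the rule the authors actually use.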
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
Methods: Latent Dirichlet Allocation
