Topic Modeling in Marathi
Sanket Shinde, Raviraj Joshi

TL;DR
This paper explores and compares different topic modeling approaches for Marathi, an Indic language, highlighting that BERT-based models, especially BERTopic with Indic-trained BERT, outperform traditional LDA methods.
Contribution
It provides a comparative analysis of BERT and non-BERT topic modeling approaches specifically tailored for Marathi, addressing a gap in Indic language NLP research.
Findings
BERTopic with Indic BERT outperforms LDA in topic coherence and diversity
Multilingual BERT models show competitive performance for Marathi
The study offers insights into effective NLP techniques for low-resource Indic languages
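The findings above rank models by topic coherence, but this summary does not say which coherence variant the paper used. As an illustration only, the sketch below computes one common variant, NPMI coherence, from document-level co-occurrence counts. The function name `npmi_coherence`, the boundary conventions (-1 for pairs that never co-occur, 1 for pairs that co-occur in every document), and the toy inputs are assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic (hypothetical helper).

    topic_words: the top words of a single topic.
    documents:   tokenized reference corpus, one list of tokens per document.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0 or len(topic_words) < 2:
        raise ValueError("need at least one document and two topic words")

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for a, b in combinations(topic_words, 2):
        p_ab = prob(a, b)
        if p_ab == 0.0:
            scores.append(-1.0)  # assumed convention: never co-occur
        elif p_ab == 1.0:
            scores.append(1.0)   # assumed convention: always co-occur
        else:
            pmi = math.log(p_ab / (prob(a) * prob(b)))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)
```

Scores lie between -1 and 1; higher values mean a topic's top words tend to appear in the same documents. For Marathi text, `documents` would hold already-tokenized Devanagari tokens, and the tokenization choice is outside this sketch.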
Abstract
While topic modeling in English has become a prevalent and well-explored area, venturing into topic modeling for Indic languages remains relatively rare. The limited availability of resources, diverse linguistic structures, and unique challenges posed by Indic languages contribute to the scarcity of research and applications in this domain. Despite the growing interest in natural language processing and machine learning, there exists a noticeable gap in the comprehensive exploration of topic modeling methodologies tailored specifically for languages such as Hindi, Marathi, Tamil, and others. In this paper, we examine several topic modeling approaches applied to the Marathi language. Specifically, we compare various BERT and non-BERT approaches, including multilingual and monolingual BERT models, using topic coherence and topic diversity as evaluation metrics. Our analysis provides…
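The abstract names topic diversity as one of the two evaluation metrics. A minimal sketch of the usual definition follows: the fraction of unique words among the top-k words of all topics. The function name `topic_diversity`, the default `top_k=10`, and the small Marathi word lists are illustrative assumptions rather than details taken from the paper.

```python
def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of every topic.

    topics: one ranked word list per topic (most probable word first).
    Returns a value in (0, 1]; 1.0 means no word is shared between topics.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


if __name__ == "__main__":
    # Hypothetical topics for illustration only, not results from the paper.
    example_topics = [
        ["क्रिकेट", "सामना", "खेळाडू"],
        ["चित्रपट", "अभिनेता", "सामना"],
    ]
    print(topic_diversity(example_topics, top_k=3))
```

A score near 0 indicates that topics largely repeat the same words, so diversity is typically read alongside coherence rather than on its own.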
Taxonomy
Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
Methods: Attention Is All You Need · Adam · Softmax · Linear Warmup With Linear Decay · Dropout · Weight Decay · WordPiece · Attention Dropout · Layer Normalization
