L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models
Aabha Pingle, Aditya Vyawahare, Isha Joshi, Rahul Tangsali, Raviraj Joshi

TL;DR
This paper introduces L3Cube-MahaSent-MD, a comprehensive multi-domain Marathi sentiment analysis dataset with around 60,000 samples across four domains, and evaluates transformer models for sentiment classification.
Contribution
It provides the first multi-domain Marathi sentiment dataset and benchmarks transformer models on it, highlighting the importance of multi-domain datasets for low-resource languages.
Findings
MahaBERT achieved the highest accuracy among the monolingual and multilingual BERT models evaluated.
Multi-domain datasets improve sentiment analysis in Marathi.
Cross-domain analysis reveals domain-specific challenges.
Abstract
The exploration of sentiment analysis in low-resource languages, such as Marathi, has been limited due to the limited availability of suitable datasets. In this work, we present L3Cube-MahaSent-MD, a multi-domain Marathi sentiment analysis dataset covering four domains: movie reviews, general tweets, TV show subtitles, and political tweets. The dataset consists of around 60,000 manually tagged samples covering 3 distinct sentiments: positive, negative, and neutral. We create a sub-dataset for each domain comprising 15k samples. MahaSent-MD is the first comprehensive multi-domain sentiment analysis dataset within the Indic sentiment landscape. We fine-tune different monolingual and multilingual BERT models on these datasets and report the best accuracy with the MahaBERT model. We also present an extensive in-domain and cross-domain analysis, thus highlighting the need for…
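The in-domain and cross-domain analysis mentioned in the abstract can be pictured as an accuracy matrix: train on one domain, then test on every domain. The sketch below is a minimal illustration of that bookkeeping only, not the authors' code. The names `cross_domain_matrix`, `accuracy`, and `train_majority_baseline` are hypothetical, the domain keys are made-up identifiers for the four domains, and the majority-class "model" is a stand-in assumption for a fine-tuned BERT classifier so that the example runs without any downloads.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A sample is (text, label); labels follow the three classes named in the abstract.
Sample = Tuple[str, str]
Predictor = Callable[[str], str]

# Hypothetical identifiers for the four domains listed in the abstract.
DOMAINS = ["movie_reviews", "general_tweets", "tv_subtitles", "political_tweets"]
LABELS = ["positive", "negative", "neutral"]


def accuracy(predict: Predictor, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the gold label."""
    if not samples:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for text, label in samples if predict(text) == label)
    return correct / len(samples)


def train_majority_baseline(train: List[Sample]) -> Predictor:
    """Stand-in for fine-tuning (an assumption, not the paper's method):
    always predicts the most frequent label seen in the training split."""
    majority_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority_label


def cross_domain_matrix(
    train_sets: Dict[str, List[Sample]],
    test_sets: Dict[str, List[Sample]],
    train_fn: Callable[[List[Sample]], Predictor],
) -> Dict[str, Dict[str, float]]:
    """matrix[src][tgt] = accuracy of a model trained on src, tested on tgt.
    Entries with src == tgt are in-domain; all others are cross-domain."""
    matrix: Dict[str, Dict[str, float]] = {}
    for src, train in train_sets.items():
        model = train_fn(train)
        matrix[src] = {tgt: accuracy(model, test) for tgt, test in test_sets.items()}
    return matrix
```

In practice `train_fn` would be replaced by a routine that fine-tunes a model such as MahaBERT on the source domain's training split and returns its prediction function; the matrix layout stays the same.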
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · WordPiece · Dense Connections · Adam · Residual Connection
