ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages
Swastika Kundu, Autoshi Ibrahim, Mithila Rahman, Tanvir Ahmed

TL;DR
This paper introduces ANUBHUTI, a large annotated corpus of 10,000 sentences translated from standard Bangla into four regional dialects, built to advance sentiment analysis for these low-resource dialects. It combines political, religious, and neutral content with quality-assured annotation.
Contribution
A comprehensive annotated dataset for Bangla regional dialects, with dual thematic and emotion annotations and quality assurance, intended for sentiment analysis research.
Findings
High inter-annotator agreement achieved
Dataset covers diverse socio-political content
Enables improved sentiment analysis in Bangla dialects
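The summary does not say which agreement statistic the authors report. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected agreement measure for two annotators assigning one label per sentence. The function name and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the paper's actual agreement metric is not stated in this
    summary; kappa is used here purely as a familiar example.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy thematic labels (illustrative, not real annotations).
annotator_1 = ["Political", "Religious", "Neutral", "Political"]
annotator_2 = ["Political", "Religious", "Political", "Political"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance. Multilabel emotion annotations would need a different treatment, for example one kappa per emotion.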
Abstract
Sentiment analysis for regional dialects of Bangla remains an underexplored area due to linguistic diversity and limited annotated data. This paper introduces ANUBHUTI, a comprehensive dataset consisting of 10,000 sentences manually translated from standard Bangla into four major regional dialects: Mymensingh, Noakhali, Sylhet, and Chittagong. The dataset predominantly features political and religious content, reflecting the contemporary socio-political landscape of Bangladesh, alongside neutral texts to maintain balance. Each sentence is annotated using a dual annotation scheme: multiclass thematic labeling categorizes sentences as Political, Religious, or Neutral, and multilabel emotion annotation assigns one or more emotions from Anger, Contempt, Disgust, Enjoyment, Fear, Sadness, and Surprise. Expert native translators conducted the translation and annotation, with quality assurance…
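The dual annotation scheme described in the abstract can be sketched as a small record type: one thematic label per sentence (multiclass) and a non-empty set of emotions (multilabel). The class name, field names, and encoding helpers below are assumptions made for illustration. The summary does not describe the dataset's actual file format or column names.

```python
from dataclasses import dataclass

# Label inventories as listed in the abstract.
DIALECTS = ("Mymensingh", "Noakhali", "Sylhet", "Chittagong")
THEMES = ("Political", "Religious", "Neutral")
EMOTIONS = ("Anger", "Contempt", "Disgust", "Enjoyment",
            "Fear", "Sadness", "Surprise")


@dataclass
class AnnotatedSentence:
    """Hypothetical record layout; not the dataset's published schema."""
    text: str
    dialect: str
    theme: str           # exactly one thematic label (multiclass)
    emotions: frozenset  # one or more emotion labels (multilabel)

    def __post_init__(self):
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect}")
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme: {self.theme}")
        if not self.emotions:
            raise ValueError("at least one emotion is required")
        unknown = set(self.emotions) - set(EMOTIONS)
        if unknown:
            raise ValueError(f"unknown emotions: {sorted(unknown)}")

    def theme_index(self):
        """Integer class id for the multiclass thematic target."""
        return THEMES.index(self.theme)

    def emotion_vector(self):
        """Multi-hot vector for the multilabel emotion target."""
        return [1 if e in self.emotions else 0 for e in EMOTIONS]


# Placeholder text; a real record would hold a dialect sentence.
example = AnnotatedSentence(
    text="<sentence in Sylhet dialect>",
    dialect="Sylhet",
    theme="Political",
    emotions=frozenset({"Anger", "Fear"}),
)
```

Keeping the two targets separate reflects the scheme: the thematic label suits a single softmax-style classifier, while the emotion vector suits independent per-emotion decisions.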
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection
