L3Cube-MahaEmotions: A Marathi Emotion Recognition Dataset with Synthetic Annotations using CoTR prompting and Large Language Models
Nidhi Kowtal, Raviraj Joshi

TL;DR
This paper introduces L3Cube-MahaEmotions, a Marathi emotion recognition dataset whose training annotations are generated synthetically by large language models, and reports that general-purpose LLMs outperform fine-tuned BERT models on low-resource emotion recognition.
Contribution
The work presents a new Marathi emotion dataset with synthetic annotations using Chain-of-Translation prompting and evaluates LLMs versus BERT models, highlighting the effectiveness of large language models.
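As a rough illustration of Chain-of-Translation prompting, where a Marathi sentence is translated into English and labeled within a single prompt, a prompt builder might look like the sketch below. The function name `build_cotr_prompt`, the prompt wording, and the placeholder labels are all assumptions made for illustration; they are not the authors' prompt or the dataset's actual label set.

```python
from typing import Sequence


def build_cotr_prompt(marathi_sentence: str, labels: Sequence[str]) -> str:
    """Build one prompt that asks for a translation first, then an emotion label.

    Hypothetical helper: the wording is an assumption, not the paper's prompt.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    label_list = ", ".join(labels)
    return (
        "You are given a sentence in Marathi.\n"
        "Step 1: Translate the sentence into English.\n"
        "Step 2: Using the English translation, choose the emotion label "
        f"from: {label_list}.\n"
        "Answer in the form:\n"
        "Translation: <English translation>\n"
        "Emotion: <label>\n\n"
        f"Sentence: {marathi_sentence}"
    )


# Placeholder labels for illustration only; the dataset defines its own 11 labels.
EXAMPLE_LABELS = ["label_a", "label_b", "label_c"]
prompt = build_cotr_prompt("<Marathi sentence here>", EXAMPLE_LABELS)
```

Both steps share one prompt, so the model sees its own translation before it picks a label and only one request is needed per sentence.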
Findings
GPT-4 annotations outperform fine-tuned BERT models.
Synthetic labels from LLMs can effectively train emotion recognition models.
Large language models generalize better than BERT in low-resource emotion tasks.
Abstract
Emotion recognition in low-resource languages like Marathi remains challenging due to limited annotated data. We present L3Cube-MahaEmotions, a high-quality Marathi emotion recognition dataset with 11 fine-grained emotion labels. The training data is synthetically annotated using large language models (LLMs), while the validation and test sets are manually labeled to serve as a reliable gold-standard benchmark. Building on the MahaSent dataset, we apply the Chain-of-Translation (CoTR) prompting technique, in which each Marathi sentence is translated into English and labeled with an emotion in a single prompt. We evaluated GPT-4 and Llama3-405B and selected GPT-4 for training data annotation because of its superior label quality. We evaluate model performance using standard metrics and explore label aggregation strategies (e.g., Union, Intersection). While GPT-4 predictions outperform fine-tuned BERT…
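The Union and Intersection aggregation strategies mentioned above can be sketched as set operations over the labels proposed by two sources. This is a minimal sketch under assumptions: the function name `aggregate_labels`, the choice of two label sources, and the example labels are hypothetical, and the abstract does not specify exactly which label sets the authors combine.

```python
from typing import Iterable, Set


def aggregate_labels(first: Iterable[str], second: Iterable[str], strategy: str) -> Set[str]:
    """Combine two label sets for one sentence (hypothetical helper).

    "union" keeps every label proposed by either source;
    "intersection" keeps only labels both sources agree on.
    """
    a, b = set(first), set(second)
    if strategy == "union":
        return a | b
    if strategy == "intersection":
        return a & b
    raise ValueError(f"unknown strategy: {strategy!r}")


# Placeholder labels, not the dataset's actual label names.
source_one = ["label_a", "label_b"]
source_two = ["label_b", "label_c"]

union_labels = aggregate_labels(source_one, source_two, "union")
shared_labels = aggregate_labels(source_one, source_two, "intersection")
```

Union favors recall, since any proposed label survives, while intersection favors precision and can return an empty set when the sources disagree entirely.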
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition
