EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection
Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri

TL;DR
This paper introduces EmoMix-3L, a multi-label emotion detection dataset of code-mixed Bangla, English, and Hindi text, and benchmarks several models on it, with MuRIL performing best.
Contribution
The paper presents EmoMix-3L, the first dataset with three-language code-mixed data for emotion detection, and benchmarks multiple models including MuRIL.
Findings
EmoMix-3L is the first emotion detection dataset with three-language code-mixing.
Among the models evaluated on EmoMix-3L, MuRIL achieves the best results.
Abstract
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech. Several studies have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce EmoMix-3L, a novel multi-label emotion detection dataset containing code-mixed data from three different languages. We experiment with several models on EmoMix-3L and we report that MuRIL outperforms other models on this dataset.
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
