XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

TL;DR
XED is a multilingual, fine-grained emotion dataset with annotations for Finnish, English, and 30 additional languages, enabling improved sentiment analysis and emotion detection across diverse languages.
Contribution
The paper introduces XED, a new multilingual emotion dataset with annotations for multiple languages, including low-resource ones, and evaluates its effectiveness with language-specific models.
Findings
Models trained on XED perform on par with those trained on similar datasets for sentiment analysis and emotion detection.
The dataset covers 32 languages, including low-resource ones.
Evaluation shows XED's utility for emotion detection across languages.
Abstract
We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
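The multilabel setup described in the abstract can be illustrated with a small sketch: each sentence may carry several of Plutchik's eight core emotions (or neutral), so a label set is naturally encoded as a multi-hot vector. The label ordering, the `to_multi_hot` helper, and the example annotation below are illustrative assumptions, not the dataset's actual file format or index scheme.

```python
# Plutchik's eight core emotions plus "neutral". The ordering here is an
# illustrative assumption, not necessarily the dataset's own index scheme.
LABELS = ["anger", "anticipation", "disgust", "fear", "joy",
          "sadness", "surprise", "trust", "neutral"]


def to_multi_hot(emotions):
    """Encode a set of emotion names as a multi-hot vector over LABELS."""
    unknown = set(emotions) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


# Hypothetical annotation: one sentence tagged with two emotions at once.
vector = to_multi_hot({"joy", "surprise"})
print(vector)  # [0, 0, 0, 0, 1, 0, 1, 0, 0]
```

A multi-hot target like this pairs with a per-label sigmoid output rather than a single softmax, since more than one emotion can be active for the same sentence.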
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Advanced Text Analysis Techniques
Methods: Linear Layer · Softmax · Dense Connections · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Adam · Residual Connection · Dropout
