XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman; Marc Pàmies; Kaisla Kajava; Jörg Tiedemann

arXiv:2011.01612 · cs.CL · November 9, 2020 · 6 cites

TL;DR

XED is a multilingual, fine-grained emotion dataset with human annotations for Finnish and English and projected annotations for 30 additional languages. It is intended as a resource for sentiment analysis and emotion detection across many languages, including low-resource ones.

Contribution

The paper introduces XED, a new multilingual emotion dataset with annotations for many languages, including low-resource ones, and evaluates it using language-specific BERT models and SVMs.

Findings

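1. XED performs on par with similar emotion datasets when used for sentiment analysis and emotion detection.

2. The dataset covers 32 languages: human-annotated Finnish and English plus 30 languages with projected annotations, many of them low-resource.

3. The evaluation shows that XED is useful for emotion detection across languages.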

Abstract

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's eight core emotions to annotate the dataset, with the addition of a neutral class, to create a multilabel, multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
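The annotation scheme described in the abstract (Plutchik's eight core emotions plus neutral, with possibly several labels per sentence) can be illustrated as a multi-hot encoding. This is a minimal sketch under stated assumptions: the label order, the hypothetical helpers `encode_labels`/`decode_labels`, and the rule that neutral cannot co-occur with an emotion are illustrative choices, not taken from the XED repository or the paper.

```python
from typing import Iterable, List

# Plutchik's eight core emotions. This ordering is an illustrative
# assumption; the XED repository may number its labels differently.
EMOTIONS: List[str] = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]
NEUTRAL = "neutral"
ALL_LABELS: List[str] = EMOTIONS + [NEUTRAL]


def encode_labels(labels: Iterable[str]) -> List[int]:
    """Map a sentence's label set to a 9-slot multi-hot vector
    (one slot per Plutchik emotion, then one slot for neutral)."""
    label_set = {label.strip().lower() for label in labels}
    unknown = label_set - set(ALL_LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Hypothetical rule (assumption): neutral is exclusive, so a sentence
    # is either neutral or carries one or more emotions, never both.
    if NEUTRAL in label_set and len(label_set) > 1:
        raise ValueError("'neutral' cannot be combined with an emotion")
    return [1 if name in label_set else 0 for name in ALL_LABELS]


def decode_labels(vector: List[int]) -> List[str]:
    """Inverse of encode_labels: list the label names whose slot is set."""
    if len(vector) != len(ALL_LABELS):
        raise ValueError(f"expected {len(ALL_LABELS)} slots, got {len(vector)}")
    return [name for name, flag in zip(ALL_LABELS, vector) if flag]


if __name__ == "__main__":
    print(encode_labels(["joy", "trust"]))  # [0, 0, 0, 0, 1, 0, 0, 1, 0]
    print(decode_labels(encode_labels(["Anger", "fear"])))  # ['anger', 'fear']
```

A multi-hot vector of this kind is a common target format for multilabel classifiers; whether the paper's BERT and SVM setups consume labels in exactly this form is not stated here.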

Code & Models

Repositories

Helsinki-NLP/XED (official)

Datasets

Helsinki-NLP/xed_en_fi
dataset · 287 downloads

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Advanced Text Analysis Techniques

Methods: Linear Layer · Softmax · Dense Connections · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Adam · Residual Connection · Dropout