EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

Daryna Dementieva, Nikolay Babakov, and Alexander Fraser

arXiv:2505.23297·cs.CL·September 29, 2025

TL;DR

This paper introduces EmoBench-UA, the first annotated dataset for emotion detection in Ukrainian texts, enabling future research in this underexplored area of NLP.

Contribution

It provides a new benchmark dataset for Ukrainian emotion detection and evaluates various approaches, including LLMs and linguistic baselines.

Findings

01

Emotion classification in Ukrainian is challenging.

02

English-trained models perform poorly on Ukrainian data.

03

The results highlight the need for Ukrainian-specific NLP resources.

Abstract

While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce EmoBench-UA, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the guidelines of previous English-centric work on emotion detection (Mohammad et al., 2018; Mohammad, 2022). The dataset was created through crowdsourcing on the Toloka.ai platform, ensuring a high-quality annotation process. We then evaluate a range of approaches on the collected dataset, from linguistic-based baselines and synthetic data translated from English to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of…

Code & Models

Models

🤗 ukr-detect/ukr-emotions-classifier
model · 102 dl · ♡ 2
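A minimal sketch of how the listed model might be queried, assuming it loads as a standard Hugging Face text-classification checkpoint (this page does not confirm that). The helper names `select_emotions`, `load_classifier`, and `classify`, and the 0.5 threshold, are illustrative assumptions and not part of the paper; whether the returned scores are independent per label depends on the model's configuration, which is not stated here.

```python
from typing import List


def select_emotions(scores: List[dict], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold.

    `scores` is a list of {"label": str, "score": float} entries, the shape
    a text-classification pipeline returns per text when top_k=None.
    The 0.5 default is an illustrative assumption, not a tuned value.
    """
    return [item["label"] for item in scores if item["score"] >= threshold]


def load_classifier(model_id: str = "ukr-detect/ukr-emotions-classifier"):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so select_emotions works without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline to return a score for every label.
    return pipeline("text-classification", model=model_id, top_k=None)


def classify(texts: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Return the selected labels for each input text."""
    classifier = load_classifier()
    # Passing a list yields one list of label/score entries per text.
    return [select_emotions(scores, threshold) for scores in classifier(texts)]
```

Calling `classify(["Я дуже радий тебе бачити!"])` would download the model and return one list of labels for the single input; the label names themselves come from the model's configuration.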

Videos

EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian · underline

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Mental Health via Writing · Hate Speech and Cyberbullying Detection