EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection

Christoph Schuhmann; Robert Kaczmarczyk; Gollam Rabby; Felix Friedrich; Maurice Kraus; Kourosh Nadi; Huu Nguyen; Kristian Kersting; Sören Auer

arXiv:2506.09827 · cs.CL · January 6, 2026

Open Access · 10 Models · 2 Datasets

TL;DR

EmoNet-Voice introduces a large-scale, ethically sourced speech emotion dataset and a benchmark with fine-grained emotion categories, supporting better emotion recognition models and addressing the limited coverage and scale of existing datasets.

Contribution

We present EmoNet-Voice, a multilingual dataset and benchmark with expert-verified, fine-grained emotion labels, built with synthetic voice generation so that sensitive emotions can be collected ethically.

Findings

01. High accuracy in detecting high-arousal emotions (e.g., anger: 95%)

02. Difficulty in distinguishing similar emotions (e.g., sadness vs. distress: 63%)

03. Models trained on synthetic data generalize well to real datasets

Abstract

Speech emotion recognition (SER) systems are constrained by existing datasets that typically cover only 6-10 basic emotions, lack scale and diversity, and face ethical challenges when collecting sensitive emotional states. We introduce EmoNet-Voice, a comprehensive resource addressing these limitations through two components: (1) EmoNet-Voice Big, a 5,000-hour multilingual pre-training dataset spanning 40 fine-grained emotion categories across 11 voices and 4 languages, and (2) EmoNet-Voice Bench, a rigorously validated benchmark of 4.7k samples with unanimous expert consensus on emotion presence and intensity levels. Using state-of-the-art synthetic voice generation, our privacy-preserving approach enables ethical inclusion of sensitive emotions (e.g., pain, shame) while maintaining controlled experimental conditions. Each sample underwent validation by three psychology experts. We…
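
The unanimity criterion mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Annotation` record, the sample IDs, and the three-level intensity scale (0 = absent, 1 = weakly present, 2 = strongly present) are hypothetical and do not describe the authors' actual pipeline or data format. The sketch only shows what "keep a sample when all three experts give the same rating" could look like.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One (sample, emotion) pair rated by three experts (hypothetical record)."""
    sample_id: str
    emotion: str
    # Assumed scale: 0 = absent, 1 = weakly present, 2 = strongly present.
    ratings: Tuple[int, int, int]


def is_unanimous(annotation: Annotation) -> bool:
    """Return True when all three experts gave the identical rating."""
    return len(set(annotation.ratings)) == 1


def filter_unanimous(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only the annotations on which the experts fully agree."""
    return [a for a in annotations if is_unanimous(a)]


if __name__ == "__main__":
    # Made-up example data, not taken from the benchmark.
    examples = [
        Annotation("s1", "pain", (2, 2, 2)),
        Annotation("s2", "shame", (1, 2, 1)),
        Annotation("s3", "anger", (0, 0, 0)),
    ]
    kept = filter_unanimous(examples)
    print([a.sample_id for a in kept])
```

Requiring identical ratings is the strictest reading of "unanimous"; a looser variant, such as agreement only on presence versus absence, would keep more samples at the cost of noisier intensity labels.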

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Mental Health via Writing