EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby, Felix Friedrich, Maurice Kraus, Kourosh Nadi, Huu Nguyen, Kristian Kersting, Sören Auer

TL;DR
EmoNet-Voice introduces a large-scale, ethically sourced speech emotion dataset and benchmark with fine-grained emotion categories, enabling improved emotion recognition models and addressing limitations of existing datasets.
Contribution
We present EmoNet-Voice, a comprehensive multilingual dataset and benchmark with expert-verified, fine-grained emotions, utilizing synthetic voice generation for ethical data collection.
Findings
High accuracy in detecting high-arousal emotions (e.g., anger: 95%)
Difficulty in distinguishing similar emotions (e.g., sadness vs. distress: 63%)
Models trained on synthetic data generalize well to real datasets
Abstract
Speech emotion recognition (SER) systems are constrained by existing datasets that typically cover only 6-10 basic emotions, lack scale and diversity, and face ethical challenges when collecting sensitive emotional states. We introduce EmoNet-Voice, a comprehensive resource addressing these limitations through two components: (1) EmoNet-Voice Big, a 5,000-hour multilingual pre-training dataset spanning 40 fine-grained emotion categories across 11 voices and 4 languages, and (2) EmoNet-Voice Bench, a rigorously validated benchmark of 4.7k samples with unanimous expert consensus on emotion presence and intensity levels. Using state-of-the-art synthetic voice generation, our privacy-preserving approach enables ethical inclusion of sensitive emotions (e.g., pain, shame) while maintaining controlled experimental conditions. Each sample underwent validation by three psychology experts. We…
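The abstract describes a benchmark restricted to samples on which three experts agree unanimously about emotion presence and intensity. A minimal sketch of such a consensus filter follows. The record layout, the `Rating` class, the `unanimous_subset` helper, and the 0-2 intensity scale are all hypothetical and chosen only for illustration; the paper's actual annotation schema and filtering procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One expert judgment (hypothetical layout, not the paper's schema)."""
    sample_id: str
    emotion: str
    annotator: str
    intensity: int  # assumed scale: 0 = absent, 1 = weak, 2 = strong


def unanimous_subset(ratings, n_annotators=3):
    """Return {(sample_id, emotion): intensity} for pairs where all
    n_annotators rated the pair and gave the same intensity."""
    grouped = {}
    for r in ratings:
        grouped.setdefault((r.sample_id, r.emotion), {})[r.annotator] = r.intensity

    kept = {}
    for key, by_annotator in grouped.items():
        values = set(by_annotator.values())
        # Require a full panel and a single shared intensity value.
        if len(by_annotator) == n_annotators and len(values) == 1:
            kept[key] = values.pop()
    return kept


# Illustrative (made-up) ratings: clip_001 is unanimous, clip_002 is not.
ratings = [
    Rating("clip_001", "anger", "A", 2),
    Rating("clip_001", "anger", "B", 2),
    Rating("clip_001", "anger", "C", 2),
    Rating("clip_002", "shame", "A", 1),
    Rating("clip_002", "shame", "B", 0),
    Rating("clip_002", "shame", "C", 1),
]
consensus = unanimous_subset(ratings)
```

In this sketch a pair is dropped both when the experts disagree and when fewer than three ratings are available, which keeps the retained subset strict at the cost of size.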
Code & Models
- 🤗 nineninesix/kani-tts-400m-0.3-pt (model, 293 downloads, 12 likes)
- 🤗 nineninesix/kani-tts-400m-ky (model, 134 downloads, 4 likes)
- 🤗 nineninesix/kani-tts-400m-en (model, 14k downloads, 39 likes)
- 🤗 nineninesix/kani-tts-400m-ar (model, 826 downloads, 5 likes)
- 🤗 nineninesix/kani-tts-400m-es (model, 219 downloads, 1 like)
- 🤗 nineninesix/kani-tts-400m-de (model, 190 downloads, 2 likes)
- 🤗 nineninesix/kani-tts-400m-zh (model, 67 downloads, 1 like)
- 🤗 nineninesix/kani-tts-400m-ko (model, 30 downloads, 6 likes)
- 🤗 nineninesix/kani-tts-400m-ky-kani (model, 8 downloads, 1 like)
- 🤗 Mungert/kani-tts-400m-en-GGUF (model, 110 downloads)
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Mental Health via Writing
