Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts

Alice Baird; Panagiotis Tzirakis; Gauthier Gidel; Marco Jiralerspong; Eilif B. Muller; Kory Mathewson; Björn Schuller; Erik Cambria; Dacher Keltner; Alan Cowen

arXiv:2207.06958·cs.SD·August 17, 2022

TL;DR

This paper presents the proceedings of the ICML 2022 ExVo competition, which targets the understanding and generation of vocal bursts (laughs, gasps, cries, and other non-verbal vocalizations) through three tracks built on a dataset of 59,201 vocalizations from 1,702 speakers.

Contribution

It introduces a comprehensive dataset and three novel competition tracks focusing on emotion recognition, vocal burst generation, and few-shot learning for vocal emotion classification.

Findings

01 Development of multi-task models for emotion and demographic recognition

02 Generation of vocal bursts conveying ten different emotions

03 Effective few-shot learning models for emotion recognition from vocal bursts

Abstract

This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of ten emotions conveyed by vocal bursts.

Taxonomy

Topics: Speech Recognition and Synthesis