Bridging the prosody GAP: Genetic Algorithm with People to efficiently sample emotional prosody
Pol van Rijn, Harin Lee, Nori Jacoby

TL;DR
This paper introduces 'Genetic Algorithm with People' (GAP), a novel method combining human input with genetic algorithms to efficiently sample and study a broad spectrum of emotional prosody in speech.
Contribution
The paper presents a new human-in-the-loop genetic algorithm approach for sampling emotional prosody, enabling large-scale, language-independent emotional speech data collection.
Findings
GAP efficiently samples a wide range of emotional prosody.
GAP achieves results comparable to state-of-the-art emotional speech corpora.
GAP supports large-scale, cross-cultural research.
Abstract
The human voice effectively communicates a range of emotions with nuanced variations in acoustics. Existing emotional speech corpora are limited in that they are either (a) highly curated to induce specific emotions with predefined categories that may not capture the full extent of emotional experiences, or (b) entangled in their semantic and prosodic cues, limiting the ability to study these cues separately. To overcome this challenge, we propose a new approach called 'Genetic Algorithm with People' (GAP), which integrates human decision and production into a genetic algorithm. In our design, we allow creators and raters to jointly optimize the emotional prosody over generations. We demonstrate that GAP can efficiently sample from the emotional speech space and capture a broad range of emotions, and show comparable results to state-of-the-art emotional speech corpora. GAP is…
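The loop the abstract describes, in which creators and raters jointly optimize emotional prosody over generations, can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: the prosody parameters (`pitch`, `rate`, `intensity`), the simulated rater and creator functions, the truncation selection with elitism, and all numeric settings are hypothetical stand-ins. In the actual method, the rating and production steps are carried out by human participants.

```python
import random

# Hypothetical prosody parameters, each normalised to [0, 1] (an assumption).
PARAMS = ("pitch", "rate", "intensity")


def distance(candidate, target):
    """Euclidean distance between two prosody settings."""
    return sum((candidate[p] - target[p]) ** 2 for p in PARAMS) ** 0.5


def simulated_rater(candidate, target, rng, noise):
    """Stand-in for one human rating: higher means closer to the target emotion."""
    return -distance(candidate, target) + rng.gauss(0.0, noise)


def simulated_creator(parent, rng, step=0.1):
    """Stand-in for a human creator: perturb the parent's prosody, clipped to [0, 1]."""
    return {p: min(1.0, max(0.0, parent[p] + rng.gauss(0.0, step))) for p in PARAMS}


def run_gap_sketch(target, generations=20, pop_size=12, n_survivors=4,
                   n_raters=3, noise=0.05, seed=0):
    """Run the simulated loop; return the last top-rated candidate and a history
    of the top-rated candidate's true distance to the target per generation."""
    rng = random.Random(seed)
    population = [{p: rng.random() for p in PARAMS} for _ in range(pop_size)]
    history = []
    best = population[0]
    for _ in range(generations):
        # Rating step: average several noisy ratings per candidate.
        scored = []
        for cand in population:
            ratings = [simulated_rater(cand, target, rng, noise) for _ in range(n_raters)]
            scored.append((sum(ratings) / n_raters, cand))
        scored.sort(key=lambda pair: pair[0], reverse=True)

        # Selection step: keep the top-rated candidates.
        survivors = [cand for _, cand in scored[:n_survivors]]
        best = survivors[0]
        history.append(distance(best, target))

        # Production step: creators derive new candidates from survivors.
        children = [simulated_creator(rng.choice(survivors), rng)
                    for _ in range(pop_size - n_survivors)]
        population = survivors + children
    return best, history


if __name__ == "__main__":
    target_emotion = {"pitch": 0.8, "rate": 0.6, "intensity": 0.9}  # made-up target
    best, history = run_gap_sketch(target_emotion)
    print("first-generation distance:", round(history[0], 3))
    print("last-generation distance:", round(history[-1], 3))
```

Averaging several ratings per candidate is one simple way to reduce the effect of rater noise on selection. Keeping the survivors in the next generation means a good candidate is not lost when its children turn out worse.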
Taxonomy
Topics: Speech and dialogue systems · Speech and Audio Processing · Speech Recognition and Synthesis
