Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
Hitoshi Suda, Aya Watanabe, Shinnosuke Takamichi

TL;DR
This study presents CocoNut-Humoresque, a large-scale, open-source speech likability corpus with listener ratings and speaker attributes, enabling analysis of factors influencing voice attractiveness.
Contribution
It introduces a new extensive dataset for voice likability research, including diverse speaker attributes and listener ratings, and provides initial analysis of biases and acoustic correlations.
Findings
Gender and age biases in voice likability are identified.
The correlation between likability and two acoustic features, fundamental frequency and x-vectors, is analyzed.
The dataset enables large-scale statistical analysis of voice attractiveness.
Abstract
This paper introduces CocoNut-Humoresque, an open-source large-scale speech likability corpus that includes speech segments and their per-listener likability scores. Evaluating voice likability is essential to designing preferable voices for speech systems, such as dialogue or announcement systems. In this study, we had 885 listeners rate the likability of 1800 speech segments from a wide range of speakers. When constructing the corpus, we also collected multiple speaker attributes: genders, ages, and favorite YouTube videos. The corpus therefore enables large-scale statistical analysis of voice likability with respect to both speaker and listener factors. This paper describes the construction methodology and a preliminary data analysis that reveals gender and age biases in voice likability. In addition, the relationship between the likability and two acoustic features, the…
Taxonomy
Topics: Speech and dialogue systems
