CoVox: A dataset of contrasting vocalizations
Camila Bruder, Pauline Larrouy-Maestri

TL;DR
CoVox is a dataset of vocalizations in different styles performed by the same singers, designed to study vocal variation in singing and speaking.
Contribution
The dataset introduces a novel, ecologically valid stimulus set with controlled variability in vocal styles and high recognition accuracy by listeners.
Findings
Lay listeners recognized the vocalization styles with over 69% accuracy.
Acoustic profiles of the vocalizations clearly differ based on vocal style.
The dataset comprises 1320 vocalizations, produced both with the songs' original lyrics in Brazilian Portuguese and with a /lu/ sound.
Abstract
The human voice is remarkably versatile and can vary greatly in sound depending on how it is used. An increasing number of studies have addressed the differences and similarities between the singing and the speaking voice. However, finding adequate stimulus material that is at the same time controlled and ecologically valid is challenging, and most datasets lack variability in terms of vocal styles performed by the same voice. Here, we describe a curated stimulus set of vocalizations where 22 female singers performed the same melody excerpts in three contrasting singing styles (as a lullaby, as a pop song, and as an opera aria) and spoke the text aloud in two speaking styles (as if speaking to an adult or to an infant). All productions were made with the songs’ original lyrics, in Brazilian Portuguese, and with a /lu/ sound. This ecologically valid dataset of 1320 vocalizations was…
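The design described in the abstract can be sketched as a condition grid. The following is a minimal illustration, not the authors' code. It assumes a fully crossed design (every singer produced every style in both text conditions for every melody excerpt), which the abstract does not state outright. The number of excerpts is not given above, so the sketch derives it from the stated total instead of asserting it. All variable and function names are hypothetical.

```python
from itertools import product

# Figures taken from the abstract.
N_SINGERS = 22
SINGING_STYLES = ["lullaby", "pop song", "opera aria"]
SPEAKING_STYLES = ["adult-directed speech", "infant-directed speech"]
TEXT_CONDITIONS = ["original lyrics", "/lu/"]
TOTAL_VOCALIZATIONS = 1320

# ASSUMPTION: the design is fully crossed, so each melody excerpt
# contributes singers x styles x text conditions vocalizations.
styles = SINGING_STYLES + SPEAKING_STYLES
per_excerpt = N_SINGERS * len(styles) * len(TEXT_CONDITIONS)

# Number of excerpts implied by the stated total under that assumption.
implied_excerpts, remainder = divmod(TOTAL_VOCALIZATIONS, per_excerpt)


def build_conditions(n_excerpts):
    """Enumerate one record per vocalization in the assumed crossed design."""
    return [
        {"singer": singer, "excerpt": excerpt, "style": style, "text": text}
        for singer, excerpt, style, text in product(
            range(1, N_SINGERS + 1),
            range(1, n_excerpts + 1),
            styles,
            TEXT_CONDITIONS,
        )
    ]


conditions = build_conditions(implied_excerpts)
print(per_excerpt, implied_excerpts, remainder, len(conditions))
```

Under the crossed-design assumption the stated total divides evenly, which is consistent with the abstract but does not confirm the actual number of excerpts or the exact recording protocol.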
Taxonomy
Topics: Phonetics and Phonology Research · Music and Audio Processing · Speech and Audio Processing
