Towards the Synthesis of Non-speech Vocalizations
Enjamamul Hoq, Ifeoma Nwogu

TL;DR
This paper explores the use of the DiffWave framework to generate high-quality, diverse infant cry sounds from noise, demonstrating its effectiveness in unconditional audio synthesis for non-speech vocalizations.
Contribution
This work is the first to apply DiffWave to unconditional infant cry sound generation, demonstrating the framework's potential for synthesizing non-speech vocalizations.
Findings
High-fidelity cry sound generation achieved
Diversity of generated sounds maintained
Effective use of two infant cry datasets
Abstract
In this report, we focus on the unconditional generation of infant cry sounds using the DiffWave framework, which has shown great promise in generating high-quality audio from noise. We use two distinct datasets of infant cries: the Baby Chillanto and the deBarbaro cry dataset. These datasets are used to train the DiffWave model to generate new cry sounds that maintain high fidelity and diversity. The focus here is on DiffWave's capability to handle the unconditional generation task.
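The abstract describes DiffWave turning noise into audio with no conditioning input. The sketch below shows the standard denoising-diffusion recipe that DiffWave builds on, written in NumPy: the forward noising step, the noise-prediction training loss, and the unconditional reverse sampling loop. Several pieces are assumptions and do not come from the paper. The linear schedule values (T = 50, β from 1e-4 to 0.05) and the 16,000-sample output length are illustrative placeholders. `dummy_model` is a hypothetical stand-in that always predicts zero noise. In DiffWave, a trained convolutional network fills this role.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule. Values are illustrative, not taken from the paper."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = prod_{s<=t} alpha_s
    return betas, alphas, alpha_bars

def diffuse(x0, t, alpha_bars, rng):
    """Forward process q(x_t | x_0): noise a clean waveform x0 to step t (0-indexed)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def denoising_loss(predict_noise, x0, alpha_bars, rng):
    """Epsilon-prediction objective: MSE between the true and predicted noise at a random step."""
    t = rng.integers(len(alpha_bars))
    xt, eps = diffuse(x0, t, alpha_bars, rng)
    return np.mean((eps - predict_noise(xt, t)) ** 2)

def sample(predict_noise, length, betas, alphas, alpha_bars, rng):
    """Unconditional reverse process: start from white noise and denoise step by step."""
    x = rng.standard_normal(length)
    for t in reversed(range(len(betas))):
        eps_hat = predict_noise(x, t)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t; no noise on the last step
            sigma = np.sqrt((1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t])
            x = x + sigma * rng.standard_normal(length)
    return x

# Usage with a hypothetical placeholder model (a real system would use a trained network).
rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
dummy_model = lambda x, t: np.zeros_like(x)
wave = sample(dummy_model, 16000, betas, alphas, alpha_bars, rng)
```

With the placeholder model, `wave` is only rescaled noise. Producing a cry requires training `predict_noise` by minimizing `denoising_loss` over real recordings, such as clips from the two datasets named above.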
Taxonomy
Topics: Infant Health and Development
Methods: Focus
