Towards the Synthesis of Non-speech Vocalizations

Enjamamul Hoq; Ifeoma Nwogu

arXiv:2410.09360 · cs.SD · October 15, 2024

TL;DR

This paper explores the use of the DiffWave framework to generate high-quality, diverse infant cry sounds from noise, demonstrating its effectiveness in unconditional audio synthesis for non-speech vocalizations.

Contribution

The paper is the first to apply DiffWave to unconditional infant cry generation, showing its potential for synthesizing non-speech vocalizations.

Findings

1. High-fidelity cry sound generation achieved
2. Diversity of generated sounds maintained
3. Effective use of two infant cry datasets

Abstract

In this report, we focus on the unconditional generation of infant cry sounds using the DiffWave framework, which has shown great promise in generating high-quality audio from noise. We use two distinct datasets of infant cries: the Baby Chillanto dataset and the deBarbaro cry dataset. Both are used to train the DiffWave model to generate new cry sounds with high fidelity and diversity. The focus is on DiffWave's ability to handle the unconditional generation task.
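
To make "generating audio from noise" concrete, the following is a minimal sketch of the reverse-diffusion sampling loop used by DiffWave-style models, written with NumPy only. It is not the authors' implementation. The step count, the linear noise schedule, the waveform length, and the `placeholder_denoiser` function (a stand-in for a trained network that predicts the noise in a sample) are all assumptions chosen for illustration.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (values are illustrative assumptions)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product of alphas
    return betas, alphas, alpha_bars


def placeholder_denoiser(x, t):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would take the noisy waveform and the step index and
    return its estimate of the noise; this one simply returns zeros.
    """
    return np.zeros_like(x)


def sample(denoiser, num_samples, betas, alphas, alpha_bars, rng):
    """Unconditional sampling: start from Gaussian noise, denoise step by step."""
    x = rng.standard_normal(num_samples)  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)
        # Posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add scaled noise at every step except the last.
            sigma = np.sqrt((1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t])
            x = mean + sigma * rng.standard_normal(num_samples)
        else:
            x = mean
    return x


betas, alphas, alpha_bars = make_schedule()
rng = np.random.default_rng(0)
waveform = sample(placeholder_denoiser, 16000, betas, alphas, alpha_bars, rng)
```

Because no conditioning signal (such as a mel spectrogram or class label) is passed to the denoiser, this corresponds to the unconditional setting described in the abstract. With the placeholder denoiser the output is still noise; only a network trained on cry recordings would produce cry-like audio.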

Taxonomy

Topics: Infant Health and Development

Methods: Focus