Optimization Techniques for a Physical Model of Human Vocalisation

Mateo Cámara; Zhiyuan Xu; Yisu Zong; José Luis Blanco; Joshua D. Reiss

arXiv:2309.14761 · eess.AS · September 27, 2023

TL;DR

This paper explores optimization methods for tuning a speech production model to accurately synthesize non-speech sounds like yawns, comparing traditional and neural approaches for effectiveness and efficiency.

Contribution

It introduces a systematic evaluation of various optimization techniques, including neural networks, for parameter tuning of a vocal tract model to replicate non-speech sounds.

Findings

01

Genetic and swarm algorithms outperform least squares in accuracy.

02

This accuracy comes at a cost: genetic and swarm algorithms run slower than least squares.

03

Optimizer and audio representation combinations significantly affect results.

Abstract

We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals, specifically yawns. We selected and optimized the control parameters of the synthesizer to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature and a specifically designed neural network. We evaluated several popular quality metrics as error functions. These include both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and computational demand. Results show that genetic and swarm optimizers outperform least squares algorithms at the cost of slower execution and that specific…
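The parameter-fitting loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_synth` is a hypothetical stand-in (a decaying sine wave, not Pink Trombone), the error function is a plain time-domain mean squared error chosen for brevity rather than any of the quality metrics the paper evaluates, and the swarm settings are common textbook values. It only shows the general shape of tuning synthesizer control parameters with a particle swarm optimizer, one of the optimizer families the abstract names.

```python
import math
import random

def toy_synth(freq_hz, decay, n_samples=256, sample_rate=8000):
    """Hypothetical stand-in synthesizer: a decaying sine wave (NOT Pink Trombone)."""
    return [
        math.exp(-decay * t / sample_rate)
        * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]

def mse(a, b):
    """Mean squared error between two equal-length signals (assumed error function)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def particle_swarm(loss, bounds, n_particles=30, n_iters=100, seed=0):
    """Minimal particle swarm optimizer; returns (best_params, best_loss, history)."""
    rng = random.Random(seed)
    inertia, c_personal, c_global = 0.7, 1.5, 1.5  # assumed textbook settings
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    history = [g_val]  # best error found so far, recorded once per iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the search box so control parameters stay valid.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history

# Target audio is generated from known parameters so the sketch is self-contained;
# in the setting the abstract describes, it would be a real recording.
TARGET_PARAMS = (220.0, 30.0)
target = toy_synth(*TARGET_PARAMS)
BOUNDS = [(50.0, 500.0), (0.0, 100.0)]  # assumed ranges for (freq_hz, decay)

def loss(params):
    """Error between audio generated from candidate parameters and the target."""
    return mse(toy_synth(*params), target)

best_params, best_loss, history = particle_swarm(loss, BOUNDS)
print("estimated parameters:", best_params, "error:", best_loss)
```

Swapping `mse` for a different error function, or `particle_swarm` for another optimizer with the same `loss` interface, is the kind of optimizer and metric pairing the findings above refer to.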

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies