Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise

Tuomo Raitio; Petko Petkov; Jiangchuan Li; Muhammed Shifas; Andrea Davis; Yannis Stylianou

arXiv:2203.10637·eess.AS·March 30, 2022·1 cites

Open Access

TL;DR

This paper introduces a neural TTS approach that models vocal effort variation to improve the intelligibility of synthetic speech in noise. It conditions the model on spectral tilt and, at synthesis time, extrapolates the tilt beyond the range seen in the original data to produce high-effort speech.

Contribution

It presents a novel spectral tilt conditioning method for neural TTS that enables independent vocal effort control and improves speech intelligibility in noise.

Findings

01. Enhanced intelligibility in noisy conditions

02. Maintained speech quality with effort control

03. Outperformed existing speech enhancement algorithms

Abstract

We present a neural text-to-speech (TTS) method that models natural vocal effort variation to improve the intelligibility of synthetic speech in the presence of noise. The method consists of first measuring the spectral tilt of unlabeled conventional speech data, and then conditioning a neural TTS model with normalized spectral tilt among other prosodic factors. Changing the spectral tilt parameter and keeping other prosodic factors unchanged enables effective vocal effort control at synthesis time independent of other prosodic factors. By extrapolation of the spectral tilt values beyond what has been seen in the original data, we can generate speech with high vocal effort levels, thus improving the intelligibility of speech in the presence of masking noise. We evaluate the intelligibility and quality of normal speech and speech with increased vocal effort in the presence of various…
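The measurement and normalization steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say here how spectral tilt is estimated or normalized, so everything below is an assumption made for illustration: the tilt proxy is the first-order linear-prediction coefficient (lag-1 autocorrelation divided by lag-0), the frame sizes are arbitrary, and the hypothetical `normalize` helper maps an assumed corpus range linearly to [-1, 1]. With this proxy, a flatter spectrum with more high-frequency energy yields a lower value.

```python
import numpy as np


def frame_tilt(frame):
    """Tilt proxy (assumption): first-order LP coefficient r[1] / r[0]."""
    r0 = float(np.dot(frame, frame))
    if r0 <= 0.0:
        return 0.0  # silent frame: no meaningful tilt
    r1 = float(np.dot(frame[:-1], frame[1:]))
    return r1 / r0


def utterance_tilt(signal, frame_len=400, hop=200):
    """Mean frame-level tilt over Hann-windowed frames of one utterance."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return float(np.mean([frame_tilt(signal[s:s + frame_len] * window)
                          for s in starts]))


def normalize(tilt, lo, hi):
    """Map the corpus range [lo, hi] linearly to [-1, 1] (hypothetical scheme).

    Values outside [lo, hi] are not clipped, so they land outside [-1, 1].
    """
    return 2.0 * (tilt - lo) / (hi - lo) - 1.0
```

In a setup like this, `lo` and `hi` would come from the tilt values measured over the training corpus. Passing a conditioning value outside [-1, 1] at synthesis time then corresponds to the extrapolation the abstract describes, although the paper's actual feature scaling may differ.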

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Acoustic Wave Phenomena Research