Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion

Tuan Vu Ho; Maori Kobayashi; Masato Akagi

arXiv:2206.13021 · cs.SD · June 28, 2022

Open Access

TL;DR

This paper improves speech intelligibility in noisy environments by using voice conversion to make non-professional voices sound like professional announcers, especially at low signal-to-noise ratio (SNR) levels.

Contribution

It introduces a novel voice conversion approach to mimic professional announcer voices, improving speech clarity in noisy settings, and demonstrates the effectiveness through objective and subjective evaluations.

Findings

01

Converted voices show higher intelligibility than original voices in noise.

02

Professional and non-professional voices form distinct clusters in speaker embedding space.

03

Voice conversion improves speech clarity at low SNR levels.
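
The findings above refer to intelligibility "in noise" and at "low SNR levels". As a minimal sketch of what that condition means in practice, the Python snippet below scales a noise signal so that the speech-to-noise power ratio matches a target SNR in decibels, then adds it to the speech. This is an illustration only, not the paper's evaluation pipeline: the function names `mix_at_snr` and `measured_snr_db`, the sine tone standing in for speech, the Gaussian noise, and the -5 dB target are all assumptions made for the example.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Noise power required to hit the requested SNR.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


# Stand-in signals (assumptions): a 220 Hz tone for "speech", Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# A negative SNR means the noise carries more power than the speech.
mixture, scaled = mix_at_snr(speech, noise, snr_db=-5.0)
```

Lower (more negative) values of `snr_db` give harder listening conditions, which is where the summary reports the benefit of conversion.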

Abstract

In most practical scenarios, an announcement system must deliver speech messages in a noisy environment in which the background noise cannot be cancelled out. The local noise reduces speech intelligibility and increases the listener's listening effort, and hence hampers the effectiveness of the announcement system. It has been reported that the voices of professional announcers are clearer and more comprehensible than those of non-expert speakers in noisy environments. This finding suggests that speech intelligibility might be related to the speaking style of professional announcers, which can be adapted using a voice conversion method. Motivated by this idea, this paper proposes a method for enhancing speech intelligibility in noisy environments by applying voice conversion to non-professional voices. We discovered that the professional announcers and non-professional speakers are clusterized…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis