Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Tuan Vu Ho, Maori Kobayashi, Masato Akagi

TL;DR
This paper enhances speech intelligibility in noisy environments by using voice conversion to make non-professional voices sound like professional announcers, improving clarity especially at low signal-to-noise ratio (SNR) levels.
Contribution
It introduces a novel voice conversion approach that mimics professional announcer voices to improve speech clarity in noisy settings, and demonstrates its effectiveness through objective and subjective evaluations.
Findings
Converted voices show higher intelligibility than original voices in noise.
Professional and non-professional voices form distinct clusters in speaker embedding space.
Voice conversion improves speech clarity at low SNR levels.
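To make "low SNR" concrete, the following is a minimal sketch of how speech is commonly mixed with noise at a chosen signal-to-noise ratio before an intelligibility test. It is not taken from the paper: the function name `mix_at_snr`, the synthetic sine-wave stand-in for speech, the white-noise source, and the -5 dB value are all illustrative assumptions.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(P_speech / P_noise), solved for the required noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)

    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

# Synthetic stand-ins (assumptions): a 220 Hz tone as "speech", white noise as background.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(16000)

mixed, scaled_noise = mix_at_snr(speech, noise, snr_db=-5.0)

# Verify the achieved SNR from the two components.
measured_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
```

At a negative SNR such as the one used here, the noise carries more power than the speech, which is the regime where the findings above report the benefit of conversion.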
Abstract
In most practical scenarios, an announcement system must deliver speech messages in a noisy environment in which the background noise cannot be cancelled out. The local noise reduces speech intelligibility and increases the listening effort of the listener, hence hampering the effectiveness of the announcement system. It has been reported that the voices of professional announcers are clearer and more comprehensible than those of non-expert speakers in noisy environments. This finding suggests that speech intelligibility might be related to the speaking style of professional announcers, which can be adapted using a voice conversion method. Motivated by this idea, this paper proposes a speech intelligibility enhancement method for noisy environments that applies voice conversion to non-professional voices. We discovered that professional announcers and non-professional speakers are clustered…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
