Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

Dipjyoti Paul; Muhammed PV Shifas; Yannis Pantazis; Yannis Stylianou

arXiv:2008.05809·cs.SD·August 14, 2020

TL;DR

This paper introduces a TTS system that improves speech intelligibility in noisy environments by combining Lombard speaking style conversion with Spectral Shaping and Dynamic Range Compression (SSDRC), and reports large relative intelligibility gains over a baseline TTS system.

Contribution

The study proposes a Lombard-SSDRC TTS system that integrates style transfer and spectral shaping, significantly improving intelligibility in noisy conditions compared to prior approaches.

Findings

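01. 110-130% relative improvement in SIIB-Gauss in speech-shaped noise (SSN)

02. 47-140% relative improvement in SIIB-Gauss in competing-speaker noise (CSN)

03. 455% relative increase in median keyword correction rate in SSN (subjective evaluation)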
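The percentages above are relative improvements over a baseline, which can be easy to misread. The following is a minimal sketch of that arithmetic; the function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the relative improvement of `proposed` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical scores, chosen only to illustrate the formula.
# A score that rises from 20.0 to 44.0 is a 120% relative improvement.
print(relative_improvement(20.0, 44.0))  # 120.0

# A 455% relative increase means the new value is 5.55 times the baseline.
print(relative_improvement(10.0, 55.5))  # 455.0
```

Note that a relative improvement above 100% means the score more than doubled, so these figures depend strongly on how low the baseline score is in the given noise condition.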

Abstract

The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To overcome such limitations, we propose a novel transfer learning approach using Tacotron- and WaveRNN-based TTS synthesis. The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC), which has been shown to provide high intelligibility gains by redistributing the signal energy in the time-frequency domain. We refer to this extension as the Lombard-SSDRC TTS system. Intelligibility enhancement as…
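To give a feel for what "redistributing the signal energy" can mean, here is a toy sketch of the dynamic range compression idea only. It is not the SSDRC algorithm from the paper: it applies a simple power-law compressor to sample magnitudes and then rescales the result to the original RMS level, so that quiet samples gain level relative to loud ones while total energy stays the same. All function names and the `exponent` parameter are assumptions made for illustration.

```python
import math


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def compress_dynamic_range(signal, exponent=0.5):
    """Toy static compressor (illustrative, not SSDRC).

    Each sample magnitude is raised to `exponent` (0 < exponent <= 1),
    keeping its sign, and the result is rescaled to the input RMS so the
    overall energy is unchanged.
    """
    if not 0.0 < exponent <= 1.0:
        raise ValueError("exponent must be in (0, 1]")
    compressed = [math.copysign(abs(s) ** exponent, s) for s in signal]
    current = rms(compressed)
    if current == 0.0:
        return list(signal)  # silent input: nothing to rescale
    gain = rms(signal) / current
    return [s * gain for s in compressed]


def dynamic_range_db(signal):
    """Ratio of largest to smallest non-zero magnitude, in dB."""
    mags = [abs(s) for s in signal if s != 0.0]
    return 20.0 * math.log10(max(mags) / min(mags))


# Hypothetical samples: two quiet and two loud values.
x = [0.01, -0.5, 0.02, 0.8]
y = compress_dynamic_range(x)
print(round(dynamic_range_db(x), 1), round(dynamic_range_db(y), 1))
```

With `exponent=0.5` the dynamic range in dB is halved, while `rms(y)` matches `rms(x)`. A real system would operate on time-varying envelopes and frequency bands rather than on individual samples.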

Code & Models

Repositories

dipjyoti92/TTS-Style-Transfer
pytorch

Taxonomy

Methods: Highway Layer · Dense Connections · Residual GRU · Griffin-Lim Algorithm · Dropout · Residual Connection · Max Pooling · Softmax · Highway Network · Sigmoid Activation