A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications
Sri Harsha Dumpala, Imran Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu

TL;DR
This paper introduces a CycleGAN-based front-end that transforms naturally perturbed speech into normal speech, making ASR more robust to perturbations caused by the speaker's emotional and physical states.
Contribution
The paper presents a CycleGAN front-end, trained on non-parallel examples of perturbed and normal speech, that improves recognition of speech perturbed by the speaker's emotional or physical state.
Findings
ASR performance improves with CycleGAN-processed speech
Effective transformation of laughter and creaky speech
Visualization confirms feature similarity post-transformation
Abstract
Naturally introduced perturbations in the audio signal, caused by the emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on a Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky-speech datasets show that the performance of four different ASR systems improves when using speech obtained from the CycleGAN-based front-end, as compared to directly using the original perturbed speech. Visualization of the features of the laughter-perturbed speech and those generated by the proposed front-end further demonstrates the…
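Training on non-parallel examples is possible because a CycleGAN pairs two generators and asks each to undo the other, so no utterance needs a matched counterpart. The sketch below illustrates only the standard L1 cycle-consistency term in plain Python. The names `g`, `f`, `toy_g`, and `toy_f` are hypothetical, the feature vectors are made up, and the toy generators are fixed offsets rather than trained networks; the paper's actual architecture, features, and loss weights are not given in this summary.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Mapping = Callable[[Sequence[float]], Vector]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g: Mapping,  # hypothetical generator: perturbed -> normal
    f: Mapping,  # hypothetical generator: normal -> perturbed
    perturbed_batch: Sequence[Sequence[float]],
    normal_batch: Sequence[Sequence[float]],
) -> float:
    """L1 cycle loss: F(G(x)) should recover x, and G(F(y)) should recover y.

    The two batches are non-parallel: they need not contain the same
    utterances, or even the same number of examples.
    """
    forward = sum(l1_distance(f(g(x)), x) for x in perturbed_batch) / len(perturbed_batch)
    backward = sum(l1_distance(g(f(y)), y) for y in normal_batch) / len(normal_batch)
    return forward + backward


# Toy stand-ins for trained networks (assumption: the perturbation is a fixed offset).
def toy_g(x: Sequence[float]) -> Vector:
    return [v - 1.0 for v in x]


def toy_f(y: Sequence[float]) -> Vector:
    return [v + 1.0 for v in y]


# Made-up feature vectors; the batches deliberately differ in size.
perturbed = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
normal = [[1.0, 2.0], [3.0, 4.0]]
loss = cycle_consistency_loss(toy_g, toy_f, perturbed, normal)
```

In a full CycleGAN this term is added to the adversarial losses of the two discriminators; it is what keeps the transformed speech tied to the content of the input rather than to any plausible normal-sounding output.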
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Batch Normalization · Residual Connection · PatchGAN · Tanh Activation · Residual Block · Instance Normalization · Convolution · Sigmoid Activation
