Robust Speech Recognition Using Generative Adversarial Networks

Anuroop Sriram; Heewoo Jun; Yashesh Gaur; Sanjeev Satheesh

arXiv:1711.01567·cs.CL·November 7, 2017

TL;DR

This paper introduces a scalable GAN-based framework for robust speech recognition. It trains encoders to be more invariant to noise without relying on domain-specific assumptions, and improves simulated far-field recognition for sequence-to-sequence models.

Contribution

The paper presents a novel end-to-end GAN-based approach that directly encourages robustness in speech recognition models without domain expertise or simplifying assumptions.

Findings

01 Improved accuracy in simulated far-field speech recognition

02 Encoders learn noise-invariant embeddings

03 No need for specialized front-ends or preprocessing

Abstract

This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or simplifying assumptions as are often needed in signal processing, and directly encourages robustness in a data-driven way. We show the new approach improves simulated far-field speech recognition of vanilla sequence-to-sequence models without specialized front-ends or preprocessing.
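
To make the idea of mapping noisy audio to the same embedding space as clean audio concrete, here is a minimal sketch of how an adversarial term might be combined with a recognition loss. It is not the authors' implementation. The abstract says only that a GAN objective is used, so the Wasserstein-style critic loss, the linear `encode` and `critic` functions, the toy feature vectors, and the `adv_weight` parameter are all assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def encode(enc_weights, frame):
    """Hypothetical linear encoder: maps a feature frame to an embedding."""
    return [sum(w * x for w, x in zip(row, frame)) for row in enc_weights]


def critic(critic_weights, embedding):
    """Hypothetical linear critic: assigns one scalar score to an embedding."""
    return sum(w * z for w, z in zip(critic_weights, embedding))


def critic_loss(critic_weights, clean_embs, noisy_embs):
    """Assumed Wasserstein-style critic loss.

    The critic is trained to score clean embeddings higher than noisy ones,
    so minimizing this value widens the gap between the two groups.
    """
    noisy_score = mean([critic(critic_weights, z) for z in noisy_embs])
    clean_score = mean([critic(critic_weights, z) for z in clean_embs])
    return noisy_score - clean_score


def encoder_adversarial_loss(critic_weights, noisy_embs):
    """The encoder is trained so that noisy embeddings look clean to the critic."""
    return -mean([critic(critic_weights, z) for z in noisy_embs])


def total_encoder_loss(task_loss, adv_loss, adv_weight=1.0):
    """Recognition loss plus a weighted adversarial term (weight is assumed)."""
    return task_loss + adv_weight * adv_loss


# Toy data: two clean frames and their noisy counterparts (made-up values).
clean_frames = [[1.0, 0.0], [0.0, 1.0]]
noisy_frames = [[1.5, 0.5], [0.5, 1.5]]
enc_weights = [[1.0, 0.0], [0.0, 1.0]]  # identity encoder, for readability
critic_weights = [0.5, 0.5]

clean_embs = [encode(enc_weights, f) for f in clean_frames]
noisy_embs = [encode(enc_weights, f) for f in noisy_frames]

c_loss = critic_loss(critic_weights, clean_embs, noisy_embs)
adv_loss = encoder_adversarial_loss(critic_weights, noisy_embs)
enc_loss = total_encoder_loss(task_loss=2.0, adv_loss=adv_loss, adv_weight=0.5)

# If the encoder were perfectly noise-invariant, noisy and clean embeddings
# would coincide and the critic could not tell them apart.
invariant_gap = critic_loss(critic_weights, clean_embs, clean_embs)

print(c_loss, adv_loss, enc_loss, invariant_gap)
```

In a real system the encoder and critic would be neural networks updated in alternation by gradient descent. The sketch only shows how the two losses pull in opposite directions, and that the critic's gap is zero when the noisy and clean embeddings are identical.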
