Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition

Chien-Chun Wang; Li-Wei Chen; Cheng-Kang Chou; Hung-Shin Lee; Berlin Chen; Hsin-Min Wang

arXiv:2409.12386·cs.SD·January 9, 2025

TL;DR

This paper introduces a channel-aware GAN-based data simulation method that makes speech recognition systems more robust to unseen recording environments, with relative CER reductions of 20.02% on the HAT corpus and 9.64% on the TAT corpus.

Contribution

The paper proposes an approach that combines a channel encoder with a GAN-based speech synthesizer to generate realistic target-domain speech for robust ASR training.

Findings

01 Achieved a 20.02% relative reduction in character error rate (CER) on the HAT corpus.

02 Achieved a 9.64% relative CER reduction on the TAT corpus.

03 Demonstrated that channel-aware data simulation is effective at bridging domain gaps.
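The findings above are reported as relative CER reductions. The listing does not spell out the formula, so the sketch below assumes the conventional definitions: CER as character-level edit distance divided by reference length, and relative reduction as the drop in CER divided by the baseline CER. The function names and the example numbers are hypothetical and chosen only for illustration; they are not the paper's baseline or adapted error rates.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent (assumed conventional formula)."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(cer("abcd", "abxd"))
    print(relative_reduction(10.0, 8.0))
```

Under this definition, a relative reduction says how much of the baseline error was removed, so the same relative figure can correspond to very different absolute CER changes depending on the baseline.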

Abstract

While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To mitigate this issue, we propose a novel channel-aware data simulation method for robust ASR training. Our method harnesses the synergistic power of channel-extractive techniques and generative adversarial networks (GANs). We first train a channel encoder capable of extracting embeddings from arbitrary audio. On top of this, channel embeddings are extracted using a minimal amount of target-domain data and used to guide a GAN-based speech synthesizer. This synthesizer generates speech that faithfully preserves the phonetic content of the input while mimicking the channel characteristics of the target domain. We evaluate our method on the…

Code & Models

Repositories

jethrowangsir/cada-gan (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing