Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks

Md Akmal Haidar; Mehdi Rezagholizadeh

arXiv:2103.13329·eess.AS·March 25, 2021

TL;DR

This paper proposes an adversarial fine-tuning method for pre-trained end-to-end speech recognition models using GANs, improving recognition accuracy on a large corpus (LibriSpeech).

Contribution

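It introduces a GAN-based fine-tuning framework for pre-trained ASR models. Starting from a pre-trained model avoids the slow, high-variance, and unstable optimization that GAN training from scratch faces on a large corpus, and the adversarial objective further improves recognition accuracy.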
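For reference, the standard GAN min-max game can be written with the pre-trained ASR model as the generator G. In this sketch, x denotes input speech features, G(x) the sequence of soft distribution vectors the ASR model emits, and y the reference transcript encoded as one-hot vectors; these symbol choices, the combined fine-tuning loss, and the weight \lambda are illustrative assumptions, and the paper may use a different adversarial loss or weighting.

```latex
\min_{G}\max_{D}\;
  \mathbb{E}_{y \sim p_{\text{data}}}\big[\log D(y)\big]
  + \mathbb{E}_{x \sim p_{\text{speech}}}\big[\log\big(1 - D(G(x))\big)\big]

% Assumed fine-tuning objective for the generator (ASR model):
\mathcal{L}_{G} = \mathcal{L}_{\text{ASR}} + \lambda\,\mathcal{L}_{\text{adv}},
\qquad
\mathcal{L}_{\text{adv}} = -\,\mathbb{E}_{x}\big[\log D(G(x))\big]
```

The adversarial term above is the common non-saturating generator loss, which gives stronger gradients early in training than minimizing \log(1 - D(G(x))) directly.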

Findings

1. Outperforms baseline models on the LibriSpeech dataset
2. Demonstrates effective adversarial fine-tuning of pre-trained ASR models
3. Shows improved robustness and accuracy in speech recognition

Abstract

Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, because it might take an excessively long time due to high-variance gradient updates and might face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective, where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that the ASR model output (soft distribution vectors) helps to get higher scores from the discriminator and makes the task of the discriminator harder within our GAN…
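The hypothesis in the abstract (a pre-trained ASR model's soft outputs earn higher discriminator scores and make the discriminator's job harder) can be illustrated with a minimal, stdlib-only sketch. Several pieces are assumptions, not the authors' method: "real data" is taken to be one-hot reference transcripts, the discriminator is a hypothetical fixed logistic scorer on the mean peak probability rather than a learned network, the supervised term is plain per-step cross-entropy, and the adversarial weight `lam` is an arbitrary illustrative value.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one time step's token logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(tokens: Sequence[int], vocab_size: int) -> List[Vector]:
    """Assumed "real" data: one one-hot vector per reference token."""
    return [[1.0 if v == t else 0.0 for v in range(vocab_size)] for t in tokens]


def discriminator(seq: List[Vector], weight: float = 8.0, bias: float = -6.0) -> float:
    """Hypothetical toy discriminator: logistic score on the mean peak probability.

    One-hot sequences peak at 1.0 on every step and score highest, so sharper
    (better pre-trained) soft outputs are harder to tell apart from them.
    """
    peak = sum(max(step) for step in seq) / len(seq)
    return 1.0 / (1.0 + math.exp(-(weight * peak + bias)))


def cross_entropy(probs: List[Vector], tokens: Sequence[int]) -> float:
    """Average per-step negative log-likelihood of the reference tokens."""
    return -sum(math.log(p[t]) for p, t in zip(probs, tokens)) / len(tokens)


def gan_losses(logits: List[Vector], tokens: Sequence[int], vocab_size: int,
               lam: float = 0.1):
    """Return (discriminator loss, generator loss) for one utterance.

    The generator (ASR) loss is cross-entropy plus a `lam`-weighted
    non-saturating adversarial term, -log D(G(x)).
    """
    fake = [softmax(step) for step in logits]  # ASR soft distribution vectors
    real = one_hot(tokens, vocab_size)         # reference transcript
    d_real, d_fake = discriminator(real), discriminator(fake)
    d_loss = -(math.log(d_real) + math.log(1.0 - d_fake))
    g_loss = cross_entropy(fake, tokens) - lam * math.log(d_fake)
    return d_loss, g_loss


if __name__ == "__main__":
    tokens = [2, 0, 1]
    sharp = [[0.0, 0.0, 6.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0]]  # pre-trained-like
    flat = [[0.1, 0.0, 0.2], [0.2, 0.1, 0.0], [0.0, 0.2, 0.1]]   # untrained-like
    print("sharp (d_loss, g_loss):", gan_losses(sharp, tokens, 3))
    print("flat  (d_loss, g_loss):", gan_losses(flat, tokens, 3))
```

With these toy numbers, the sharp (pre-trained-like) logits yield a higher discriminator loss than the flat ones, which mirrors the abstract's claim that pre-training makes the discriminator's task harder. A real system would replace the toy scorer with a trained network and alternate discriminator and generator updates.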
