Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

Walter Heymans; Marelie H. Davel; Charl van Heerden

arXiv:2210.00721 · cs.SD · October 7, 2022

TL;DR

This paper introduces a GAN-based framework that enhances acoustic features for speech recognition in resource-limited, mismatched environments, achieving relative WER reductions of 11.5% to 19.7% at low computational cost.

Contribution

It presents a GAN approach that improves acoustic features without parallel data, with gains comparable to multi-style training (MTR) at lower computational cost.

Findings

01 Achieves 11.5% to 19.7% relative WER reduction

02 Effective with less than one hour of training data

03 No need for parallel training data

Abstract

We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional…
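
The relative WER improvement quoted in the abstract can be read as follows. This is a minimal sketch, assuming the standard definition of relative reduction (baseline WER minus improved WER, divided by baseline WER). The function name and the example WER values are hypothetical illustrations and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER.

    Assumes the usual definition: 100 * (baseline - improved) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Hypothetical WER values (in percent), chosen only to illustrate the arithmetic.
baseline = 40.0   # assumed WER of the baseline system on mismatched audio
enhanced = 34.0   # assumed WER after feature enhancement

print(f"{relative_wer_reduction(baseline, enhanced):.1f}% relative reduction")
# prints "15.0% relative reduction"
```

Note that a relative reduction is measured against the baseline error, so the same absolute gain of 6 WER points would be a larger relative reduction on a system with a lower baseline WER.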
