Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

Kohei Matsuura; Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara

arXiv:2005.09256 · eess.AS · August 3, 2020

TL;DR

This paper introduces a CycleGAN-based voice conversion method to adapt training speech data to test speakers, significantly improving ASR performance on low-resource endangered language corpora.

Contribution

It presents a novel speaker adaptation technique using non-parallel voice conversion to enhance ASR accuracy for endangered languages with limited data.

Findings

01

35-60% relative reduction in phone error rate on the Ainu corpus

02

40% relative reduction in phone error rate on the Mboshi corpus

03

Outperforms conventional unsupervised adaptation and multilingual training methods
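
The findings above are stated as relative reductions in phone error rate (PER). As a reminder of how such a figure is usually computed, here is a minimal sketch. The function name `relative_reduction` and the numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_per: float, adapted_per: float) -> float:
    """Relative reduction of the adapted PER with respect to the baseline PER."""
    if baseline_per <= 0:
        raise ValueError("baseline_per must be positive")
    return (baseline_per - adapted_per) / baseline_per


# Hypothetical numbers for illustration only; not taken from the paper.
baseline = 0.50  # PER before adaptation
adapted = 0.30   # PER after adaptation
print(f"{relative_reduction(baseline, adapted):.0%}")  # prints 40%
```

Note that a relative reduction differs from an absolute one: in this toy case the absolute PER drop is 20 points, while the relative reduction is 40%.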

Abstract

Transcribing and archiving speech data of endangered languages is important for preserving the heritage of verbal culture, and automatic speech recognition (ASR) is a powerful tool to facilitate this process. However, since endangered languages generally do not have large corpora with many speakers, the performance of ASR models trained on them is generally poor. Nevertheless, we are often left with many recordings of spontaneous speech that have to be transcribed. In this work, to mitigate this speaker sparsity problem, we propose to convert the whole training speech data to make it sound like the test speaker, in order to develop a highly accurate ASR system for this speaker. For this purpose, we utilize a CycleGAN-based non-parallel voice conversion technology to forge labeled training data that is close to the test speaker's speech. We evaluated this…
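
The idea in the abstract can be illustrated with a toy sketch: a CycleGAN trains two mappings, one from training-speaker features to test-speaker features and one back, with a cycle-consistency term that asks a round trip to reproduce the input. Once trained, the forward mapping is applied to every training utterance while its transcript is kept unchanged. Everything below is an assumption made for illustration: the names `to_target`, `to_source`, `cycle_consistency_loss`, and `adapt_training_set` are hypothetical, the "generators" are trivial shifts rather than neural networks, and the adversarial term and any other losses the authors may use are omitted.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]  # toy stand-in for an utterance's acoustic features
Converter = Callable[[Sequence[float]], Features]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Sequence[float], y: Sequence[float],
                           g_xy: Converter, g_yx: Converter) -> float:
    """L1 penalty on both round trips: x -> y -> x and y -> x -> y."""
    return l1_distance(g_yx(g_xy(x)), x) + l1_distance(g_xy(g_yx(y)), y)


def adapt_training_set(data: Sequence[Tuple[Sequence[float], str]],
                       g_xy: Converter) -> List[Tuple[Features, str]]:
    """Convert every utterance toward the test speaker; labels stay untouched."""
    return [(g_xy(feats), label) for feats, label in data]


# Hypothetical toy "generators": shift features by a constant and back.
def to_target(feats: Sequence[float]) -> Features:
    return [v + 1.0 for v in feats]


def to_source(feats: Sequence[float]) -> Features:
    return [v - 1.0 for v in feats]


train = [([0.0, 1.0], "a b"), ([2.0, 3.0], "c d")]
print(cycle_consistency_loss([0.0, 1.0], [2.0, 3.0], to_target, to_source))
print(adapt_training_set(train, to_target))
```

Because the transcripts pass through unchanged, the converted utterances can serve directly as labeled training data for an ASR model aimed at the test speaker, which is the property the abstract relies on.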

Code & Models

Repositories

Kohei-Matsuura/Non-parallel-VC-on-Mboshi
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing