pMCT: Patched Multi-Condition Training for Robust Speech Recognition

Pablo Peso Parada, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

arXiv:2207.04949·eess.AS·July 12, 2022

TL;DR

This paper introduces pMCT, a training method that improves speech recognition robustness by mixing patches of clean and distorted versions of the same utterance during training. On the noisy, reverberant VOiCES dataset, it achieves a 23.1% relative WER reduction compared to standard Multi-Condition Training (MCT).

Contribution

pMCT is a new training approach that uses Multi-condition Audio Modification and Patching (MAMP) to improve ASR robustness in noisy and reverberant conditions.

Findings

01

pMCT outperforms vanilla MCT on LibriSpeech.

02

pMCT achieves a 23.1% relative WER reduction over MCT on VOiCES.

03

pMCT enhances robustness in noisy reverberant scenarios.

Abstract

We propose a novel Patched Multi-Condition Training (pMCT) method for robust Automatic Speech Recognition (ASR). pMCT employs Multi-condition Audio Modification and Patching (MAMP), which mixes patches of the same utterance extracted from clean and distorted speech. Training on patch-modified signals improves the robustness of models in noisy reverberant scenarios. pMCT is evaluated on the LibriSpeech dataset, where it improves over vanilla Multi-Condition Training (MCT). For analyses of robust ASR, we applied pMCT to the VOiCES dataset, a noisy reverberant dataset created using utterances from LibriSpeech. In these analyses, pMCT achieves a 23.1% relative WER reduction compared to MCT.
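The patch-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mix_patches`, the fixed patch length, the per-patch selection probability, and the use of non-overlapping patches are all assumptions made for illustration, since the abstract does not specify them. The sketch only assumes that the clean and distorted signals are time-aligned versions of the same utterance.

```python
import random


def mix_patches(clean, distorted, patch_len=1600, p_distorted=0.5, rng=None):
    """Build a training signal by mixing patches of two aligned signals.

    Hypothetical illustration of patch mixing: `patch_len` (in samples) and
    `p_distorted` (chance a patch comes from the distorted signal) are
    assumed parameters, not values taken from the paper.
    """
    if len(clean) != len(distorted):
        raise ValueError("clean and distorted must have the same length")
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    rng = rng or random.Random()

    mixed = list(clean)  # start from the clean utterance
    for start in range(0, len(clean), patch_len):
        # For each non-overlapping patch, pick the distorted version
        # with probability p_distorted; otherwise keep the clean one.
        if rng.random() < p_distorted:
            mixed[start:start + patch_len] = distorted[start:start + patch_len]
    return mixed


# Toy signals: zeros stand in for clean speech, ones for distorted speech.
clean = [0.0] * 8000
distorted = [1.0] * 8000
mixed = mix_patches(clean, distorted, patch_len=1600, rng=random.Random(0))
```

Each 1600-sample patch of `mixed` comes entirely from either the clean or the distorted signal, so the model sees both conditions within a single utterance. Setting `p_distorted` to 0 or 1 returns the clean or the distorted signal unchanged.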

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing