An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

Catalin Zorila; Christoph Boeddeker; Rama Doddipatla; Reinhold Haeb-Umbach

arXiv:1909.12208 · cs.CL · September 27, 2019

TL;DR

This paper shows that applying speech enhancement to the ASR training data, and not only to the test data, improves recognition on the acoustically challenging multi-channel CHiME-5 dinner party recordings, and it reports a new state-of-the-art result on that task.

Contribution

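It provides extensive experimental evidence that enhancing the training data, combined with test-time enhancement that is at least as strong, yields substantial WER reductions and larger gains than the common strategy of augmenting the training set with artificially degraded speech.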

Findings

01

Enhancement in training reduces word error rates substantially.

02

Enhancing the training data is advisable as long as enhancement in test is at least as strong as the enhancement used in training.

03

Reports a new state-of-the-art WER on CHiME-5 with a CNN-TDNN acoustic model.

Abstract

Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available. However, there has been a longstanding debate about whether enhancement should also be carried out on the ASR training data. In an extensive experimental evaluation on the acoustically very challenging CHiME-5 dinner party data we show that: (i) cleaning up the training data can lead to substantial error rate reductions, and (ii) enhancement in training is advisable as long as enhancement in test is at least as strong as in training. This approach stands in contrast to, and delivers larger gains than, the common strategy reported in the literature of augmenting the training database with additional artificially degraded speech. Together with an acoustic model topology consisting of initial CNN layers followed by…
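
The abstract reports its results as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; this is not the scoring pipeline used in the paper or in the CHiME-5 challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only: it splits on whitespace and
    applies no text normalization, unlike a real ASR scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("pass the salt please", "pass salt please"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis contains many spurious words, which is one reason overlapping speech in dinner party recordings is so costly for this metric.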

Code & Models

Repositories

fgnt/pb_chime5
Official
