Text Augmentation for Language Models in High Error Recognition Scenario

Karel Beneš; Lukáš Burget

arXiv:2011.06056·cs.CL·November 13, 2020

TL;DR

This paper explores data augmentation strategies for training language models for speech recognition, finding that augmentation based on global substitution, deletion, and insertion rates reduces WER more than augmentation based on per-word unigram error statistics or label smoothing.

Contribution

It introduces a simple global error statistic-based augmentation scheme that outperforms label smoothing and improves speech recognition accuracy.

Findings

01

Global error-based augmentation increases the absolute WER improvement from second-pass rescoring from 1.1% to 1.9% on CHiME-6.

02

Perplexity estimated on augmented data gives no better prediction of the final error rate.

03

The simple global-rate scheme consistently outperforms per-word unigram error statistics, label smoothing, and sampled variants of label smoothing.

Abstract

We examine the effect of data augmentation for training of language models for speech recognition. We compare augmentation based on global error statistics with one based on per-word unigram statistics of ASR errors and observe that it is better to pay attention only to the global substitution, deletion and insertion rates. This simple scheme also performs consistently better than label smoothing and its sampled variants. Additionally, we investigate the behavior of perplexity estimated on augmented data, but conclude that it gives no better prediction of the final error rate. Our best augmentation scheme increases the absolute WER improvement from second-pass rescoring from 1.1 % to 1.9 % on the CHiME-6 challenge.
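
The global-rate augmentation described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or the BrnoLM repository: the function name `augment`, the rate values, and the choice to draw substituted and inserted words uniformly from the vocabulary are all assumptions made for illustration. The idea shown is only that each training sentence is corrupted with fixed, corpus-wide substitution, deletion, and insertion probabilities before it is fed to the language model.

```python
import random


def augment(tokens, vocab, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=None):
    """Corrupt a token sequence using global error rates (illustrative sketch).

    p_sub, p_del, p_ins are hypothetical corpus-wide rates; replacement and
    inserted words are sampled uniformly from vocab, which is an assumption.
    """
    if p_sub + p_del > 1.0:
        raise ValueError("p_sub + p_del must not exceed 1")
    if rng is None:
        rng = random.Random()

    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            pass  # deletion: drop the token
        elif r < p_del + p_sub:
            # substitution: may occasionally pick the original word again
            out.append(rng.choice(vocab))
        else:
            out.append(tok)  # keep the token unchanged
        if rng.random() < p_ins:
            out.append(rng.choice(vocab))  # insertion after this position
    return out


if __name__ == "__main__":
    vocab = ["the", "a", "cat", "dog", "sat", "ran"]
    sentence = ["the", "cat", "sat"]
    print(augment(sentence, vocab, rng=random.Random(0)))
```

In a training loop, such a function would typically be applied afresh to each sentence in every epoch, so the model sees a different corrupted version each time. A per-word variant would replace the uniform sampling with statistics conditioned on the word being corrupted.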

Code & Models

Repositories

BUTSpeechFIT/BrnoLM
PyTorch · Official

Taxonomy

Methods: Label Smoothing