# SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

**Authors:** Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

arXiv: 1904.08779 · 2019-12-04

## TL;DR

SpecAugment is a simple data augmentation method applied directly to the feature inputs of a neural network (filter bank coefficients). Applied to Listen, Attend and Spell networks, it achieves state-of-the-art word error rates (WER) on the LibriSpeech 960h and Switchboard 300h tasks.

## Contribution

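The paper proposes an augmentation policy that operates directly on the input features: it warps the features, masks blocks of frequency channels, and masks blocks of time steps. End-to-end Listen, Attend and Spell networks trained with this policy outperform the previous state-of-the-art hybrid systems on both benchmarks.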
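As a rough illustration of feature-level augmentation, the sketch below masks one random block of frequency channels and one random block of time steps in a feature matrix. It is a dependency-free approximation, not the authors' implementation: the function name `mask_features`, the parameter names, the default widths, the uniform sampling of block width and position, and the fill value of `0.0` are all assumptions made for this example, and the warping step is left out entirely.

```python
import random


def mask_features(features, max_freq_width=8, max_time_width=20,
                  num_freq_masks=1, num_time_masks=1, fill=0.0, rng=None):
    """Return a masked copy of `features`, a list of time steps,
    each a list of frequency-channel values.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    out = [list(row) for row in features]  # copy so the input is untouched
    num_steps = len(out)
    num_channels = len(out[0]) if out else 0
    if num_steps == 0 or num_channels == 0:
        return out

    # Frequency masking: overwrite a contiguous band of channels in every time step.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for row in out:
            row[start:start + width] = [fill] * width

    # Time masking: overwrite a contiguous run of time steps across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_steps))
        start = rng.randint(0, num_steps - width)
        for t in range(start, start + width):
            out[t] = [fill] * num_channels

    return out


# Usage: 100 time steps of 40 hypothetical filter bank channels.
features = [[1.0] * 40 for _ in range(100)]
augmented = mask_features(features, rng=random.Random(0))
```

Because the masks are resampled on every call, applying the function inside the training input pipeline gives each epoch a differently masked view of the same utterance.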

## Key findings

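- Achieved 6.8% WER on LibriSpeech test-other without a language model, and 5.8% with shallow fusion with a language model.
- Achieved 7.2%/14.6% WER on the Switchboard/CallHome portions of the Hub5'00 test set without a language model, and 6.8%/14.1% with shallow fusion.
- Outperformed the previous state-of-the-art hybrid systems, which stood at 7.5% WER on LibriSpeech test-other and 8.3%/17.3% WER on Hub5'00.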
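For readers unfamiliar with the metric behind these numbers, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The sketch below computes it for a single utterance; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, and benchmark scoring tools typically apply text normalization that is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Usage: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Corpus-level WER is usually computed by summing edit counts and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.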

## Abstract

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
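The abstract reports results both without a language model and with shallow fusion. As commonly defined, shallow fusion adds a weighted language-model log-probability to the recognizer's log-probability for each candidate token during decoding. The sketch below shows that combination for a single decoding step; the function name `fuse_log_probs`, the toy probabilities, and the weight of `0.3` are assumptions for illustration, and the paper's exact decoding setup is not described in this summary.

```python
import math


def fuse_log_probs(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-token scores: log P_asr(token) + lm_weight * log P_lm(token).

    Both arguments map the same candidate tokens to log-probabilities.
    The default weight is an illustrative assumption, not a tuned value.
    """
    return {
        token: asr_lp + lm_weight * lm_log_probs[token]
        for token, asr_lp in asr_log_probs.items()
    }


# Usage: toy next-token distributions from a recognizer and a language model.
asr = {"their": math.log(0.5), "there": math.log(0.4), "they're": math.log(0.1)}
lm = {"their": math.log(0.1), "there": math.log(0.8), "they're": math.log(0.1)}

fused = fuse_log_probs(asr, lm)
best_token = max(fused, key=fused.get)
```

In a beam search, the fused score would replace the recognizer-only score when ranking hypotheses at each step, so the language model influences which candidates survive without any change to the acoustic model itself.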

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08779/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08779/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/1904.08779/full.md

---
Source: https://tomesphere.com/paper/1904.08779