Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

Oleksii Hrinchuk; Mariya Popova; Boris Ginsburg

arXiv:1910.10697 · cs.CL · October 24, 2019

TL;DR

This paper presents a Transformer-based post-processing model for automatic speech recognition (ASR) that substantially reduces word error rate (WER), especially on the noisier evaluation subsets, by relying on extensive data augmentation and initialization with pre-trained weights.
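Word error rate, the metric behind these results, is conventionally the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words: (substitutions + deletions + insertions) / N. The sketch below computes it with a standard dynamic-programming Levenshtein distance over words. The function name is our own, and the sketch assumes the text is already normalized (case, punctuation), which real evaluation scripts typically handle separately.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("the") out of six reference words, roughly 0.167.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator counts only reference words.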

Contribution

Introduces a Transformer sequence-to-sequence model for ASR correction that outperforms a baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with the Transformer-XL neural language model.
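Language model re-scoring, the comparison point here, generically means taking an N-best list of acoustic-model hypotheses and re-ranking them by a weighted sum of acoustic and language-model log-probabilities. The sketch below shows that generic recipe only; it is not the paper's exact setup. The hypothesis format, the `lm_weight` value, and the toy unigram model (standing in for a 6-gram or Transformer-XL model) are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def rescore_nbest(
    nbest: Sequence[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> str:
    """Pick the hypothesis maximizing am_logprob + lm_weight * lm_logprob(text)."""
    if not nbest:
        raise ValueError("nbest list is empty")
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))[0]


# Hypothetical toy unigram LM; unseen words fall back to a tiny floor probability.
_UNIGRAM = {"the": 0.2, "cat": 0.05, "sat": 0.05}


def toy_lm_logprob(text: str) -> float:
    """Sum of unigram log-probabilities over the words in text."""
    return sum(math.log(_UNIGRAM.get(w, 1e-6)) for w in text.split())


nbest = [("the cat sad", -4.0), ("the cat sat", -4.2)]  # (text, acoustic log-prob)
print(rescore_nbest(nbest, toy_lm_logprob))                 # the cat sat
print(rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.0))  # the cat sad
```

With the LM weight set to zero the acoustically preferred "sad" wins; the language model flips the choice to the more plausible "sat". A correction model differs in that it can rewrite the text directly rather than only choosing among existing hypotheses.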

Findings

01. Significant WER reduction on the LibriSpeech benchmark, especially on the noisier dev-other and test-other subsets.

02. Effective use of data augmentation and pre-trained weights.

03. Outperforms the baseline with 6-gram language model re-scoring.
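The findings credit data augmentation, but this page does not describe how the augmented data was produced. Purely as a hypothetical illustration, one simple way to synthesize extra (noisy source, clean target) pairs for a correction model is to corrupt clean text with random word deletions, substitutions, and insertions that mimic ASR errors. This is a generic text-noising technique swapped in for illustration, not the authors' recipe; the function names, error probabilities, and confusion vocabulary are all made up.

```python
import random
from typing import List, Sequence, Tuple


def corrupt(
    words: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    p_del: float = 0.05,
    p_sub: float = 0.05,
    p_ins: float = 0.05,
) -> List[str]:
    """Apply random word-level deletions, substitutions, and insertions."""
    out: List[str] = []
    for w in words:
        u = rng.random()
        if u < p_del:
            pass  # simulated deletion error: drop the word
        elif u < p_del + p_sub:
            out.append(rng.choice(vocab))  # simulated substitution (may repeat w)
        else:
            out.append(w)  # keep the word unchanged
        if rng.random() < p_ins:
            out.append(rng.choice(vocab))  # simulated insertion error
    return out


def make_pairs(
    sentences: Sequence[str], vocab: Sequence[str], copies: int = 3, seed: int = 0
) -> List[Tuple[str, str]]:
    """Build (noisy source, clean target) pairs; several noisy copies per sentence."""
    rng = random.Random(seed)
    pairs: List[Tuple[str, str]] = []
    for s in sentences:
        words = s.split()
        for _ in range(copies):
            pairs.append((" ".join(corrupt(words, vocab, rng)), s))
    return pairs


for src, tgt in make_pairs(["the cat sat on the mat"], vocab=["a", "hat", "mad"]):
    print(f"{src!r} -> {tgt!r}")
```

Seeding a dedicated `random.Random` instance keeps the generated pairs reproducible without touching global random state.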

Abstract

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates significant improvement in word error rate over the baseline acoustic model with greedy decoding, especially on the much noisier dev-other and test-other portions of the evaluation dataset. Our model also outperforms the baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with the Transformer-XL neural language model.
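The baseline in the abstract is an acoustic model with greedy decoding, and its output is what the correction model "translates". Assuming the acoustic model is trained with CTC (an assumption; this page does not say which acoustic model or loss was used), greedy decoding takes the highest-scoring label at each frame, merges consecutive repeats, and then removes the blank symbol. The sketch below uses a made-up five-symbol alphabet and one-hot frame scores; the names are our own.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank_id: int = 0,
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    prev: Optional[int] = None
    for label in best_path:
        # A repeated label is merged unless a blank (or other label) separates it.
        if label != prev and label != blank_id:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)


def one_hot(labels: Sequence[int], size: int) -> List[List[float]]:
    """Hypothetical helper: turn a label sequence into one-hot frame scores."""
    return [[1.0 if j == i else 0.0 for j in range(size)] for i in labels]


alphabet = ["_", " ", "a", "c", "t"]  # index 0 is the CTC blank
print(ctc_greedy_decode(one_hot([3, 3, 0, 2, 2, 4, 0], 5), alphabet))  # cat
print(ctc_greedy_decode(one_hot([4, 0, 4], 5), alphabet))              # tt
```

The second call shows why blanks matter: without the blank between them, the two "t" frames would collapse into a single character.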

Taxonomy

Methods: Linear Layer · Cosine Annealing · Softmax · Variational Dropout · Adam · Layer Normalization · Dropout · Attention Is All You Need · Multi-Head Attention