Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition

Egor Lakomkin; Jahn Heymann; Ilya Sklyar; Simon Wiesler

arXiv:2008.04034 · eess.AS · August 11, 2020

TL;DR

This paper investigates how subword regularization improves streaming end-to-end speech recognition. It finds that the method reduces word error rate (WER) consistently across dataset sizes, including large ones, and it analyzes the effect on recognition of unseen words and on beam diversity.

Contribution

It provides a systematic analysis of subword regularization's impact on scalability and generalization in streaming speech recognition, demonstrating consistent WER improvements across dataset sizes.

Findings

01. Subword regularization yields a 2-8% relative WER reduction.

02. It improves recognition of unseen words.

03. It enhances beam diversity in decoding.
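
As a reading aid for the "relative WER reduction" figure above, here is a minimal sketch of how a relative reduction is computed from two absolute WER values. The function name and the numbers in the usage note are hypothetical illustrations and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are absolute WER values in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values, chosen only to illustrate the arithmetic.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=9.5)
print(f"{example:.1%}")
```

With these made-up inputs, an absolute drop of 0.5 points on a 10.0% baseline corresponds to a 5% relative reduction, which would fall inside the 2-8% range reported above.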

Abstract

Subwords are the most widely used output units in end-to-end speech recognition. They combine the best of two worlds by modeling the majority of frequent words directly and at the same time allow open vocabulary speech recognition by backing off to shorter units or characters to construct words unseen during training. However, mapping text to subwords is ambiguous and often multiple segmentation variants are possible. Yet, many systems are trained using only the most likely segmentation. Recent research suggests that sampling subword segmentations during training acts as a regularizer for neural machine translation and speech recognition models, leading to performance improvements. In this work, we conduct a principled investigation on the regularizing effect of the subword segmentation sampling method for a streaming end-to-end speech recognition task. In particular, we evaluate the…
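
To illustrate the segmentation ambiguity the abstract describes, the following is a minimal sketch of enumerating the subword segmentations of one word and sampling among them instead of always taking the most likely one. The vocabulary, its probabilities, the `alpha` smoothing parameter, and all function names are assumptions made for this illustration. It is not the authors' implementation, and real tokenizers use more efficient algorithms than exhaustive enumeration.

```python
import math
import random

# Hypothetical toy vocabulary with made-up unigram probabilities.
TOY_VOCAB = {
    "unseen": 0.01, "un": 0.10, "seen": 0.08, "see": 0.07,
    "u": 0.02, "n": 0.03, "s": 0.04, "e": 0.05,
}


def segmentations(word, vocab):
    """Return every way to split `word` into pieces that are all in `vocab`."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


def score(pieces, vocab):
    """Score a segmentation as the product of its pieces' probabilities."""
    return math.prod(vocab[p] for p in pieces)


def best_segmentation(word, vocab):
    """Deterministic choice: the single most likely segmentation."""
    return max(segmentations(word, vocab), key=lambda s: score(s, vocab))


def sample_segmentation(word, vocab, alpha=1.0, rng=random):
    """Stochastic choice: sample a segmentation with weight score ** alpha.

    A smaller alpha flattens the distribution, so rarer segmentations
    are drawn more often.
    """
    candidates = segmentations(word, vocab)
    weights = [score(c, vocab) ** alpha for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("all:", segmentations("unseen", TOY_VOCAB))
    print("best:", best_segmentation("unseen", TOY_VOCAB))
    for _ in range(3):
        print("sampled:", sample_segmentation("unseen", TOY_VOCAB, alpha=0.5, rng=rng))
```

Training on `best_segmentation` alone corresponds to the single-segmentation setup the abstract mentions, while calling `sample_segmentation` anew for each training example exposes the model to several valid spellings of the same word in subword units.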
