Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-yi Lee, and Lin-shan Lee

arXiv:1910.12740·cs.CL·February 6, 2020

TL;DR

This paper explores how integrating pre-trained word embeddings into sequence-to-sequence speech recognition models improves accuracy by regularizing the decoder and enabling semantically aware decoding, with promising results on LibriSpeech.

Contribution

It introduces a novel word embedding regularization method and a fused decoding mechanism for seq-to-seq ASR, enhancing semantic consistency and recognition accuracy.

Findings

01 Pre-trained word embeddings significantly reduce ASR recognition errors on LibriSpeech.

02 The choice of embedding algorithm (Skip-gram, CBOW, or BERT) affects performance.

03 The proposed methods achieve these improvements at negligible additional cost.

Abstract

In this paper, we investigate the benefit that off-the-shelf word embeddings can bring to sequence-to-sequence (seq-to-seq) automatic speech recognition (ASR). We first introduce word embedding regularization, which maximizes the cosine similarity between a transformed decoder feature and the target word embedding. Building on the regularized decoder, we further propose a fused decoding mechanism. It allows the decoder to take semantic consistency into account during decoding by absorbing the information carried by the transformed decoder feature, which is learned to be close to the target word embedding. Initial results on LibriSpeech demonstrate that pre-trained word embeddings can significantly lower ASR recognition error at negligible cost, and that the choice of word embedding algorithm among Skip-gram, CBOW, and BERT is important.
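The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the linear projection `proj`, the interpolation `weight`, and the `temperature` are all assumptions made for illustration. The first function computes a regularization term that shrinks as the transformed decoder feature aligns with the target word embedding; the second mixes the usual output distribution with one derived from cosine similarity to every word embedding.

```python
import numpy as np


def _unit(x, eps=1e-8):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def _softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def embedding_regularization_loss(decoder_feats, proj, embeddings, targets):
    """Mean (1 - cosine similarity) between transformed features and target embeddings.

    decoder_feats: (T, D) decoder features, one row per output step
    proj:          (D, E) hypothetical linear transform into embedding space
    embeddings:    (V, E) fixed pre-trained word embedding table
    targets:       (T,)   target word indices
    """
    transformed = decoder_feats @ proj      # (T, E)
    target_emb = embeddings[targets]        # (T, E)
    cos = np.sum(_unit(transformed) * _unit(target_emb), axis=-1)
    # Minimizing this term maximizes the cosine similarity.
    return float(np.mean(1.0 - cos))


def fused_distribution(asr_logits, transformed, embeddings,
                       weight=0.5, temperature=0.1):
    """Interpolate the decoder's distribution with an embedding-based one.

    weight and temperature are assumed hyperparameters of this sketch,
    not values taken from the paper.
    """
    p_asr = _softmax(asr_logits)                      # (T, V)
    sims = _unit(transformed) @ _unit(embeddings).T   # (T, V) cosine similarities
    p_emb = _softmax(sims / temperature)              # (T, V)
    return (1.0 - weight) * p_asr + weight * p_emb
```

In training, a term like this would typically be added to the usual cross-entropy loss with a small coefficient. At decoding time the fused distribution would replace the plain softmax output inside beam search.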

Code & Models

Repositories

Alexander-H-Liu/End-to-end-ASR-Pytorch
PyTorch · Official

Taxonomy

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax