Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

Hyeonseung Lee; Woo Hyun Kang; Sung Jun Cheon; Hyeongju Kim; Nam Soo Kim

arXiv:2007.05214·eess.AS·January 15, 2021

TL;DR

This paper introduces a softmax-free online attention mechanism for encoder-decoder speech recognition that adds no extra hyperparameter at training time and trades latency against recognition accuracy through an adjustable threshold.

Contribution

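The paper proposes an online attention method that introduces no additional hyperparameter during training. This avoids the repeated training runs that conventional softmax-based online attention needs to tune its attention-window length, while keeping performance competitive.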

Findings

01

The method controls the latency-performance tradeoff through a simple threshold adjustment.

02

The method achieves competitive word error rates (WERs) compared with conventional attention methods.

03

No additional hyperparameters are needed during training.

Abstract

Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attentions are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, requiring multiple trials of model training for tuning the hyperparameter. In order to deal with this problem, we propose a novel softmax-free attention method and its modified formulation for online attention, which does not need any additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate the tradeoff between the latency and performance…
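
The abstract does not spell out the mechanism, so the following is only a minimal sketch of what a softmax-free, gated accumulation of context could look like. It assumes one sigmoid update gate per encoder frame and a stopping rule that ends attention once the gate drops below a threshold. The function name `gated_recurrent_context`, its arguments, and the stopping rule are illustrative assumptions, not the authors' implementation; the swigls/GRC repository holds the official code.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_recurrent_context(scores, frames, threshold=None):
    """Accumulate a context vector frame by frame without a softmax.

    scores:    per-frame scalar attention scores (hypothetical inputs)
    frames:    per-frame encoder vectors (non-empty list of equal-length lists)
    threshold: if given, stop at the first frame whose gate falls below it
               (an assumed stopping rule, used here only for illustration)

    Returns (context, frames_used).
    """
    context = [0.0] * len(frames[0])
    frames_used = 0
    for score, frame in zip(scores, frames):
        gate = sigmoid(score)
        if threshold is not None and gate < threshold:
            break  # no further frames are read, which bounds the latency
        # Convex update: keep (1 - gate) of the old context, add gate of the new frame.
        context = [(1.0 - gate) * c + gate * h for c, h in zip(context, frame)]
        frames_used += 1
    return context, frames_used


if __name__ == "__main__":
    scores = [2.0, 0.0, -3.0]
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(gated_recurrent_context(scores, frames))                 # reads all frames
    print(gated_recurrent_context(scores, frames, threshold=0.1))  # stops early
```

In this sketch a higher `threshold` stops earlier and reads fewer frames, possibly at some cost in accuracy, which mirrors the latency-performance tradeoff described in the findings. Because each update depends only on the frames seen so far, no normalization over the whole input is required.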

Code & Models

Repositories

swigls/GRC · tf · Official
