Focus on the present: a regularization method for the ASR source-target attention layer

Nanxin Chen; Piotr Żelasko; Jesús Villalba; Najim Dehak

arXiv:2011.01210 · eess.AS · November 3, 2020

Open Access

TL;DR

This paper proposes a regularization technique for end-to-end speech recognition models that uses CTC to make source-target attention focus on the frames of the token being predicted, yielding relative improvements of up to 7% on TED-LIUM 2 and up to 13% on LibriSpeech.

Contribution

A novel CTC-based regularization method is introduced to enhance source-target attention focus in end-to-end speech recognition models.

Findings

1. Up to 7% relative improvement on TED-LIUM 2
2. Up to 13% relative improvement on LibriSpeech
3. Source-target attention heads can predict several tokens ahead of the current one
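The relative figures above express the error reduction as a fraction of the baseline error. Assuming the metric is word error rate (WER), which the page does not state explicitly, the relative improvement of the regularized model over the baseline would be computed as follows; the 10.0% and 9.3% values are purely hypothetical and only illustrate how a 7% relative gain arises.

$$
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{reg}}}{\mathrm{WER}_{\text{base}}},
\qquad \text{e.g.}\quad \frac{10.0 - 9.3}{10.0} = 0.07 = 7\%.
$$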

Abstract

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models trained jointly with connectionist temporal classification (CTC) and attention. Our method is based on the fact that CTC and source-target attention both act on the same encoder representations. To understand the functionality of the attention, CTC is applied to compute the token posteriors given the attention outputs. We found that the source-target attention heads are able to predict several tokens ahead of the current one. Inspired by this observation, we propose a new regularization method that leverages CTC to make source-target attention more focused on the frames corresponding to the output token being predicted by the decoder. Experiments reveal stable relative improvements of up to 7% and 13% with the proposed regularization on TED-LIUM…
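The diagnostic described in the abstract (running the CTC output layer on attention outputs) and the idea of a CTC-based focus regularizer can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single attention head whose context is a weighted sum of raw encoder frames (real Transformer decoders use several heads plus value and output projections), and the names `ctc_posteriors_from_attention`, `focus_regularizer`, `w_ctc` and `b_ctc` are hypothetical. The regularizer shown, a cross-entropy between the CTC posterior of each attention context and the token currently being predicted, is one plausible reading of the abstract, not necessarily the paper's exact loss.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ctc_posteriors_from_attention(enc, attn, w_ctc, b_ctc):
    """Diagnostic (hypothetical helper): feed attention contexts to the CTC head.

    enc:   (T, D) encoder outputs shared by the CTC and attention branches
    attn:  (L, T) source-target attention weights, one row per decoder step
    w_ctc: (D, V) and b_ctc: (V,) the CTC output projection (assumed shapes)
    Returns an (L, V) array: one token distribution per decoder step.
    """
    # Simplifying assumption: the context is a convex combination of raw
    # encoder frames, ignoring value/output projections and multiple heads.
    context = attn @ enc
    return softmax(context @ w_ctc + b_ctc, axis=-1)


def focus_regularizer(enc, attn, w_ctc, b_ctc, targets):
    """Hypothetical regularizer: mean negative log-probability that the CTC
    head, applied to each attention context, assigns to the token the decoder
    is predicting at that step. Low when attention sits on that token's frames."""
    post = ctc_posteriors_from_attention(enc, attn, w_ctc, b_ctc)
    steps = np.arange(len(targets))
    return float(-np.log(post[steps, targets] + 1e-12).mean())


if __name__ == "__main__":
    # Toy setup: encoder frame t strongly favors token t (token 0 could be blank).
    enc = 5.0 * np.eye(3)
    w_ctc, b_ctc = np.eye(3), np.zeros(3)
    targets = np.array([1, 2])  # tokens predicted at decoder steps 0 and 1
    focused = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])    # on the current token's frame
    diffuse = np.full((2, 3), 1.0 / 3.0)                       # spread uniformly over frames
    lookahead = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # step 0 already looks ahead
    for name, attn in [("focused", focused), ("diffuse", diffuse), ("lookahead", lookahead)]:
        print(name, focus_regularizer(enc, attn, w_ctc, b_ctc, targets))
```

With these toy inputs, attention that lands on the frame of the current target gives a near-zero penalty, uniform attention gives ln 3 (a uniform posterior over three tokens), and attention that jumps ahead to the next token's frame is penalized most. That last case mirrors the look-ahead behavior the diagnostic exposes, which is what a focus regularizer of this kind would discourage.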

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and Audio Processing