Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

Andros Tjandra; Sakriani Sakti; Satoshi Nakamura

arXiv:1705.08091·cs.CL·November 6, 2017·32 cites

TL;DR

This paper introduces a local monotonic attention mechanism for encoder-decoder models, improving efficiency and alignment accuracy in sequence tasks like speech recognition and translation.

Contribution

It proposes a novel local monotonic attention mechanism that better fits tasks with sequential and monotonic data, reducing computational costs and improving alignment.

Findings

1. Significant performance improvements on ASR, G2P, and translation tasks.

2. Reduced computational complexity compared to global attention.

3. Effective alignment in long input sequences.

Abstract

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attention mechanism, which allows the model to learn alignments between the source and the target sequence. Most attention mechanisms used today are based on a global attention property, which requires computing a weighted summarization of the whole input sequence generated by the encoder states. However, this is computationally expensive and often produces misalignment on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of several tasks, such as automatic speech recognition (ASR) and grapheme-to-phoneme (G2P) conversion. In this paper, we propose a novel attention mechanism that has local and monotonic properties. Various ways to control those properties are also explored. Experimental results on ASR, G2P…
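
The contrast the abstract draws between global attention and a local, monotonic alternative can be illustrated with a minimal sketch. This is not the authors' formulation: the dot-product scoring, the fixed `half_window`, and the externally supplied `step` are all assumptions made for illustration (the abstract only says that "various ways to control those properties" are explored), and every function name here is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_attention(query, encoder_states):
    """Score every encoder state, so the cost grows with the input length."""
    return softmax([dot(query, h) for h in encoder_states])


def local_monotonic_attention(query, encoder_states, prev_pos, step, half_window):
    """Illustrative local, monotonic variant (an assumption, not the paper's method).

    Monotonic: the centre position can only move forward (step >= 0).
    Local: scores are computed only inside a window around the centre.
    """
    if step < 0:
        raise ValueError("step must be non-negative to keep the position monotonic")
    last = len(encoder_states) - 1
    pos = min(prev_pos + step, last)
    lo = max(0, pos - half_window)
    hi = min(last, pos + half_window)
    window = softmax([dot(query, encoder_states[i]) for i in range(lo, hi + 1)])
    weights = [0.0] * len(encoder_states)
    weights[lo:hi + 1] = window  # states outside the window get zero weight
    return pos, weights


# Toy encoder states and query, chosen only to exercise the functions.
states = [[float(i), 1.0] for i in range(10)]
query = [0.1, 0.0]
g = global_attention(query, states)
pos, w = local_monotonic_attention(query, states, prev_pos=3, step=2, half_window=1)
```

In this sketch the global variant scores all ten states at every decoding step, while the local variant scores only the `2 * half_window + 1` states around the new position, which is the source of the reduced cost and the forced left-to-right alignment.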

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis