Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

TL;DR
This paper introduces a local monotonic attention mechanism for encoder-decoder models, improving efficiency and alignment accuracy in sequence tasks like speech recognition and translation.
Contribution
It proposes a novel local monotonic attention mechanism that better fits tasks with sequential and monotonic data, reducing computational costs and improving alignment.
Findings
Significant performance improvements on ASR, G2P, and translation tasks.
Reduced computational complexity compared to global attention.
Effective alignment in long input sequences.
Abstract
Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target sequence. Most attentional mechanisms used today are based on a global attention property, which requires computing a weighted summarization of the whole input sequence generated by the encoder states. However, this is computationally expensive and often produces misalignment on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of several tasks, such as automatic speech recognition (ASR), grapheme-to-phoneme (G2P), etc. In this paper, we propose a novel attention mechanism that has local and monotonic properties. Various ways to control those properties are also explored. Experimental results on ASR, G2P…
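The contrast the abstract draws can be illustrated with a minimal sketch. It is not the paper's formulation: dot-product scoring, the hard window, and the names `global_attention`, `local_monotonic_attention`, `prev_pos`, `step`, and `half_width` are all assumptions made here for illustration. Global attention scores every encoder state at each decoder step, while the local monotonic variant scores only a small window around a position that is never allowed to move backward.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Context vector: weighted sum of the given vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def global_attention(query, encoder_states):
    """Score every encoder state: work grows with the input length."""
    weights = softmax([dot(query, h) for h in encoder_states])
    return weighted_sum(weights, encoder_states), weights


def local_monotonic_attention(query, encoder_states, prev_pos, step, half_width):
    """Score only a window around a position that never moves backward.

    `step` is a hypothetical stand-in for a learned, non-negative
    position increment; here the caller supplies it directly.
    """
    if step < 0:
        raise ValueError("step must be non-negative to keep the position monotonic")
    last = len(encoder_states) - 1
    pos = min(prev_pos + step, last)          # monotonic: pos >= prev_pos
    center = int(math.floor(pos))
    lo = max(0, center - half_width)          # clip the window to the input
    hi = min(last, center + half_width)
    window = encoder_states[lo:hi + 1]
    local_weights = softmax([dot(query, h) for h in window])
    weights = [0.0] * len(encoder_states)     # zero outside the window
    weights[lo:hi + 1] = local_weights
    return weighted_sum(local_weights, window), weights, pos
```

With `half_width` fixed, the per-step cost of the local variant depends on the window size rather than on the input length, which is the kind of saving the abstract refers to.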
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
