Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

Andros Tjandra; Sakriani Sakti; Satoshi Nakamura

arXiv:1807.08280 · cs.CL · July 24, 2018 · 1 citation

TL;DR

This paper introduces an attention extension for sequence-to-sequence models that combines multi-scale convolution over previous attention vectors with a history of previous attention locations and context vectors, and reports significant gains over a standard attention baseline in speech recognition and text-to-speech.

Contribution

It proposes an attention extension based on multi-scale convolution and contextual history, improving sequence-to-sequence model performance on speech recognition and text-to-speech tasks.

Findings

1. Significant performance improvement over a standard attention baseline
2. Effective for both speech recognition and text-to-speech systems
3. Improves model performance through multi-scale alignment and contextual history features in the attention module

Abstract

A sequence-to-sequence model is a neural network module for mapping between two sequences of different lengths. It has three core modules: an encoder, a decoder, and an attention module. Attention bridges the encoder and decoder and improves model performance in many tasks. In this paper, we propose two ideas for improving sequence-to-sequence model performance by enhancing the attention module. First, we maintain a history of the attention locations and the expected context vectors from several previous time-steps. Second, we apply multi-scale convolution over several previous attention vectors when computing the attention for the current decoder state. We applied the proposed framework to sequence-to-sequence speech recognition and text-to-speech systems. The results show that the proposed extension can significantly improve performance compared with a standard attention baseline.
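The two extensions described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the class `HistoryAttention`, the helper functions, the simplified additive score (identity projections for the decoder and encoder states), the choice of summing each kernel's response over all stored alignments, and the zero-padded `context_history` are hypothetical choices made for illustration. The sketch keeps the last K attention vectors and context vectors in fixed-length queues (the contextual history) and convolves the stored alignments with kernels of several widths (the multi-scale convolution) to produce location features for the next attention score.

```python
import math
from collections import deque


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation (the 'convolution' used in neural nets).

    The output has the same length as `signal`; the kernel is centred on each position.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_location_features(prev_alignments, kernels):
    """Return one feature per kernel scale at every encoder position.

    Assumption (hypothetical): each kernel is applied to every stored alignment
    and the responses are summed over the history.
    """
    n = len(prev_alignments[0])
    feats = [[0.0] * len(kernels) for _ in range(n)]
    for align in prev_alignments:
        for k, kernel in enumerate(kernels):
            response = conv1d_same(align, kernel)
            for i in range(n):
                feats[i][k] += response[i]
    return feats


class HistoryAttention:
    """Hypothetical additive attention with multi-scale location features
    and a fixed-length history of alignments and context vectors."""

    def __init__(self, v, U, kernels, history_len):
        self.v = v                # scoring vector, length d
        self.U = U                # d x len(kernels) projection of location features
        self.kernels = kernels    # one kernel per scale (different widths)
        self.history_len = history_len
        self.alignments = deque(maxlen=history_len)  # last K attention vectors
        self.contexts = deque(maxlen=history_len)    # last K context vectors

    def step(self, decoder_state, encoder_states):
        n, d = len(encoder_states), len(decoder_state)
        # Before the first step there is no history, so use an all-zero alignment.
        history = list(self.alignments) or [[0.0] * n]
        feats = multiscale_location_features(history, self.kernels)
        scores = []
        for h, f in zip(encoder_states, feats):
            e = 0.0
            for j in range(d):
                loc = sum(self.U[j][k] * f[k] for k in range(len(self.kernels)))
                # Simplified additive score: decoder/encoder projections are identity.
                e += self.v[j] * math.tanh(decoder_state[j] + h[j] + loc)
            scores.append(e)
        alpha = softmax(scores)
        context = [sum(alpha[i] * encoder_states[i][j] for i in range(n)) for j in range(d)]
        self.alignments.append(alpha)  # deque drops the oldest entry beyond K
        self.contexts.append(context)
        return alpha, context

    def context_history(self, d):
        """Concatenate stored context vectors (oldest first), zero-padding empty
        slots, e.g. as extra input for the next decoder step."""
        flat = [0.0] * ((self.history_len - len(self.contexts)) * d)
        for c in self.contexts:
            flat.extend(c)
        return flat


if __name__ == "__main__":
    enc = [[0.1, 0.0], [0.5, -0.2], [0.9, 0.3], [0.2, 0.4]]  # 4 encoder states, d = 2
    att = HistoryAttention(
        v=[1.0, 0.5],
        U=[[0.3, -0.1, 0.2], [0.0, 0.4, 0.1]],
        kernels=[[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]],  # widths 1, 3, 5
        history_len=3,
    )
    for s in ([0.0, 0.1], [0.2, -0.1]):
        alpha, ctx = att.step(s, enc)
        print([round(a, 3) for a in alpha])
    print(len(att.context_history(2)))  # 3 slots x d=2 -> 6
```

Kernels of different widths let the score respond both to sharp, local shifts in the previous alignments and to broader spreads, while `context_history` shows one possible way to feed previous context vectors back into the decoder.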

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Convolution