Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Maha Elbayad; Laurent Besacier; Jakob Verbeek

arXiv:1808.03867 · cs.CL · November 2, 2018 · 53 cites

TL;DR

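This paper introduces a 2D convolutional neural network architecture for sequence-to-sequence prediction. A single network spanning both the source and target sequences replaces the usual encoder-decoder model and its attention mechanism, and the authors report better machine translation results from a conceptually simpler model with fewer parameters.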

Contribution

The paper presents a new convolutional approach that integrates attention-like properties throughout the network, outperforming existing encoder-decoder systems in machine translation.

Findings

01 Outperforms state-of-the-art encoder-decoder models
02 Simpler architecture with fewer parameters
03 Demonstrates effective attention-like behavior throughout the network

Abstract

Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters.
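The abstract gives no layer-level detail, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It pairs every target position with every source position in a 2D grid, applies convolutions that are causal along the target axis (so each row depends only on the output produced so far), and collapses the source axis before predicting the next token. The function names (`joint_grid`, `masked_conv2d`, `pervasive_forward`), the plain ReLU stack, and the max-pooling over source positions are all assumptions made for illustration.

```python
import numpy as np


def joint_grid(tgt_emb, src_emb):
    """Pair every target position with every source position.

    tgt_emb: (T, d_t), src_emb: (S, d_s) -> grid of shape (T, S, d_t + d_s).
    """
    T, S = tgt_emb.shape[0], src_emb.shape[0]
    tgt = np.broadcast_to(tgt_emb[:, None, :], (T, S, tgt_emb.shape[1]))
    src = np.broadcast_to(src_emb[None, :, :], (T, S, src_emb.shape[1]))
    return np.concatenate([tgt, src], axis=-1)


def masked_conv2d(x, w):
    """2D convolution that is causal along the target axis (assumed design).

    x: (T, S, C_in); w: (k_t, k_s, C_in, C_out) with k_s odd.
    Output cell (i, j) reads target rows i-k_t+1..i only, and source
    columns j-k_s//2..j+k_s//2 (the whole source sentence is known).
    """
    k_t, k_s, _, c_out = w.shape
    T, S, _ = x.shape
    pad_s = k_s // 2
    # Pad the target axis on the "past" side only; pad the source axis evenly.
    xp = np.pad(x, ((k_t - 1, 0), (pad_s, pad_s), (0, 0)))
    out = np.zeros((T, S, c_out))
    for a in range(k_t):
        for b in range(k_s):
            out += xp[a:a + T, b:b + S, :] @ w[a, b]
    return out


def pervasive_forward(tgt_emb, src_emb, weights, w_out):
    """Stack masked convolutions, pool over the source axis, project to logits."""
    h = joint_grid(tgt_emb, src_emb)
    for w in weights:
        h = np.maximum(masked_conv2d(h, w), 0.0)  # ReLU
    pooled = h.max(axis=1)   # (T, C): collapse the source dimension
    return pooled @ w_out    # (T, V): one row of logits per target position


# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
tgt = rng.normal(size=(5, 3))   # embeddings of the target tokens produced so far
src = rng.normal(size=(4, 2))   # embeddings of the source tokens
weights = [rng.normal(size=(2, 3, 5, 6)), rng.normal(size=(2, 3, 6, 6))]
w_out = rng.normal(size=(6, 7))
logits = pervasive_forward(tgt, src, weights, w_out)
```

Because the target axis is padded on the past side only, changing later target embeddings leaves the logits of earlier rows untouched, which is the property that lets such a network be trained on whole sequences and still be decoded one token at a time. Every layer sees the source and target features jointly, which is one way to read the abstract's claim that attention-like behavior is pervasive.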

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications