Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
Maha Elbayad, Laurent Besacier, Jakob Verbeek

TL;DR
This paper introduces a 2D convolutional neural network architecture for sequence-to-sequence prediction that replaces the traditional attention-based encoder-decoder design, yielding improved performance with a simpler model.
Contribution
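The paper presents a single 2D convolutional network applied jointly across the source and target sequences, so that attention-like properties arise in every layer rather than in a separate attention module. The model outperforms state-of-the-art encoder-decoder systems in machine translation.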
Findings
Outperforms state-of-the-art encoder-decoder models
Simpler architecture with fewer parameters
Exhibits attention-like behavior in every layer of the network
Abstract
Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters.
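The idea of a single 2D convolutional network across both sequences can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the concatenation of source and target embeddings into a grid, the ReLU layers, and the max-pooling over the source axis are all assumptions made for illustration. The key property shown is that each convolution is masked along the target axis, so the features for target position i depend only on target positions up to i, while every layer still sees the source tokens.

```python
import numpy as np

def pairwise_grid(src_emb, tgt_emb):
    """Build a (T, S, d_t + d_s) grid holding every (target, source) pair.

    src_emb: (S, d_s) source token embeddings.
    tgt_emb: (T, d_t) embeddings of the target prefix (assumed already
             shifted so that row i holds the tokens produced before step i).
    """
    T, S = tgt_emb.shape[0], src_emb.shape[0]
    tgt = np.broadcast_to(tgt_emb[:, None, :], (T, S, tgt_emb.shape[1]))
    src = np.broadcast_to(src_emb[None, :, :], (T, S, src_emb.shape[1]))
    return np.concatenate([tgt, src], axis=-1)

def masked_conv2d(x, weight):
    """2D convolution that is causal along the target axis.

    x:      (T, S, C_in)
    weight: (k_t, k_s, C_in, C_out), with k_s odd.
    Output cell (i, j) reads target rows i-k_t+1..i and source columns
    j-k_s//2..j+k_s//2, so it never looks at future target positions.
    """
    k_t, k_s, _, c_out = weight.shape
    T, S, _ = x.shape
    pad_s = k_s // 2
    # Pad only the past on the target axis, both sides on the source axis.
    xp = np.pad(x, ((k_t - 1, 0), (pad_s, pad_s), (0, 0)))
    out = np.zeros((T, S, c_out))
    for a in range(k_t):
        for b in range(k_s):
            out += xp[a:a + T, b:b + S, :] @ weight[a, b]
    return out

def forward(src_emb, tgt_emb, weights):
    """Stack masked convolutions, then max-pool over the source axis."""
    h = pairwise_grid(src_emb, tgt_emb)
    for w in weights:
        h = np.maximum(masked_conv2d(h, w), 0.0)  # ReLU (illustrative choice)
    return h.max(axis=1)  # (T, C_out): one feature vector per target step

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 3))   # 5 source tokens, 3-dim embeddings
tgt = rng.normal(size=(4, 3))   # 4 target positions, 3-dim embeddings
weights = [rng.normal(size=(2, 3, 6, 8)), rng.normal(size=(2, 3, 8, 8))]
features = forward(src, tgt, weights)
```

In a full model, each row of `features` would be projected to vocabulary logits to predict the next target token. Because the masking is built into the convolution, changing a later target embedding leaves the features of earlier target positions unchanged, which is what allows all target positions to be trained in parallel.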
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
