You May Not Need Attention
Ofir Press, Noah A. Smith

TL;DR
This paper introduces a recurrent neural translation model that uses neither attention nor a separate encoder and decoder. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
Contribution
The paper presents an eager, low-latency translation model that has no attention and no separate encoder and decoder, and shows that it is competitive with a standard attention-based model in neural machine translation.
Findings
Performs on par with the standard attention-based model of Bahdanau et al. (2014)
Outperforms that model on long sentences
Uses constant memory during decoding
Abstract
In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
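The eager decoding loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `eager_translate`, `toy_step`, the token names `<s>`, `</s>`, `<pad>`, and the `max_extra_steps` parameter are all hypothetical, and `toy_step` is a stub standing in for a trained recurrent cell. The sketch only shows the control flow: one target token is written for each source token read, and the only thing carried between steps is a fixed-size state, so memory use does not grow with sentence length.

```python
from typing import Callable, List, Tuple

# Hypothetical names for illustration only; none come from the paper's code.
State = Tuple[int, ...]          # fixed-size recurrent state
BOS, EOS, PAD = "<s>", "</s>", "<pad>"
StepFn = Callable[[State, str, str], Tuple[State, str]]


def eager_translate(source: List[str], step: StepFn, init_state: State,
                    max_extra_steps: int = 10) -> List[str]:
    """Write one target token per source token read, with no attention.

    Only `state` and the previous target token are carried between steps,
    so decoding memory is constant in the sentence length.
    """
    state, prev = init_state, BOS
    written: List[str] = []
    # After the source ends, keep feeding EOS so the model can finish writing.
    inputs = source + [EOS] * max_extra_steps
    for src_tok in inputs:
        state, tgt = step(state, src_tok, prev)
        if tgt == EOS:
            break
        written.append(tgt)
        prev = tgt
    # Assumption: padding tokens let the model "wait"; drop them from the output.
    return [t for t in written if t != PAD]


def toy_step(state: State, src_tok: str, prev_tgt: str) -> Tuple[State, str]:
    """Stub for a trained recurrent cell: upper-cases each source token."""
    new_state = (state[0] + 1,)  # same size at every step
    if src_tok == EOS:
        return new_state, EOS
    return new_state, src_tok.upper()


result = eager_translate(["guten", "morgen"], toy_step, (0,))
print(result)
```

In a real system the stub would be replaced by a trained recurrent network; the loop itself would stay the same, which is what keeps latency low and memory constant.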
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
