You May Not Need Attention

Ofir Press; Noah A. Smith

arXiv:1810.13409·cs.CL·November 1, 2018·22 cites

TL;DR

This paper introduces a recurrent neural translation model that uses neither attention nor a separate encoder and decoder. It performs on par with the attention-based model of Bahdanau et al. (2014), and better on long sentences.

Contribution

The paper presents an eager, low-latency translation model that has no attention mechanism and no separate encoder and decoder, and shows that it is competitive with a standard attention-based model in neural machine translation.

Findings

01  Performs on par with the attention-based model of Bahdanau et al. (2014)

02  Outperforms that attention-based model on long sentences

03  Uses constant memory during decoding

Abstract

In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
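The decoding behaviour described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `make_cell`, `one_hot`, and `eager_translate` are hypothetical, the weights are random and untrained (so the emitted token ids are meaningless), and feeding the previous output token back in, as well as padding the source once it is exhausted, are assumptions made for illustration. What it shows is the shape of the computation: one source token read and one target token written per step, with only a fixed-size state carried forward.

```python
import math
import random


def make_cell(input_size, hidden_size, seed=0):
    """Build a toy tanh recurrent cell with fixed random weights (illustrative only)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]

    def step(x, h):
        # New state depends only on the current input and the previous state.
        return [
            math.tanh(sum(a * b for a, b in zip(w_in[j], x))
                      + sum(a * b for a, b in zip(w_h[j], h)))
            for j in range(hidden_size)
        ]

    return step


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def eager_translate(source_ids, src_vocab, tgt_vocab,
                    hidden_size=8, extra_steps=2, pad_id=0, seed=0):
    """Read one source token and write one target token per step.

    Assumption: after the source ends, padding tokens are fed for
    `extra_steps` more steps so the output can be longer than the input.
    """
    rng = random.Random(seed + 1)
    step = make_cell(src_vocab + tgt_vocab, hidden_size, seed)
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(tgt_vocab)]

    h = [0.0] * hidden_size   # the only memory carried between steps
    prev = pad_id             # previously written target token
    outputs = []
    for src in list(source_ids) + [pad_id] * extra_steps:
        x = one_hot(src, src_vocab) + one_hot(prev, tgt_vocab)
        h = step(x, h)
        scores = [sum(a * b for a, b in zip(row, h)) for row in w_out]
        prev = max(range(tgt_vocab), key=scores.__getitem__)
        outputs.append(prev)  # a target token is written at every step
    return outputs, h
```

Calling `eager_translate([1, 2, 3], src_vocab=5, tgt_vocab=6)` returns five token ids (three source steps plus two padding steps) and a state of length 8. The state has the same size for a source of any length, and no per-token source representations are stored for later lookup, which is the sense in which decoding memory stays constant.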

Code & Models

Repositories

ofirpress/YouMayNotNeedAttention
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification