Area Attention

Yang Li; Lukasz Kaiser; Samy Bengio; Si Si

arXiv:1810.10126·cs.LG·May 11, 2020

TL;DR

Area attention lets a model attend to groups of structurally adjacent items in memory, with the shape and size of each area learned rather than fixed. Attending at varying granularity improves results on neural machine translation and image captioning.

Contribution

The paper proposes area attention, a mechanism that learns to attend to contiguous regions of the memory rather than to individual items at a predefined, fixed granularity such as a single word token or image grid cell.

Findings

1. Improves neural machine translation performance.

2. Enhances image captioning results.

3. Works with existing attention architectures without additional parameters.

Abstract

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong…
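The idea of attending to areas rather than single items can be illustrated for a 1D memory. The following is a minimal sketch, not the authors' implementation. It assumes that an area's key is the mean of its item keys and its value is the sum of its item values, and it uses a plain dot-product score for a single query. The function names `enumerate_areas` and `area_attention_1d` and the `max_width` parameter are hypothetical, chosen for this illustration. Areas are enumerated naively here, with no attempt at an efficient computation.

```python
import math


def enumerate_areas(n, max_width):
    """Return all contiguous spans (start, end) covering 1..max_width items."""
    return [(s, s + w) for w in range(1, max_width + 1) for s in range(n - w + 1)]


def area_attention_1d(query, keys, values, max_width=2):
    """Attend from one query to every contiguous area of a 1D memory.

    Assumption (illustrative): area key = mean of item keys,
    area value = sum of item values, score = dot product.
    """
    areas = enumerate_areas(len(keys), max_width)
    key_dim, val_dim = len(query), len(values[0])

    scores, area_values = [], []
    for start, end in areas:
        width = end - start
        # Mean of the keys of the items inside the area.
        area_key = [sum(keys[i][d] for i in range(start, end)) / width
                    for d in range(key_dim)]
        # Sum of the values of the items inside the area.
        area_values.append([sum(values[i][d] for i in range(start, end))
                            for d in range(val_dim)])
        scores.append(sum(q * k for q, k in zip(query, area_key)))

    # Numerically stable softmax over all areas.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of area values.
    output = [sum(w * v[d] for w, v in zip(weights, area_values))
              for d in range(val_dim)]
    return areas, weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
areas, weights, output = area_attention_1d(query, keys, values, max_width=2)
```

With `max_width=1` every area is a single item, so the sketch reduces to ordinary attention over individual items. Larger values add candidate areas that compete with single items in the same softmax.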

Code & Models

Repositories

mikomel/area-attention (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention