Area Attention
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

TL;DR
Area attention introduces a dynamic, learnable way to attend to groups of adjacent items in memory, enhancing model performance in tasks like translation and image captioning by capturing varying levels of granularity.
Contribution
The paper proposes area attention, a mechanism that learns to attend to contiguous regions of the memory whose shape and size are determined dynamically, rather than to individual items at a fixed granularity.
Findings
Improves neural machine translation at both the character and token level.
Improves image captioning results.
Works with existing attention architectures, such as multi-head attention, without additional parameters.
Abstract
Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong…
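The idea in the abstract can be illustrated for a 1D memory with a minimal sketch. It enumerates every contiguous area up to a maximum width and runs ordinary dot-product attention over those areas instead of over single items. The summary above does not say how an area's key and value are computed, so the choices here (area key = mean of the item keys, area value = sum of the item values) are assumptions for illustration, as are the function names `enumerate_areas` and `area_attention_1d` and the parameter `max_width`.

```python
import math


def enumerate_areas(n, max_width):
    """List all contiguous 1D areas as (start, end) pairs, end exclusive."""
    return [(s, s + w) for w in range(1, max_width + 1) for s in range(n - w + 1)]


def area_attention_1d(query, keys, values, max_width=2):
    """Dot-product attention over contiguous areas of a 1D memory.

    Assumption (not stated in the summary): an area's key is the mean of
    its item keys and an area's value is the sum of its item values.
    """
    areas = enumerate_areas(len(keys), max_width)
    key_dim = len(keys[0])
    value_dim = len(values[0])

    area_keys = []
    area_values = []
    for start, end in areas:
        width = end - start
        area_keys.append(
            [sum(keys[i][d] for i in range(start, end)) / width for d in range(key_dim)]
        )
        area_values.append(
            [sum(values[i][d] for i in range(start, end)) for d in range(value_dim)]
        )

    # Softmax over the query/area-key dot products (shifted for stability).
    logits = [sum(q * k for q, k in zip(query, ak)) for ak in area_keys]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Weighted sum of area values.
    output = [
        sum(w * av[d] for w, av in zip(weights, area_values)) for d in range(value_dim)
    ]
    return output, weights, areas


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0], [2.0], [3.0]]
    out, weights, areas = area_attention_1d([1.0, 0.0], keys, values, max_width=2)
    for area, w in zip(areas, weights):
        print(area, round(w, 3))
    print(out)
```

With `max_width=1` this reduces to ordinary item-level attention, which is one way to see how such a layer could slot into an existing attention architecture. The brute-force enumeration is only meant to show the idea and makes no attempt at efficiency.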
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
