Location Attention for Extrapolation to Longer Sequences
Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni

TL;DR
This paper investigates how neural networks can better extrapolate to sequences longer than those seen during training, a form of generalization that is especially useful in natural language processing. It proposes a location-based attention mechanism and supports it with experiments on variants of the Lookup Table task.
Contribution
It introduces an attention mechanism with separate content-based and location-based components, hypothesized to generalize to longer sequences better than common attention mechanisms.
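To make the idea of pairing location-based attention with content-based attention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's formulation: the Gaussian-shaped location scores, the sigmoid mixing gate, and every name (`content_attention`, `location_attention`, `mixed_attention`, `mu`, `sigma`, `mix_logit`) are hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def content_attention(query, keys):
    """Scaled dot-product weights: depends on what the keys contain."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def location_attention(mu, sigma, n_positions):
    """Weights that depend only on position (assumed Gaussian-shaped).

    mu is the position to focus on, sigma controls how sharp the focus is.
    """
    logits = [-0.5 * ((i - mu) / sigma) ** 2 for i in range(n_positions)]
    return softmax(logits)


def mixed_attention(query, keys, mu, sigma, mix_logit):
    """Convex mix of the two distributions (assumed sigmoid gate)."""
    gate = 1.0 / (1.0 + math.exp(-mix_logit))
    content = content_attention(query, keys)
    location = location_attention(mu, sigma, len(keys))
    return [gate * l + (1.0 - gate) * c for l, c in zip(location, content)]


if __name__ == "__main__":
    keys = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.9], [0.4, 0.4]]
    query = [0.2, 0.7]
    weights = mixed_attention(query, keys, mu=2.0, sigma=0.5, mix_logit=1.0)
    print([round(w, 3) for w in weights])
```

Because the location term never looks at the key contents, it behaves the same way at position 50 as at position 5, which is the intuition for why such a component might help on longer inputs.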
Findings
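Separating location-based attention from content-based attention improves extrapolation to sequences longer than those seen in training.
Recurrent seq2seq models with the proposed attention outperform standard attention on variants of the Lookup Table task.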
The approach sheds light on neural models' limitations and potential solutions for sequence extrapolation.
Abstract
Neural networks are surprisingly good at interpolating and perform remarkably well when the training set examples resemble those in the test set. However, they are often unable to extrapolate patterns beyond the seen data, even when the abstractions required for such patterns are simple. In this paper, we first review the notion of extrapolation, why it is important and how one could hope to tackle it. We then focus on a specific type of extrapolation which is especially useful for natural language processing: generalization to sequences that are longer than the training ones. We hypothesize that models with a separate content- and location-based attention are more likely to extrapolate than those with common attention mechanisms. We empirically support our claim for recurrent seq2seq models with our proposed attention on variants of the Lookup Table task. This sheds light on some…
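As a rough illustration of the length-extrapolation setup described in the abstract, the sketch below builds a toy lookup-table dataset and splits it by composition length. It is a sketch under assumptions, not the paper's data pipeline: it assumes the common form of the task in which each table is a random bijection over 3-bit strings and an example applies a sequence of tables to an input string. The table names (`t1`, `t2`, ...), the source format, and the chosen lengths are all hypothetical.

```python
import itertools
import random


def make_tables(n_tables, n_bits=3, seed=0):
    """Create named random bijections over all n_bits-long bit strings."""
    rng = random.Random(seed)
    inputs = ["".join(bits) for bits in itertools.product("01", repeat=n_bits)]
    tables = {}
    for i in range(n_tables):
        outputs = inputs[:]
        rng.shuffle(outputs)
        tables[f"t{i + 1}"] = dict(zip(inputs, outputs))
    return tables


def apply_composition(tables, names, bits):
    """Apply the named tables to the bit string, left to right."""
    for name in names:
        bits = tables[name][bits]
    return bits


def make_example(tables, length, rng):
    """Return (source, target); source is 'bits t_a t_b ...' (assumed format)."""
    table_names = sorted(tables)
    names = [rng.choice(table_names) for _ in range(length)]
    bits = rng.choice(sorted(tables[table_names[0]]))
    return " ".join([bits] + names), apply_composition(tables, names, bits)


def length_split(tables, train_lengths, test_lengths, n_per_length, seed=0):
    """Train on short compositions, test on strictly longer ones."""
    rng = random.Random(seed)
    train = [make_example(tables, n, rng)
             for n in train_lengths for _ in range(n_per_length)]
    test = [make_example(tables, n, rng)
            for n in test_lengths for _ in range(n_per_length)]
    return train, test


if __name__ == "__main__":
    tables = make_tables(n_tables=4)
    train, test = length_split(tables, [1, 2], [3, 4], n_per_length=3)
    print(train[0], test[0])
```

A model that scores well on `train` but poorly on `test` is interpolating rather than extrapolating in the sense the abstract describes.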
Taxonomy
Methods: Test · Softmax · Sigmoid Activation · Tanh Activation · Location-based Attention
