Extrapolation in NLP
Jeff Mitchell, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

TL;DR
This paper argues that NLP models that capture global structure extrapolate beyond their training data more easily than models that only maximise their local fit to it, and shows that this holds for two popular models: the Decomposable Attention Model and word2vec.
Contribution
It argues that capturing global structure, rather than only maximising local fit, makes extrapolation to examples outside the training space easier, and supports the claim with experiments on the Decomposable Attention Model and word2vec.
Findings
Models that capture global structure outperform models that only maximise local fit on extrapolation tasks
The Decomposable Attention Model and word2vec both generalise better outside the training space when global structure is captured
Extrapolation success is linked to how well a model captures the global structure of the data
Abstract
We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.
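The contrast between global structure and local fit can be sketched with a toy regression problem. This is an illustration only, not the paper's experimental setup: the linear data, the 1-nearest-neighbour baseline, and the names fit_linear and fit_nearest_neighbour are all assumptions made for this example. Both models fit the training points exactly, but only the one that encodes a global relationship is expected to stay accurate far outside the training range.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares line: one global relationship shared by every input."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda q: slope * np.asarray(q, dtype=float) + intercept


def fit_nearest_neighbour(x, y):
    """1-nearest-neighbour lookup: a purely local fit to the training points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def predict(q):
        q = np.atleast_1d(np.asarray(q, dtype=float))
        # For each query, return the target of the closest training input.
        idx = np.abs(q[:, None] - x[None, :]).argmin(axis=1)
        return y[idx]

    return predict


# Hypothetical training data: y = 2x + 1, with inputs confined to [0, 1].
x_train = np.linspace(0.0, 1.0, 11)
y_train = 2.0 * x_train + 1.0

# Test inputs well outside the training space.
x_test = np.array([5.0, 10.0])
y_test = 2.0 * x_test + 1.0

linear = fit_linear(x_train, y_train)
nearest = fit_nearest_neighbour(x_train, y_train)

linear_error = np.abs(linear(x_test) - y_test).max()
nearest_error = np.abs(nearest(x_test) - y_test).max()

print(f"linear max error outside training range: {linear_error:.3f}")
print(f"nearest-neighbour max error outside training range: {nearest_error:.3f}")
```

The nearest-neighbour model can only repeat the target of the closest training point, so its error grows as the test inputs move away from the training range, while the linear model carries the same relationship everywhere. Real NLP models are far less clear-cut than this, so the sketch should be read as intuition for the claim rather than evidence for it.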
