Extrapolation in NLP

Jeff Mitchell; Pasquale Minervini; Pontus Stenetorp; Sebastian Riedel

arXiv:1805.06648 · cs.CL · May 18, 2018

TL;DR

This paper argues that models which capture global structure, rather than only maximising their local fit to the training data, will often extrapolate more easily to examples outside the training space, and shows that this holds for the Decomposable Attention Model and word2vec.

Contribution

It shows, for two popular models (the Decomposable Attention Model and word2vec), that capturing global structure makes extrapolation beyond the training space easier than maximising local fit alone.

Findings

01

Models that capture global structure will often extrapolate more easily than models that only maximise local fit to the training data

02

The claim is shown to hold for two popular models: the Decomposable Attention Model and word2vec

03

Extrapolation success linked to models capturing global data structures

Abstract

We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.
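
The contrast between global structure and local fit can be illustrated with a toy regression problem. The sketch below is not the paper's experimental setup: the linear target, the two models (a least-squares line and a 1-nearest-neighbour lookup), and all function names are assumptions chosen for illustration. Both models fit the training range well, but only the one that captures the global trend remains accurate outside it.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares line: a model of the global trend (illustrative)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda q: slope * np.asarray(q, dtype=float) + intercept


def fit_nearest_neighbour(x, y):
    """1-nearest-neighbour lookup: a purely local fit (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def predict(q):
        q = np.atleast_1d(np.asarray(q, dtype=float))
        # For each query, return the target of the closest training point.
        idx = np.abs(q[:, None] - x[None, :]).argmin(axis=1)
        return y[idx]

    return predict


def mean_abs_error(model, x, y):
    return float(np.mean(np.abs(model(x) - y)))


def target(x):
    # Hypothetical ground truth with a simple global structure.
    return 2.0 * x + 1.0


# Training data covers [0, 1] only.
x_train = np.linspace(0.0, 1.0, 50)
y_train = target(x_train)

# Held-out points inside the training range, and points well outside it.
x_inside = np.linspace(0.05, 0.95, 20)
x_outside = np.linspace(2.0, 3.0, 20)

linear = fit_linear(x_train, y_train)
nearest = fit_nearest_neighbour(x_train, y_train)

errors = {
    "linear_inside": mean_abs_error(linear, x_inside, target(x_inside)),
    "linear_outside": mean_abs_error(linear, x_outside, target(x_outside)),
    "nearest_inside": mean_abs_error(nearest, x_inside, target(x_inside)),
    "nearest_outside": mean_abs_error(nearest, x_outside, target(x_outside)),
}

for name, value in errors.items():
    print(f"{name}: {value:.4f}")
```

In this toy setting the nearest-neighbour model can only repeat the target of the closest training point, so its error grows with distance from the training range, while the linear model carries the trend along with it. The example only illustrates the general intuition; it says nothing about how the Decomposable Attention Model or word2vec behave.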
