How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction

Anthony Bau; Jacob Andreas

arXiv:2111.03108 · cs.CL · November 8, 2021

TL;DR

This paper investigates how neural sequence models such as RNNs and transformers generalize in out-of-distribution contexts. It shows that their predictions interpolate between local and global context cues, that this balance is influenced by noise and regularization, and it offers a theoretical explanation of this behavior.

Contribution

The paper introduces idealized models of local and global generalization, demonstrates that neural models interpolate between them, and provides a theoretical explanation of this behavior based on feature correlations.
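One way to make "interpolate" concrete follows the abstract's description of a log-linear combination of local and global predictive distributions. Write p_loc for the next-word distribution of the local context model (consistent with the last word observed) and p_glob for that of the global context model (consistent with the global structure of the input). Assuming a single weight λ in [0, 1] (an illustrative parameterization; the paper's exact form may differ), the combined prediction would be:

```latex
\hat{p}_{\lambda}(w \mid x) =
  \frac{p_{\text{loc}}(w \mid x)^{\lambda}\; p_{\text{glob}}(w \mid x)^{1-\lambda}}
       {\sum_{w'} p_{\text{loc}}(w' \mid x)^{\lambda}\; p_{\text{glob}}(w' \mid x)^{1-\lambda}}
```

Setting λ = 1 recovers the local model and λ = 0 the global one. Intermediate values give a normalized weighted geometric mean, which is a weighted sum in log space; this is why the combination is called log-linear rather than an arithmetic mixture.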

Findings

01. Neural models interpolate between local and global generalization.

02. In some languages, noise mediates the balance between local and global generalization.

03. Theoretical analysis explains the observed interpolation behavior.

Abstract

After a neural sequence model encounters an unexpected token, can its behavior be predicted? We show that RNN and transformer language models exhibit structured, consistent generalization in out-of-distribution contexts. We begin by introducing two idealized models of generalization in next-word prediction: a local context model in which generalization is consistent with the last word observed, and a global context model in which generalization is consistent with the global structure of the input. In experiments in English, Finnish, Mandarin, and random regular languages, we demonstrate that neural language models interpolate between these two forms of generalization: their predictions are well-approximated by a log-linear combination of local and global predictive distributions. We then show that, in some languages, noise mediates the two forms of generalization: noise applied to input…
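The abstract's central empirical claim is that a language model's next-word distribution after an unexpected token is well-approximated by a log-linear combination of local and global predictive distributions. The following sketch shows how such a fit could be measured under stated assumptions: a single mixing weight λ, KL divergence from the model's distribution as the fit criterion, and a grid search over λ. The helper names (`log_linear`, `kl`, `fit_lambda`) and the toy distributions are hypothetical; this is not the paper's code and not necessarily its fitting procedure.

```python
import math

# Hypothetical helper: normalized log-linear (weighted geometric) combination
# of two next-word distributions. Both dicts must share the same keys and
# assign strictly positive probability to every word (log(0) is undefined).
def log_linear(p_local, p_global, lam):
    # Score each word in log space: lam * log p_local + (1 - lam) * log p_global
    scores = {
        w: lam * math.log(p_local[w]) + (1.0 - lam) * math.log(p_global[w])
        for w in p_local
    }
    # Subtract the max score before exponentiating for numerical stability
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {w: math.exp(s - m) / z for w, s in scores.items()}


def kl(p, q):
    # KL(p || q), skipping words where p assigns zero mass
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical fitting routine: grid search over lam in [0, 1] for the value
# whose combination is closest (in KL) to an observed model distribution.
def fit_lambda(p_model, p_local, p_global, steps=101):
    best_lam, best_kl = 0.0, float("inf")
    for i in range(steps):
        lam = i / (steps - 1)
        d = kl(p_model, log_linear(p_local, p_global, lam))
        if d < best_kl:
            best_lam, best_kl = lam, d
    return best_lam, best_kl


if __name__ == "__main__":
    # Toy distributions (made up for illustration, not from the paper)
    p_local = {"a": 0.1, "b": 0.6, "c": 0.2, "d": 0.1}
    p_global = {"a": 0.1, "b": 0.1, "c": 0.7, "d": 0.1}
    # Stand-in for a language model's prediction, synthesized with lam = 0.3
    p_model = log_linear(p_local, p_global, 0.3)
    lam, div = fit_lambda(p_model, p_local, p_global)
    print(round(lam, 2))  # prints 0.3
```

In a real analysis, `p_model` would come from a trained RNN or transformer and `p_local` / `p_global` from the two idealized models. Here `p_model` is synthesized with λ = 0.3, so the grid search recovers that value. A fitted λ near 1 would indicate mostly local generalization; one near 0, mostly global.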
