How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction
Anthony Bau, Jacob Andreas

TL;DR
This paper investigates how neural sequence models such as RNNs and transformers generalize in out-of-distribution contexts. It finds that their predictions interpolate between local and global context cues, that noise and regularization influence the balance between the two, and it offers a theoretical account of this behavior.
Contribution
The paper introduces idealized models of local and global generalization, demonstrates that neural models interpolate between them, and provides a theoretical explanation for this behavior based on feature correlations.
Findings
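Neural models interpolate between local and global generalization: their predictions are well approximated by a log-linear combination of local and global predictive distributions.
In some languages, noise mediates the balance between local and global generalization.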
Theoretical analysis explains the observed interpolation behavior.
Abstract
After a neural sequence model encounters an unexpected token, can its behavior be predicted? We show that RNN and transformer language models exhibit structured, consistent generalization in out-of-distribution contexts. We begin by introducing two idealized models of generalization in next-word prediction: a local context model in which generalization is consistent with the last word observed, and a global context model in which generalization is consistent with the global structure of the input. In experiments in English, Finnish, Mandarin, and random regular languages, we demonstrate that neural language models interpolate between these two forms of generalization: their predictions are well-approximated by a log-linear combination of local and global predictive distributions. We then show that, in some languages, noise mediates the two forms of generalization: noise applied to input…
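The log-linear combination mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `log_linear_mix`, the single mixing weight `lam`, and the toy distributions are assumptions made for illustration, and the paper may parameterize the weights differently. The sketch assumes both input distributions assign strictly positive probability to every token.

```python
import math


def log_linear_mix(p_local, p_global, lam):
    """Return q with q(x) proportional to p_local(x)**lam * p_global(x)**(1 - lam).

    Hypothetical helper: `lam` = 1 recovers the local distribution,
    `lam` = 0 recovers the global one. Inputs must be strictly positive.
    """
    if len(p_local) != len(p_global):
        raise ValueError("distributions must cover the same vocabulary")
    # Weighted sum of log-probabilities (the "log-linear" part).
    scores = [
        lam * math.log(a) + (1.0 - lam) * math.log(b)
        for a, b in zip(p_local, p_global)
    ]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy next-token distributions over a three-word vocabulary (made-up numbers).
p_local = [0.7, 0.2, 0.1]
p_global = [0.1, 0.3, 0.6]
mixed = log_linear_mix(p_local, p_global, lam=0.5)
```

With `lam` between 0 and 1, `mixed` is a normalized distribution that sits between the two inputs in log space; fitting `lam` to a trained model's actual predictions would be one way to measure how local or global its generalization is.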
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
