Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers
Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland

TL;DR
This paper investigates how small-scale transformers generalize out-of-distribution in systematic reasoning tasks, revealing that managing positional encoding is crucial for robust generalization on complex problems.
Contribution
The paper demonstrates that controlling positional sensitivity enables small transformers to generalize systematically on reasoning tasks such as Sudoku, advancing the understanding of out-of-distribution generalization (OODG) in neural networks.
Findings
Proper positional encoding improves OODG performance.
Suppressing absolute position sensitivity enhances generalization.
Training on the full distribution of component tasks aids complex problem solving.
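To make the second finding concrete, here is a minimal sketch of one generic way to reduce a model's reliance on absolute position: give each training sequence consecutive position ids that start at a random offset, so the same token can appear at many absolute indices. This is an illustration only and is not claimed to be the authors' method. The function name `shifted_position_ids` and the parameters `seq_len` and `max_len` are hypothetical.

```python
import numpy as np

def shifted_position_ids(seq_len, max_len, rng):
    """Return consecutive position ids that start at a random offset.

    Hypothetical helper for illustration; not taken from the paper.
    seq_len: number of tokens in the sequence.
    max_len: size of the positional-embedding table (assumed).
    rng: a numpy random Generator.
    """
    if seq_len > max_len:
        raise ValueError("seq_len must not exceed max_len")
    # The upper bound of rng.integers is exclusive, so the offset
    # ranges over 0 .. max_len - seq_len and ids stay in the table.
    offset = rng.integers(0, max_len - seq_len + 1)
    return offset + np.arange(seq_len)

rng = np.random.default_rng(0)
ids = shifted_position_ids(seq_len=5, max_len=32, rng=rng)
print(ids)
```

Relative order between tokens is preserved because the ids remain consecutive. Only the absolute starting index changes from one sample to the next.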
Abstract
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on…
Taxonomy
Topics: Evolutionary Algorithms and Applications · Neural Networks and Applications · Handwritten Text Recognition Techniques
