Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics
Sean Welleck, Peter West, Jize Cao, Yejin Choi

TL;DR
This paper investigates the limitations of neural sequence models in symbolic mathematics, revealing challenges in systematic and out-of-distribution generalization despite strong in-distribution performance.
Contribution
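The paper introduces a methodology for evaluating generalization in symbolic integration that exploits the problem domain's structure and access to a verifier, and uses it to show that sequence-to-sequence models struggle with robustness, compositionality, and out-of-distribution inputs.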
Findings
Models perform well in-distribution but fail systematically out-of-distribution.
Manual test suites and genetic algorithms reveal numerous failure modes.
These results highlight the need for better evaluation methods and more robust models.
Abstract
Neural sequence models trained with maximum likelihood estimation have led to breakthroughs in many tasks, where success is defined by the gap between training and test performance. However, their ability to achieve stronger forms of generalization remains unclear. We consider the problem of symbolic mathematical integration, as it requires generalizing systematically beyond the test set. We develop a methodology for evaluating generalization that takes advantage of the problem domain's structure and access to a verifier. Despite promising in-distribution performance of sequence-to-sequence models in this domain, we demonstrate challenges in achieving robustness, compositionality, and out-of-distribution generalization, through both carefully constructed manual test suites and a genetic algorithm that automatically finds large collections of failures in a controllable manner. Our…
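The two ingredients named in the abstract, a verifier and an automated search for failures, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical, deliberately brittle stand-in for a trained sequence-to-sequence model, `is_correct` checks the derivative numerically rather than symbolically, and `find_failures` is a simplified, deterministic mutate-and-select loop in the spirit of a genetic algorithm, not the paper's algorithm.

```python
import math

# Fixed set of coefficient perturbations (hypothetical; chosen for illustration).
MUTATIONS = (-50, -10, -1, 1, 10, 50)


def is_correct(f, F, points=(0.3, 0.7, 1.1, 1.9), h=1e-5, tol=1e-4):
    """Verifier: accept F as an antiderivative of f if F'(x) ~= f(x) at sample points."""
    for x in points:
        dF = (F(x + h) - F(x - h)) / (2 * h)  # central-difference derivative
        if abs(dF - f(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True


def integrand(k):
    """The problem family searched over: f(x) = k * cos(x)."""
    return lambda x: k * math.cos(x)


def toy_model(k):
    """Hypothetical brittle 'model': integrates k*cos(x) correctly only for |k| <= 100."""
    if abs(k) <= 100:
        return lambda x: k * math.sin(x)
    return lambda x: math.sin(x)  # simulated failure: the coefficient is dropped


def find_failures(seeds, generations=3, max_population=8):
    """Mutate problems, keep those the verifier rejects, and reuse them as parents."""
    population, failures = list(seeds), set()
    for _ in range(generations):
        children = sorted({k + d for k in population for d in MUTATIONS})
        failing = [k for k in children if not is_correct(integrand(k), toy_model(k))]
        failures.update(failing)
        # Selection: failing children become parents; otherwise keep exploring.
        population = (failing or children)[:max_population]
    return sorted(failures)


print(find_failures([95], generations=1))
```

Starting from a problem the toy model solves (k = 95), one round of mutation already yields verified failures. Because the verifier only needs the integrand and the predicted antiderivative, no reference solutions are required for the generated problems.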
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Machine Learning in Materials Science
Methods: Test
