Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability
Kyle Richardson, Ashish Sabharwal

TL;DR
This paper introduces a new methodology for creating challenging natural language satisfiability problems to evaluate and improve transformer models' reasoning abilities, revealing both their strengths and limitations.
Contribution
The paper proposes a systematic approach to generating hard reasoning datasets, drawing on insights from the empirical study of hard SAT problems, so that transformer models' deductive reasoning capabilities can be evaluated more rigorously.
Findings
Transformers perform surprisingly well on complex NLSat problems with sufficient training.
Models exhibit some scale-invariance, generalizing to larger problem sizes.
Careful training data sampling is essential for better generalization to larger problems.
Abstract
Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theories, when sampled uniformly at random, do not necessarily lead to hard instances. We propose a new methodology for creating challenging algorithmic reasoning datasets that focus on natural language satisfiability (NLSat) problems. The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language. This methodology allows us to distinguish easy from hard instances, and to systematically increase the complexity of existing…
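The idea of separating easy from hard instances can be illustrated with plain propositional SAT. The sketch below is not the authors' pipeline: it samples random 3-SAT formulas at a chosen clause-to-variable ratio and checks satisfiability by brute force. The function names (sample_k_sat, is_satisfiable, sat_fraction) are hypothetical, and the ratio of roughly 4.26 mentioned in the comments is the commonly cited empirical threshold for random 3-SAT, not a figure taken from this summary. Rendering the clauses as natural language is left out.

```python
import itertools
import random


def sample_k_sat(num_vars, num_clauses, k=3, rng=None):
    """Sample a random k-SAT formula as a list of clauses.

    Each clause is a tuple of nonzero ints: +i stands for variable i,
    -i for its negation. Variables within a clause are distinct.
    """
    rng = rng or random.Random()
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def is_satisfiable(num_vars, clauses):
    """Brute-force satisfiability check (only feasible for small num_vars)."""
    for bits in itertools.product([False, True], repeat=num_vars):
        # A clause holds if at least one literal matches the assignment.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def sat_fraction(num_vars, ratio, trials=50, seed=0):
    """Fraction of sampled formulas that are satisfiable at a given ratio.

    Assumption: for random 3-SAT, this fraction drops sharply near a
    clause-to-variable ratio of about 4.26, and instances sampled close
    to that ratio tend to be the hardest to decide.
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(
        is_satisfiable(num_vars, sample_k_sat(num_vars, num_clauses, rng=rng))
        for _ in range(trials)
    )
    return hits / trials
```

Calling sat_fraction for a range of ratios (for example 2.0, 4.26, and 6.0 with around 12 variables) gives a rough picture of where sampled instances stop being trivially satisfiable or trivially unsatisfiable. That is the kind of signal uniform random sampling ignores.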
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Logic, Reasoning, and Knowledge
