Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang

TL;DR
This paper investigates whether transformers solve math problems with rule-based or case-based reasoning. It finds that they rely on case-based reasoning and proposes a fine-tuning method that teaches rule-based reasoning, which significantly improves generalization on addition tasks.
Contribution
The paper introduces a method called Rule-Following Fine-Tuning (RFFT) to teach transformers explicit rules, enabling better rule-based reasoning and length generalization in math tasks.
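To make "explicit rules" concrete, the sketch below writes out one such rule, ordinary column addition with carry, and records every step it takes. It is only an illustration: the function name `add_with_trace` and the trace wording are hypothetical and are not taken from the paper's prompts or training data. The point is that a procedure stated this way applies unchanged to operands of any length.

```python
def add_with_trace(a: str, b: str) -> tuple[str, list[str]]:
    """Add two non-negative decimal strings digit by digit, logging each step.

    Hypothetical illustration of a step-by-step rule; not the paper's format.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter operand with zeros

    carry = 0
    digits: list[str] = []  # result digits, least significant first
    trace: list[str] = []   # one human-readable line per column

    # Walk the columns from the rightmost (least significant) digit.
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        digit, new_carry = total % 10, total // 10
        trace.append(
            f"{a[i]} + {b[i]} + carry {carry} = {total}: "
            f"write {digit}, carry {new_carry}"
        )
        digits.append(str(digit))
        carry = new_carry

    if carry:  # a final carry becomes the leading digit
        digits.append(str(carry))

    return "".join(reversed(digits)), trace


if __name__ == "__main__":
    result, steps = add_with_trace("987", "45")
    for line in steps:
        print(line)
    print("result:", result)
```

The same function handles 3-digit and 30-digit inputs alike, which is the kind of length generalization that rule-based reasoning is meant to provide.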
Findings
Transformers primarily perform case-based reasoning in math problems.
RFFT significantly improves length generalization in addition tasks.
On addition tasks, fine-tuning with explicit rules raises accuracy from below 55% to over 95%.
Abstract
Despite their impressive performance on a variety of complex tasks, modern large language models (LLMs) still have trouble with some math problems that are simple and intuitive for humans, such as addition. While humans can easily learn the basic rules of addition and apply them to new problems of any length, LLMs struggle to do the same. Instead, they may rely on similar cases seen in the training corpus for help. We define these two different reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring systematic generalization ability, we aim to determine whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers are performing case-based reasoning, whether or not a scratchpad is used, which…
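The distinction between the two mechanisms can be illustrated with a toy contrast. This is a sketch under stated assumptions, not the paper's experimental setup: the nearest-neighbour lookup in `case_based_add` is an assumed stand-in for "relying on similar cases", and the held-out region and all names are hypothetical.

```python
def in_holdout(a: int, b: int) -> bool:
    """Hypothetical held-out region: pairs never shown to the case-based model."""
    return 40 <= a < 60 and 40 <= b < 60


# Memorized cases: every pair of numbers below 100 outside the held-out region.
train = {
    (a, b): a + b
    for a in range(100)
    for b in range(100)
    if not in_holdout(a, b)
}


def case_based_add(a: int, b: int) -> int:
    """Copy the answer of the closest memorized case (toy assumption)."""
    nearest = min(train, key=lambda k: abs(k[0] - a) + abs(k[1] - b))
    return train[nearest]


def rule_based_add(a: int, b: int) -> int:
    """Apply column addition with carry; works for inputs of any length."""
    result, carry, place = 0, 0, 1
    while a or b or carry:
        total = a % 10 + b % 10 + carry
        result += (total % 10) * place
        carry = total // 10
        a, b, place = a // 10, b // 10, place * 10
    return result


if __name__ == "__main__":
    print("seen case:    ", case_based_add(12, 34), rule_based_add(12, 34))
    print("held-out case:", case_based_add(50, 50), rule_based_add(50, 50))
```

In this toy, the case-based function is exact on pairs it has memorized but returns a neighbour's answer inside the held-out region, while the rule-based function is correct everywhere because it never consults stored examples.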
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
