Does Learning Mathematical Problem-Solving Generalize to Broader Reasoning?
Ruochen Zhou, Minrui Xu, Shiqi Chen, Junteng Liu, Yunqi Li, Xinxin Lin, Zhengyu Chen, Junxian He

TL;DR
This paper empirically investigates how different mathematical problem-solving training methods affect large language models' ability to generalize to broader reasoning tasks, highlighting the benefits of long chain-of-thought responses.
Contribution
It systematically compares various mathematical problem-solving (MPS) training approaches, revealing that long chain-of-thought training and rule-based reinforcement learning significantly improve reasoning generalization.
Findings
Continual pretraining on math text partially generalizes to reasoning tasks.
Instruction tuning on short MPS samples often impairs generalization.
Long chain-of-thought responses enhance reasoning across domains.
Abstract
There has been a growing interest in enhancing the mathematical problem-solving (MPS) capabilities of large language models. While the majority of research efforts concentrate on creating specialized models to solve mathematical problems, it remains unknown how learning mathematical problem-solving generalizes to help develop other reasoning abilities. In this paper, we present an empirical investigation into the generalization potential of various MPS training approaches, such as continual pretraining, instruction tuning, and rule-based reinforcement learning, across various data sources, including both short and long chain-of-thought (CoT) samples. Evaluation on 5 mathematical and 8 general reasoning benchmarks shows that continual pretraining on math text is able to generalize to general reasoning tasks to some extent. In contrast, instruction tuning on conventional, short MPS samples…
