ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

TL;DR
ToRA introduces a tool-integrated reasoning framework that enhances mathematical problem-solving by combining language models with external computational tools, achieving state-of-the-art results on multiple datasets.
Contribution
The paper presents ToRA, a novel tool-integrated reasoning agent that significantly improves mathematical reasoning performance by training on interactive tool-use trajectories and output space shaping.
Findings
ToRA models outperform open-source baselines on 10 datasets.
ToRA-7B achieves 44.6% on MATH, surpassing WizardMath-70B.
ToRA-Code-34B exceeds 50% accuracy on MATH, outperforming GPT-4 CoT.
Abstract
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B…
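The interleaving of natural language reasoning and tool calls described in the abstract can be illustrated with a small loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `scripted_model` is a hypothetical stand-in for a language model, the fenced `python` and `output` delimiters are an assumed trajectory format, and `run_program` executes code without any sandboxing.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence
CODE_BLOCK = re.compile(
    re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def run_program(source: str) -> str:
    """Run a program string and return what it printed (the tool output)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # illustration only: no sandbox, no timeout
    return buffer.getvalue().strip()


def scripted_model(context: str) -> str:
    """Hypothetical stand-in for a language model (assumption, not ToRA)."""
    if FENCE + "output" not in context:
        # First turn: a short rationale followed by a program to execute.
        return (
            "The sum of the first 100 positive integers can be computed directly.\n"
            + FENCE + "python\nprint(sum(range(1, 101)))\n" + FENCE + "\n"
        )
    # Later turn: read the most recent tool output and state the answer.
    result = context.rsplit(FENCE + "output\n", 1)[1].split("\n" + FENCE)[0]
    return f"The program printed {result}, so the answer is {result}."


def solve(question: str, model, max_rounds: int = 3) -> str:
    """Alternate between model steps and program execution."""
    context = question + "\n"
    for _ in range(max_rounds):
        step = model(context)
        context += step
        match = CODE_BLOCK.search(step)
        if match is None:
            break  # no program in this step: treat it as the final answer
        # Feed the execution result back so the next step can reason over it.
        context += FENCE + "output\n" + run_program(match.group(1)) + "\n" + FENCE + "\n"
    return context


trajectory = solve("What is 1 + 2 + ... + 100?", scripted_model)
print(trajectory)
```

In this toy run the model delegates the arithmetic to the interpreter and only reasons about the returned value, which is the division of labor the abstract attributes to tool-integrated reasoning.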
Peer Reviews
Decision: ICLR 2024 poster
The idea is clear, well-constructed, and well-explained. The figures are excellent and the algorithm is clearly laid out. The resulting models show considerable performance increases under a range of evaluation settings confirming the efficacy of the strategy.
While the authors have presented what worked well, there is a considerable amount to be gleaned from the failure modes. The authors loosely allude to failure cases including geometric problems and program timeouts, and provide single examples in the appendix, but there are surely more interesting patterns. It would be wonderful if the authors could provide more specific examples and comment on more systematic classes of errors beyond these simple categorizations. For example, are there certain p
1. This paper proposes a two-stage training framework that uses training data alternating between natural language and code to enhance the reasoning ability of language models on mathematical reasoning tasks. The experimental results demonstrate a significant improvement from this approach across 10 datasets.
2. The paper is generally well written, and the figures and tables are clear and easy to understand.
1. Figure 5 shows that the model's performance does not decrease significantly when output space shaping is removed. More experiments are needed to show whether the improvement at this stage comes from the training strategy itself rather than from additional data and more training epochs.
2. More detail is needed on the ToRA corpus proposed in this paper, including the data construction process, quality evaluation, and dataset statistics.
- The paper is easy to follow
- ToRA achieves good performance on math datasets
- **Limited technical novelty**:
  - Using imitation learning to improve the mathematical reasoning ability of open-source models has been proposed in many recent works, e.g.,
    - Scaling relationship on learning mathematical reasoning with large language models, https://arxiv.org/abs/2308.01825
    - WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct, https://arxiv.org/abs/2308.09583
    - MetaMath: Bootstrap Your Own Mathematical Question
Code & Models
- 🤗 llm-agents/tora-code-7b-v1.0 · model · 807 dl · ♡ 18
- 🤗 llm-agents/tora-7b-v1.0 · model · 788 dl · ♡ 8
- 🤗 llm-agents/tora-code-13b-v1.0 · model · 771 dl · ♡ 15
- 🤗 llm-agents/tora-13b-v1.0 · model · 775 dl · ♡ 6
- 🤗 llm-agents/tora-code-34b-v1.0 · model · 998 dl · ♡ 14
- 🤗 llm-agents/tora-70b-v1.0 · model · 868 dl · ♡ 21
- 🤗 TheBloke/tora-13B-v1.0-GGUF · model · 144 dl · ♡ 3
- 🤗 TheBloke/tora-13B-v1.0-GPTQ · model · 6 dl
- 🤗 TheBloke/tora-13B-v1.0-AWQ · model · 4 dl
- 🤗 TheBloke/tora-70B-v1.0-GGUF · model · 59 dl · ♡ 2
Taxonomy
Topics: Topic Modeling · Educational Games and Gamification · Artificial Intelligence in Games
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam · Residual Connection · Layer Normalization · Softmax
