A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation
Bohan Yao, Vikas Yadav

TL;DR
Multi-TAG introduces a multi-tool aggregation framework that enables large language models to invoke and combine multiple tools simultaneously during math reasoning, significantly improving accuracy on complex benchmarks without fine-tuning.
Contribution
Multi-TAG is a fine-tuning-free, inference-only framework in which an LLM invokes multiple tools concurrently at each reasoning step and aggregates their outputs, outperforming existing single-tool methods on complex math reasoning.
Findings
Outperforms state-of-the-art baselines by 6.0% to 7.5% on multiple benchmarks.
Works with both open-weight and proprietary LLMs without fine-tuning.
Enhances reasoning robustness and accuracy through multi-tool aggregation.
Abstract
Augmenting large language models (LLMs) with external tools is a promising avenue for developing high-performance mathematical reasoning systems. Prior tool-augmented approaches typically finetune an LLM to select and invoke a single tool at each reasoning step and show promising results on simpler math reasoning benchmarks such as GSM8K. However, these approaches struggle with more complex math problems that require precise reasoning over multiple steps. To address this limitation, in this work, we propose Multi-TAG, a Multi-Tool AGgregation-based framework. Instead of relying on a single tool, Multi-TAG guides an LLM to concurrently invoke multiple tools at each reasoning step. It then aggregates their diverse outputs to verify and refine the reasoning process, enhancing solution robustness and accuracy. Notably, Multi-TAG is a finetuning-free, inference-only framework, making it…
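The step-level loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tool functions, the `ANSWER:` stop marker, and the majority-vote aggregation rule are all assumptions made for the example, since the abstract does not specify how outputs are aggregated. The tools are called one after another here for simplicity, whereas the paper describes concurrent invocation.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical tool signature: given the problem and the steps accepted so far,
# return a candidate for the next step (or None if the tool has nothing to offer).
Tool = Callable[[str, List[str]], Optional[str]]


def aggregate(candidates: Dict[str, Optional[str]]) -> Optional[str]:
    """Pick the most frequent non-empty candidate (assumed majority-vote rule)."""
    votes = Counter(c for c in candidates.values() if c is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def solve(problem: str, tools: Dict[str, Tool], max_steps: int = 5) -> List[str]:
    """Run every tool at each step, keep the aggregated candidate, stop on an answer."""
    steps: List[str] = []
    for _ in range(max_steps):
        # Query all tools for the same step (sequentially in this sketch).
        candidates = {name: tool(problem, steps) for name, tool in tools.items()}
        chosen = aggregate(candidates)
        if chosen is None:
            break  # no tool produced a usable candidate
        steps.append(chosen)
        if chosen.startswith("ANSWER:"):  # hypothetical final-answer marker
            break
    return steps


# Stub tools standing in for LLM-driven tool calls (all hypothetical).
def text_tool(problem: str, steps: List[str]) -> Optional[str]:
    return "ANSWER: 4"


def python_tool(problem: str, steps: List[str]) -> Optional[str]:
    return "ANSWER: 4"


def faulty_tool(problem: str, steps: List[str]) -> Optional[str]:
    return "ANSWER: 5"


if __name__ == "__main__":
    tools = {"text": text_tool, "python": python_tool, "faulty": faulty_tool}
    print(solve("What is 2 + 2?", tools))
```

In this toy run, two of the three stub tools agree, so the disagreeing candidate is outvoted. That is the intuition behind using agreement across tools as a check on any single tool's output.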
Taxonomy
Topics: Topic Modeling · Mathematics, Computing, and Information Processing · Multimodal Machine Learning Applications
