TL;DR
OctoTools is a training-free multi-agent framework for complex reasoning across diverse domains. It combines standardized tool cards, a planner, and an executor, and it outperforms GPT-4o as well as the agent frameworks AutoGen, GPT-Functions, and LangChain.
Contribution
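OctoTools introduces standardized tool cards that encapsulate tool functionality, a planner that handles both high-level and low-level planning, and an executor that carries out tool usage. Together these support extensible reasoning without additional training, validated on 16 diverse tasks.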
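The division of labor among tool cards, planner, and executor can be illustrated with a minimal sketch. This is not the OctoTools implementation: `ToolCard`, `TOOLBOX`, `plan`, `execute`, and `solve` are hypothetical names, the two tools are toys, and a keyword rule stands in for the LLM-driven planner described in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata plus the callable it wraps."""
    name: str
    description: str
    run: Callable[[str], str]


def calculator(expression: str) -> str:
    # Toy tool: sums whitespace-separated integers, e.g. "2 3 4" -> "9".
    return str(sum(int(tok) for tok in expression.split()))


def reverser(text: str) -> str:
    # Toy tool: reverses a string.
    return text[::-1]


# The toolbox maps tool names to their cards, so adding a tool is one entry.
TOOLBOX: Dict[str, ToolCard] = {
    card.name: card
    for card in (
        ToolCard("calculator", "Sum whitespace-separated integers.", calculator),
        ToolCard("reverser", "Reverse a string.", reverser),
    )
}


def plan(query: str) -> List[Tuple[str, str]]:
    """Stand-in planner (assumption): a keyword rule replaces the LLM call.

    Returns a list of (tool name, tool input) steps.
    """
    task, _, payload = query.partition(":")
    tool = "calculator" if task.strip() == "sum" else "reverser"
    return [(tool, payload.strip())]


def execute(steps: List[Tuple[str, str]]) -> List[str]:
    """Executor: look up each tool card by name and run it on its input."""
    return [TOOLBOX[name].run(arg) for name, arg in steps]


def solve(query: str) -> str:
    # Plan first, then execute; the last step's result is the answer.
    return execute(plan(query))[-1]


print(solve("sum: 2 3 4"))     # 9
print(solve("reverse: octo"))  # otco
```

Keeping each tool behind a uniform card means the planner only needs names and descriptions to choose a tool, which is one plausible reading of why the framework is described as easily extensible.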
Findings
Achieved an average accuracy gain of 9.3% over GPT-4o across the 16 tasks.
Outperformed AutoGen, GPT-Functions, and LangChain by up to 10.6%.
Demonstrated robustness and effectiveness in noisy and varied environments.
Abstract
Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains, limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible multi-agent framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools also outperforms AutoGen,…
