TL;DR
AutoTIR introduces a reinforcement learning framework that enables large language models to autonomously select and utilize external tools during reasoning, improving performance and generalization across diverse tasks.
Contribution
It presents a novel RL-based approach that lets LLMs adaptively decide whether and which tool to invoke during reasoning, instead of following static, predefined tool-use strategies.
Findings
AutoTIR outperforms baseline methods on multiple tasks.
It demonstrates strong generalization in tool-use behavior.
The approach enhances reasoning accuracy and efficiency.
Abstract
Large Language Models (LLMs), when enhanced through reasoning-oriented post-training, evolve into powerful Large Reasoning Models (LRMs). Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools, but existing methods often rely on rigid, predefined tool-use patterns that risk degrading core language competence. Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework that enables LLMs to autonomously decide whether and which tool to invoke during the reasoning process, rather than following static tool-use strategies. AutoTIR leverages a hybrid reward mechanism that jointly optimizes for task-specific answer correctness, structured output adherence, and penalization of incorrect tool usage, thereby encouraging both precise reasoning and efficient tool integration. Extensive evaluations…
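The hybrid reward described in the abstract combines three signals: answer correctness, adherence to a structured output format, and a penalty for incorrect tool usage. The sketch below shows one way such a reward could be computed for a single rollout. Everything concrete in it is an assumption, not AutoTIR's published formulation: the `<answer>` and `<tool name="...">` tag conventions, the weights in the hypothetical `RewardWeights`, exact-match scoring via `normalize`, and the choice to treat "incorrect tool usage" as calling a tool outside an allowed set.

```python
import re
from dataclasses import dataclass

# Hypothetical output conventions (assumptions, not from the paper):
# the final answer is wrapped in <answer>...</answer>, and each tool call
# is opened with <tool name="...">.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
TOOL_RE = re.compile(r'<tool name="([^"]+)">')


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative weights; the abstract does not give actual values.
    correct: float = 1.0
    format_ok: float = 0.1
    bad_tool: float = 0.5


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a simple exact-match check."""
    return " ".join(text.lower().split())


def hybrid_reward(output: str, gold: str, allowed_tools: set,
                  w: RewardWeights = RewardWeights()) -> float:
    """Score one rollout with format, correctness, and tool-penalty terms."""
    answers = ANSWER_RE.findall(output)
    # Structured-output term: exactly one answer block is required.
    format_ok = len(answers) == 1
    reward = w.format_ok if format_ok else 0.0
    # Correctness term: only scored when the output format is valid.
    if format_ok and normalize(answers[0]) == normalize(gold):
        reward += w.correct
    # Tool-usage penalty: any call to a tool outside the allowed set.
    if any(name not in allowed_tools for name in TOOL_RE.findall(output)):
        reward -= w.bad_tool
    return reward


if __name__ == "__main__":
    rollout = '<tool name="search">capital of France</tool> <answer>Paris</answer>'
    print(hybrid_reward(rollout, "paris", {"search", "code"}))
```

Gating the correctness term on a valid format is one design choice among several; it keeps the policy from being rewarded for answers the parser cannot reliably extract, while the separate tool penalty discourages invoking tools that do not apply to the task.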
