TL;DR
This paper introduces a novel hierarchical control framework for large language models that decouples tool invocation from execution, improving mathematical reasoning performance across multiple benchmarks.
Contribution
It formalizes the problem of decoupling tool invocation from execution and proposes a hierarchical control framework with a new learning algorithm, IH-GRPO.
Findings
Achieves up to 2.53% absolute improvement on mathematical reasoning benchmarks.
Shows consistent performance gains across various domains.
Derives a surrogate loss that lets an implicitly hierarchical policy learn behavior equivalent to that of an explicit hierarchical policy.
Abstract
Large language models (LLMs) have increasingly leveraged tool invocation to enhance their reasoning capabilities. However, existing approaches typically tightly couple tool invocation with immediate execution. Such immediate tool interaction may disrupt the reasoning coherence of LLMs and constrain their expressivity, ultimately degrading reasoning performance. To this end, for the first time, we propose and formalize the problem of decoupling tool invocation from execution during reasoning, and introduce delayed execution with explicit control to enhance tool-integrated reasoning (TIR). Furthermore, we propose a hierarchical control framework and theoretically derive a surrogate loss that enables an implicitly hierarchical policy to learn behavior equivalent to that of an explicit hierarchical policy, leading to the proposed IH-GRPO algorithm. Extensive experiments on IH-GRPO achieve…
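The idea of separating tool invocation from execution can be pictured with a toy sketch. This is not the paper's framework and not IH-GRPO. The class `DeferredToolRunner`, its `invoke` and `execute` methods, the placeholder handle format, and the two toy tools are all hypothetical names chosen for illustration. The sketch only assumes that "invocation" means recording a call and "execution" means running the recorded calls later, when an explicit control step asks for it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DeferredToolRunner:
    """Hypothetical helper: records tool calls now, runs them only on request."""

    tools: Dict[str, Callable[[str], str]]
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def invoke(self, name: str, arg: str) -> str:
        # Invocation: queue the call and hand back a placeholder handle,
        # so the caller can keep going without waiting for a result.
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        self.pending.append((name, arg))
        return f"<call_{len(self.pending) - 1}>"

    def execute(self) -> List[str]:
        # Execution: run every queued call in order, then empty the queue.
        results = [self.tools[name](arg) for name, arg in self.pending]
        self.pending.clear()
        return results


# Toy tools (assumptions, not from the paper).
runner = DeferredToolRunner(
    tools={
        "square": lambda s: str(int(s) ** 2),
        "negate": lambda s: str(-int(s)),
    }
)

h0 = runner.invoke("square", "12")   # recorded, not run
h1 = runner.invoke("negate", "5")    # recorded, not run
queued_before = len(runner.pending)  # two calls waiting
results = runner.execute()           # explicit control step runs both
```

In an immediately coupled design, each `invoke` would run the tool and return its output on the spot. Here the caller decides when `execute` happens, which is the only property this sketch is meant to show.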
