Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

Li Wang; Xiaohan Wang; Xiaodong Lu; Zipeng Zhang; Jinyang Wu; Jiajun Chai; Wei Lin; Guojun Yin

arXiv:2605.18500 · cs.CL · May 19, 2026

TL;DR

This paper introduces a novel hierarchical control framework for large language models that decouples tool invocation from execution, improving mathematical reasoning performance across multiple benchmarks.

Contribution

The paper formalizes the problem of decoupling tool invocation from execution and proposes a hierarchical control framework trained with a new learning algorithm, IH-GRPO.
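For context, the sketch below shows the group-relative advantage used in standard GRPO, which the name IH-GRPO indicates the algorithm builds on. It is not the paper's surrogate loss, which this page does not give. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is centered on the group mean and scaled by the group standard
    deviation, so no learned value function is needed. This is plain GRPO,
    shown as background only; it is NOT the IH-GRPO surrogate loss.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption made for this sketch
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math problem: 1.0 = correct, 0.0 = incorrect.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct responses receive positive advantages and incorrect ones negative, and a group in which every response earns the same reward contributes no learning signal.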

Findings

01. Achieved up to 2.53% absolute improvement on mathematical reasoning benchmarks.

02. Demonstrated consistent performance gains across various domains.

03. Derived a surrogate loss that lets an implicitly hierarchical policy learn behavior equivalent to that of an explicit hierarchical policy.

Abstract

Large language models (LLMs) have increasingly leveraged tool invocation to enhance their reasoning capabilities. However, existing approaches typically tightly couple tool invocation with immediate execution. Such immediate tool interaction may disrupt the reasoning coherence of LLMs and constrain their expressivity, ultimately degrading reasoning performance. To this end, for the first time, we propose and formalize the problem of decoupling tool invocation from execution during reasoning, and introduce delayed execution with explicit control to enhance tool-integrated reasoning (TIR). Furthermore, we propose a hierarchical control framework and theoretically derive a surrogate loss that enables an implicitly hierarchical policy to learn behavior equivalent to that of an explicit hierarchical policy, leading to the proposed IH-GRPO algorithm. Extensive experiments on IH-GRPO achieve…
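The distinction between invoking a tool and executing it can be illustrated with a toy sketch. Everything here is hypothetical: the names `PendingCall`, `DeferredToolQueue`, `invoke`, `execute_all`, and `add_tool` are invented for illustration, and the paper's actual control mechanism is not described on this page. The sketch only shows the general idea that a call can be recorded during reasoning and run later at an explicitly chosen point.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingCall:
    """A tool call that has been invoked (recorded) but not yet executed."""
    tool: str
    argument: str


@dataclass
class DeferredToolQueue:
    """Hypothetical helper: invocation and execution are separate steps."""
    tools: Dict[str, Callable[[str], str]]
    pending: List[PendingCall] = field(default_factory=list)

    def invoke(self, tool: str, argument: str) -> int:
        """Record a call without running it; return its position in the queue."""
        if tool not in self.tools:
            raise KeyError(f"unknown tool: {tool}")
        self.pending.append(PendingCall(tool, argument))
        return len(self.pending) - 1

    def execute_all(self) -> List[str]:
        """Explicit control point: run every recorded call, in order."""
        results = [self.tools[call.tool](call.argument) for call in self.pending]
        self.pending.clear()
        return results


def add_tool(expression: str) -> str:
    """Toy tool: sums integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))


if __name__ == "__main__":
    queue = DeferredToolQueue(tools={"add": add_tool})
    queue.invoke("add", "2+3")     # recorded; reasoning could continue here
    queue.invoke("add", "10+20")   # still nothing has been executed
    print(queue.execute_all())     # both calls run only at this point
```

In a tightly coupled design, each `invoke` would run the tool immediately and interrupt generation; here the caller decides when execution happens.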

Code & Models

Repositories

Lumina04/IH-GRPO-01 (GitHub)