Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors
Zhiwei Zhang, Fei Zhao, Rui Wang, Zezhong Wang, Bin Liang, Jiakang Wang, Yao Hu, Shaosheng Cao, Kam-Fai Wong

TL;DR
Fission-GRPO is an RL framework that teaches large language models to recover from tool execution errors by turning their own failed trajectories into on-policy corrective training instances, improving multi-turn tool-use accuracy.
Contribution
The paper introduces Fission-GRPO, a new method that converts execution errors into on-policy training signals, enabling models to learn effective recovery strategies from their own failures.
Findings
Fission-GRPO improves error recovery rate by 5.7% on BFCL v4 Multi-Turn.
Overall accuracy increases by 4.0 percentage points, from 42.75% to 46.75%.
Fission-GRPO achieves gains of up to +17.4% across multiple benchmarks.
Abstract
Large language models (LLMs) can call tools effectively, yet they remain brittle in multi-turn execution: after a tool-call error, smaller models often fall into repetitive invalid re-invocations instead of interpreting the feedback and recovering. This failure mode persists because current training paradigms do not explicitly teach models how to recover from execution errors. In particular, standard reinforcement learning (RL) collapses rich failure experience into sparse negative rewards, while pre-collected error-correction datasets become mismatched to the policy's evolving failure modes. To bridge this gap, we propose Fission-GRPO, a framework that converts execution errors into on-policy corrective supervision within the RL training loop. Our core mechanism fissions each failed trajectory into a new training instance by augmenting it with diagnostic feedback from a fine-tuned…
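The fission step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Rollout` type, the `fission` and `group_relative_advantages` functions, the prompt layout, and the `diagnose` callable are all hypothetical names introduced here. In particular, `diagnose` is only a stand-in for whatever produces the diagnostic feedback, which the truncated abstract does not fully specify. The advantage function shows the usual group-relative normalization associated with GRPO, assumed here for context.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical structure, for illustration only)."""
    prompt: str
    response: str
    error: Optional[str]  # tool execution error message, or None on success


def fission(rollouts: List[Rollout],
            diagnose: Callable[[Rollout], str]) -> List[str]:
    """Turn each failed rollout into a new training prompt.

    The new prompt keeps the original context, the failed call, the raw
    error, and the diagnostic feedback, so the policy can be sampled again
    on its own failure. `diagnose` is an assumed stand-in for the feedback
    source mentioned in the abstract.
    """
    new_prompts = []
    for r in rollouts:
        if r.error is None:
            continue  # successful rollouts are not fissioned
        feedback = diagnose(r)
        new_prompts.append(
            f"{r.prompt}\n"
            f"[assistant] {r.response}\n"
            f"[tool error] {r.error}\n"
            f"[feedback] {feedback}\n"
        )
    return new_prompts


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group (mean 0, unit scale)."""
    mean = sum(rewards) / len(rewards)
    var = sum((x - mean) ** 2 for x in rewards) / len(rewards)
    std = var ** 0.5
    return [(x - mean) / (std + eps) for x in rewards]
```

In a training loop under these assumptions, the prompts returned by `fission` would be fed back to the current policy for fresh sampling, and the rewards of those recovery attempts would be normalized with `group_relative_advantages` like any other group. Because the failures come from the current policy, the corrective instances track its failure modes as they change during training.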
