Learning from the Irrecoverable: Error-Localized Policy Optimization for Tool-Integrated LLM Reasoning

Qiao Liang; Yuke Zhu; Chao Ge; Lei Yang; Ying Shen; Bo Zheng; Sheng Guo

arXiv:2602.09598 · cs.CL · February 11, 2026

Open Access

TL;DR

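This paper introduces ELPO (Error-Localized Policy Optimization), a method that improves tool-integrated reasoning in LLMs by localizing the first irrecoverable error in a trajectory and using it for finer-grained credit assignment. This yields performance gains over strong RL baselines on math, science QA, and code benchmarks.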

Contribution

ELPO localizes the first irrecoverable step in long-horizon tool-integrated reasoning trajectories. It then uses hierarchical advantage attribution to convert that localization into stable, step-level learning signals for policy optimization.
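A minimal sketch of the localization idea follows. It assumes that recoverability is monotone along a failed trajectory: once a prefix can no longer lead to a correct answer, no longer prefix can either. Under that assumption, finding the first irrecoverable step reduces to a binary search over step indices. This is consistent with the abstract's description of binary-search rollout trees under a fixed rollout budget, although here the tree structure is flattened into a plain search. The names `locate_first_irrecoverable_step`, `make_mc_checker`, `sample_rollout`, and `rollouts_per_probe` are hypothetical. The recoverability test simply asks whether any of a few sampled continuations from the prefix succeeds. It is an illustrative estimator, not necessarily the one ELPO uses.

```python
from typing import Callable, Optional


def make_mc_checker(
    sample_rollout: Callable[[int], bool], rollouts_per_probe: int
) -> Callable[[int], bool]:
    """Hypothetical recoverability test: prefix k (steps 0..k kept) counts as
    recoverable if any of `rollouts_per_probe` sampled continuations succeeds."""

    def is_recoverable(k: int) -> bool:
        # any() stops at the first successful continuation, saving rollouts.
        return any(sample_rollout(k) for _ in range(rollouts_per_probe))

    return is_recoverable


def locate_first_irrecoverable_step(
    num_steps: int, is_recoverable: Callable[[int], bool]
) -> Optional[int]:
    """Binary search for the smallest k whose prefix is irrecoverable.

    Assumes monotonicity: if prefix k is irrecoverable, so is every longer
    prefix. Returns None if even the full trajectory is judged recoverable.
    Calls is_recoverable at most ceil(log2(num_steps + 1)) times.
    """
    lo, hi = 0, num_steps - 1
    first_bad: Optional[int] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_recoverable(mid):
            lo = mid + 1  # still salvageable: the error lies later
        else:
            first_bad = mid  # irrecoverable: record it and search earlier
            hi = mid - 1
    return first_bad


def toy_rollout(k: int) -> bool:
    # Deterministic stand-in for sampling: continuations succeed only from
    # prefixes that end before step 6, the toy trajectory's true first error.
    return k < 6


checker = make_mc_checker(toy_rollout, rollouts_per_probe=4)
print(locate_first_irrecoverable_step(10, checker))  # → 6
```

The rollout cost is bounded by the number of probes times `rollouts_per_probe`, which is one way to respect a fixed budget. For a 10-step trajectory that means at most 4 probes. A linear scan from the start of the trajectory could need up to 10.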

Findings

01. ELPO outperforms strong RL baselines on tool-integrated reasoning (TIR) benchmarks.
02. ELPO improves Pass@K and Major@K scaling metrics.
03. ELPO improves rollout ranking quality and tool-call efficiency.

Abstract

Tool-integrated reasoning (TIR) enables LLM agents to solve tasks through planning, tool use, and iterative revision, but outcome-only reinforcement learning in this setting suffers from sparse, delayed rewards and weak step-level credit assignment. In long-horizon TIR trajectories, an early irrecoverable mistake can determine success or failure, making it crucial to localize the first irrecoverable step and leverage it for fine-grained credit assignment. We propose Error-Localized Policy Optimization (ELPO), which localizes the first irrecoverable step via binary-search rollout trees under a fixed rollout budget, converts the resulting tree into stable learning signals through hierarchical advantage attribution, and applies error-localized adaptive clipping to strengthen corrective updates on the critical step and its suffix. Across TIR benchmarks in math, science QA, and code…
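The abstract's last component, error-localized adaptive clipping, can be illustrated with a PPO-style clipped surrogate whose clip range depends on token position. This is a sketch under stated assumptions. The exact clipping rule is not given in this summary, so the code widens the clip range from `eps` to a hypothetical `eps_error` for tokens at or after the localized critical step (`first_error_token`, also hypothetical). It takes per-token advantages as given instead of computing them with ELPO's hierarchical advantage attribution.

```python
import math
from typing import Sequence


def error_localized_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    first_error_token: int,
    eps: float = 0.2,
    eps_error: float = 0.4,
) -> float:
    """Mean clipped surrogate objective (to maximize) over one trajectory.

    Tokens with index >= first_error_token get the wider, hypothetical clip
    range eps_error; earlier tokens use the standard eps. Values are illustrative.
    """
    total = 0.0
    for t, (new, old, adv) in enumerate(zip(logp_new, logp_old, advantages)):
        e = eps_error if t >= first_error_token else eps
        ratio = math.exp(new - old)  # pi_new / pi_old for this token
        clipped_ratio = min(max(ratio, 1.0 - e), 1.0 + e)
        # Standard PPO pessimism: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Two tokens with the same ratio (0.7) and advantage (-1): token 0 precedes
# the critical step, token 1 is the critical step itself.
logp_old = [0.0, 0.0]
logp_new = [math.log(0.7), math.log(0.7)]
print(round(error_localized_surrogate(logp_new, logp_old, [-1.0, -1.0], first_error_token=1), 4))  # → -0.75
```

In this example the token before the critical step is already clipped at 1 - 0.2 = 0.8 and contributes no gradient. The token at the critical step stays inside the wider range (down to 1 - 0.4 = 0.6), so the update can keep lowering its probability. That is one way to read "strengthen corrective updates on the critical step and its suffix".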

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification