Verified Critical Step Optimization for LLM Agents

Mukai Li; Qingcheng Zeng; Tianqing Fang; Zhenwen Liang; Linfeng Song; Qi Liu; Haitao Mi; Dong Yu

arXiv:2602.03412·cs.CL·April 30, 2026

TL;DR

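The paper introduces Critical Step Optimization (CSO), a post-training method for LLM agents that applies preference learning only at verified critical steps: decision points where an alternative action demonstrably flips the task outcome from failure to success. It reports 37% and 26% relative improvements on GAIA-Text-103 and XBench-DeepSearch while supervising only 16% of trajectory steps.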

Contribution

CSO starts from the policy model's own failed trajectories rather than from expert demonstrations, uses expert models to propose high-quality alternative actions, and supervises only the decision steps where such an alternative is verified to change the outcome.

Findings

01. CSO achieves 37% and 26% relative improvements on GAIA-Text-103 and XBench-DeepSearch.

02. It outperforms other post-training methods while supervising only 16% of trajectory steps.

03. The method enhances policy robustness by focusing on verifiable critical decisions.

Abstract

As large language model agents tackle increasingly complex long-horizon tasks, effective post-training becomes critical. Prior work faces fundamental challenges: outcome-only rewards fail to precisely attribute credit to intermediate steps, estimated step-level rewards introduce systematic noise, and Monte Carlo sampling approaches for step reward estimation incur prohibitive computational cost. Inspired by findings that only a small fraction of high-entropy tokens drive effective RL for reasoning, we propose Critical Step Optimization (CSO), which focuses preference learning on verified critical steps, decision points where alternate actions demonstrably flip task outcomes from failure to success. Crucially, our method starts from failed policy trajectories rather than expert demonstrations, directly targeting the policy model's weaknesses. We use a process reward model (PRM) to…
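The idea of a verified critical step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PreferencePair`, `find_verified_critical_steps`, `propose_alternative`, and `rollout_succeeds` are hypothetical, the toy expert and toy environment are invented for illustration, and the PRM mentioned in the abstract is left out because its role is cut off in the text above. The sketch assumes only what the abstract states: start from a failed trajectory, try an alternative action at a step, and keep the step as a preference pair only if the alternative is shown to turn failure into success.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class PreferencePair:
    """Hypothetical container for one verified critical step."""
    step_index: int
    context: Tuple[str, ...]  # actions taken before the critical step
    chosen: str               # alternative action that led to success
    rejected: str             # original action from the failed trajectory


def find_verified_critical_steps(
    failed_actions: Sequence[str],
    propose_alternative: Callable[[Sequence[str], int], str],
    rollout_succeeds: Callable[[Tuple[str, ...]], bool],
) -> List[PreferencePair]:
    """Scan a failed trajectory and keep only the steps where swapping in
    an alternative action is verified to flip the outcome to success."""
    pairs: List[PreferencePair] = []
    for t, original in enumerate(failed_actions):
        prefix = tuple(failed_actions[:t])
        alternative = propose_alternative(failed_actions, t)
        if alternative == original:
            continue  # nothing new to test at this step
        # Verification: continue the task from the edited prefix and
        # check whether the final outcome becomes a success.
        if rollout_succeeds(prefix + (alternative,)):
            pairs.append(PreferencePair(t, prefix, alternative, original))
    return pairs


# Toy setup (assumption): the task succeeds only if the right page is opened.
def toy_expert(actions: Sequence[str], t: int) -> str:
    return "open_right_page" if t == 1 else actions[t]


def toy_rollout(prefix: Tuple[str, ...]) -> bool:
    return "open_right_page" in prefix


failed = ["search", "open_wrong_page", "answer"]
pairs = find_verified_critical_steps(failed, toy_expert, toy_rollout)
for p in pairs:
    print(p.step_index, p.chosen, p.rejected)
```

In this toy run only one of the three steps yields a pair, which mirrors the point that supervision concentrates on a small fraction of steps. The resulting pairs would then feed a preference-learning objective, which this sketch does not cover.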
