STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization

Yuhan Chen; Yuxuan Liu; Long Zhang; Pengzhi Gao; Jian Luan; Wei Liu

arXiv:2511.13091 · cs.AI · November 18, 2025

TL;DR

STEP is a success-rate-aware, step-level policy optimization framework for multi-turn online reinforcement learning. It improves sample efficiency and training stability by allocating more resampling effort to tasks with lower success rates and by updating the policy on individual steps rather than on whole trajectories.

Contribution

The paper proposes a framework that dynamically allocates sampling effort based on per-task success rates and performs step-level optimization, which the authors report improves on trajectory-level methods.

Findings

01. Significantly improves sample efficiency and training stability.

02. Converges faster and generalizes better under the same sampling budget.

03. Outperforms trajectory-level GRPO in experiments.

Abstract

Multi-turn interaction remains challenging for online reinforcement learning. A common solution is trajectory-level optimization, which treats each trajectory as a single training sample. However, this approach can be inefficient and yield misleading learning signals: it applies uniform sampling across tasks regardless of difficulty, penalizes correct intermediate actions in failed trajectories, and incurs high sample-collection costs. To address these issues, we propose STEP (Success-rate-aware Trajectory-Efficient Policy optimization), a framework that dynamically allocates sampling based on per-task success rates and performs step-level optimization. STEP maintains a smoothed success-rate record to guide adaptive trajectory resampling, allocating more effort to harder tasks. It then computes success-rate-weighted advantages and decomposes trajectories into step-level samples.…
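The three mechanisms named in the abstract (a smoothed success-rate record, adaptive resampling, and step-level samples with success-rate-weighted advantages) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formulas, so the exponential moving average, the `1 - success_rate` weighting, and every name below (`SuccessTracker`, `allocate_rollouts`, `step_samples`, `alpha`, `prior`) are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class SuccessTracker:
    """Smoothed per-task success rate.

    Assumption: an exponential moving average; the paper's smoothing rule
    is not stated in the abstract.
    """
    alpha: float = 0.5   # smoothing factor (hypothetical value)
    prior: float = 0.5   # estimate used for tasks not seen yet (hypothetical)
    rates: dict = field(default_factory=dict)

    def update(self, task_id, successes, attempts):
        batch_rate = successes / attempts
        old = self.rates.get(task_id, self.prior)
        self.rates[task_id] = (1 - self.alpha) * old + self.alpha * batch_rate
        return self.rates[task_id]


def allocate_rollouts(tracker, task_ids, budget, min_per_task=1):
    """Split a rollout budget so tasks with lower success rates get more.

    Assumption: allocation proportional to (1 - success_rate).
    """
    if budget < min_per_task * len(task_ids):
        raise ValueError("budget too small for the requested minimum per task")
    weights = {t: 1.0 - tracker.rates.get(t, tracker.prior) + 1e-6 for t in task_ids}
    total = sum(weights.values())
    spare = budget - min_per_task * len(task_ids)
    alloc = {t: min_per_task + int(spare * weights[t] / total) for t in task_ids}
    # Rounding down can leave a few rollouts unassigned; give them to the hardest tasks.
    leftover = budget - sum(alloc.values())
    for t in sorted(task_ids, key=lambda t: weights[t], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc


def step_samples(trajectory, reward, group_mean, success_rate):
    """Turn one trajectory into one training sample per step.

    Assumption: a group-relative advantage (reward minus the group mean)
    scaled by (1 - success_rate); the paper's weighting may differ.
    """
    advantage = (reward - group_mean) * (1.0 - success_rate)
    return [
        {"observation": obs, "action": act, "advantage": advantage}
        for obs, act in trajectory
    ]


tracker = SuccessTracker()
tracker.update("hard_task", successes=0, attempts=4)
tracker.update("easy_task", successes=4, attempts=4)
alloc = allocate_rollouts(tracker, ["hard_task", "easy_task"], budget=8)
samples = step_samples(
    [("obs_1", "act_1"), ("obs_2", "act_2")],
    reward=1.0,
    group_mean=0.5,
    success_rate=tracker.rates["hard_task"],
)
```

In this sketch the harder task receives the larger share of the rollout budget, and each step of a trajectory becomes its own training sample carrying the weighted advantage, which is the contrast with treating the whole trajectory as a single sample.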

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Autonomous Vehicle Technology and Safety