STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization
Yuhan Chen, Yuxuan Liu, Long Zhang, Pengzhi Gao, Jian Luan, Wei Liu

TL;DR
STEP is a success-rate-aware, step-level policy optimization framework for multi-turn online reinforcement learning. It improves sample efficiency and training stability by resampling trajectories adaptively according to task difficulty and by optimizing on individual steps rather than whole trajectories.
Contribution
It proposes a framework that dynamically allocates sampling effort according to per-task success rates and performs step-level optimization, in contrast to conventional trajectory-level methods that treat each trajectory as a single training sample.
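The sampling side of this idea can be illustrated with a minimal Python sketch. It assumes that the smoothed success-rate record mentioned in the abstract is an exponential moving average and that a fixed trajectory budget is split in proportion to each task's failure rate (1 - p). Neither choice is taken from the paper, and `SuccessRateTracker`, `allocate_samples`, `alpha`, `init_rate`, and `min_per_task` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SuccessRateTracker:
    """Hypothetical smoothed per-task success-rate record (EMA form is assumed)."""
    alpha: float = 0.1       # smoothing factor (assumed value)
    init_rate: float = 0.5   # prior for tasks not yet seen (assumed value)
    rates: Dict[str, float] = field(default_factory=dict)

    def update(self, task_id: str, successes: int, attempts: int) -> float:
        if attempts <= 0:
            raise ValueError("attempts must be positive")
        observed = successes / attempts
        prev = self.rates.get(task_id, self.init_rate)
        # Exponential moving average: blend the old estimate with this batch's rate.
        self.rates[task_id] = (1 - self.alpha) * prev + self.alpha * observed
        return self.rates[task_id]

    def get(self, task_id: str) -> float:
        return self.rates.get(task_id, self.init_rate)


def allocate_samples(tracker: SuccessRateTracker, task_ids: List[str],
                     budget: int, min_per_task: int = 1) -> Dict[str, int]:
    """Split a fixed trajectory budget so harder (low success rate) tasks get more."""
    if budget < min_per_task * len(task_ids):
        raise ValueError("budget too small for the per-task minimum")
    # Difficulty weight 1 - p, floored so the total is never zero when all tasks are solved.
    weights = {t: max(1.0 - tracker.get(t), 1e-3) for t in task_ids}
    total = sum(weights.values())
    spare = budget - min_per_task * len(task_ids)
    alloc = {t: min_per_task + int(spare * w / total) for t, w in weights.items()}
    # Give samples lost to integer flooring to the hardest tasks first.
    leftover = budget - sum(alloc.values())
    for t in sorted(task_ids, key=lambda t: weights[t], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc
```

With this weighting, a task whose smoothed success rate is near 1 still receives the per-task minimum, so its estimate keeps updating, while most of the budget flows to tasks the policy still fails on.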
Findings
Significantly improves sample efficiency and training stability.
Converges faster and generalizes better under the same sampling budget.
Outperforms trajectory-level GRPO in experiments.
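For context on the baseline, the following Python sketch shows the standard group-relative advantage that GRPO computes at the trajectory level: each trajectory's scalar reward is normalized against the other trajectories sampled for the same task, and every action in that trajectory then shares the single resulting value. The use of the population standard deviation and the `eps` term are implementation choices that vary across codebases, and the function names are hypothetical.

```python
import statistics
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: normalize each trajectory's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


def per_step_credit(num_steps: int, advantage: float) -> List[float]:
    # At the trajectory level every action inherits the same advantage, so correct
    # intermediate actions in a failed trajectory are pushed down along with the rest.
    return [advantage] * num_steps
```

The second function makes the abstract's criticism concrete: a failed trajectory's negative advantage is applied uniformly to all of its steps, including any that were correct.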
Abstract
Multi-turn interaction remains challenging for online reinforcement learning. A common solution is trajectory-level optimization, which treats each trajectory as a single training sample. However, this approach can be inefficient and yield misleading learning signals: it applies uniform sampling across tasks regardless of difficulty, penalizes correct intermediate actions in failed trajectories, and incurs high sample-collection costs. To address these issues, we propose STEP (Success-rate-aware Trajectory-Efficient Policy optimization), a framework that dynamically allocates sampling based on per-task success rates and performs step-level optimization. STEP maintains a smoothed success-rate record to guide adaptive trajectory resampling, allocating more effort to harder tasks. It then computes success-rate-weighted advantages and decomposes trajectories into step-level samples.…
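The last two stages described in the abstract, success-rate-weighted advantages and step-level decomposition, can be sketched as follows. The truncated abstract does not give the exact weighting, so this Python sketch assumes the simplest success-rate-aware form: the task's smoothed success rate p serves as the baseline (A = r - p for a binary reward r), and that advantage is assigned to every step of the trajectory. `StepSample`, `success_rate_advantage`, and `decompose_trajectory` are hypothetical names, and any further per-step refinement STEP applies is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StepSample:
    """One (context, action) pair used as an independent training sample."""
    task_id: str
    step_index: int
    context: str      # observation/history before the action (hypothetical field)
    action: str
    advantage: float


def success_rate_advantage(reward: float, success_rate: float) -> float:
    # Assumed form: the smoothed success rate acts as the baseline, so a success on a
    # hard task (low rate) earns a large positive advantage and a failure on an easy
    # task (high rate) a large negative one.
    return reward - success_rate


def decompose_trajectory(task_id: str,
                         steps: List[Tuple[str, str]],
                         reward: float,
                         success_rate: float) -> List[StepSample]:
    """Split one multi-turn trajectory into per-step samples sharing its advantage."""
    adv = success_rate_advantage(reward, success_rate)
    return [StepSample(task_id, i, ctx, act, adv)
            for i, (ctx, act) in enumerate(steps)]
```

Treating each step as its own sample lets the policy update operate on (context, action) pairs rather than on full multi-turn trajectories. Under this assumed baseline, a success on a task with p = 0.25 yields an advantage of 0.75 for each of its steps.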
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Autonomous Vehicle Technology and Safety
