Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control
Yuxuan Gao, Yedong Shen, Shiqi Zhang, Wenhao Yu, Yifan Duan, Jia Pan, Jiajia Wu, Jiajun Deng, Yanyong Zhang

TL;DR
This paper introduces a novel one-step generative policy framework for online robot control that internalizes iterative refinement into model training, achieving high-frequency control with performance comparable to multi-step policies.
Contribution
The authors propose Drift-Based Policy (DBP) and Drift-Based Policy Optimization (DBPO), enabling fast, stable, and multimodal policy learning suitable for real-time robotic applications.
Findings
DBP achieves up to 100x faster inference than multi-step diffusion policies.
DBP matches or exceeds performance of multi-step policies on manipulation benchmarks.
DBPO enables stable online policy improvement in real-world robot experiments.
Abstract
Although multi-step generative policies achieve strong performance in robotic manipulation by modeling multimodal action distributions, they require multi-step iterative denoising at inference time. Each action therefore needs tens to hundreds of network function evaluations (NFEs), making them costly for high-frequency closed-loop control and online reinforcement learning (RL). To address this limitation, we propose a two-stage framework for native one-step generative policies that shifts refinement from inference to training. First, we introduce the Drift-Based Policy (DBP), which leverages fixed-point drifting objectives to internalize iterative refinement into the model parameters, yielding a one-step generative backbone by design while preserving multimodal action modeling capacity. Second, we develop Drift-Based Policy Optimization (DBPO), an online RL framework that equips the…
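To make the inference-cost argument concrete, the following is a minimal sketch of how network function evaluations (NFEs) accumulate per action. It is not the paper's DBP or DBPO implementation: `CountingPolicyNet`, `multi_step_action`, `one_step_action`, the toy linear-plus-tanh network, and the Euler-style refinement rule are all hypothetical placeholders chosen only to count forward passes.

```python
import numpy as np


class CountingPolicyNet:
    """Toy stand-in for a policy network; counts forward passes (NFEs).

    Hypothetical: the weights and architecture are placeholders, not the paper's model.
    """

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_obs = rng.normal(scale=0.1, size=(obs_dim, act_dim))
        self.w_act = rng.normal(scale=0.1, size=(act_dim, act_dim))
        self.nfe = 0

    def __call__(self, obs, act):
        self.nfe += 1  # every call is one network function evaluation
        return np.tanh(obs @ self.w_obs + act @ self.w_act)


def multi_step_action(net, obs, noise, num_steps):
    """Generic iterative refinement: one network call per refinement step."""
    act = noise
    for _ in range(num_steps):
        # Placeholder update rule (assumption): move a fraction toward the prediction.
        act = act + (net(obs, act) - act) / num_steps
    return act


def one_step_action(net, obs, noise):
    """One-step generation: a single forward pass maps noise to an action."""
    return net(obs, noise)


obs = np.ones(4)
noise = np.random.default_rng(1).normal(size=2)

multi_net = CountingPolicyNet(obs_dim=4, act_dim=2)
multi_act = multi_step_action(multi_net, obs, noise, num_steps=100)

one_net = CountingPolicyNet(obs_dim=4, act_dim=2)
one_act = one_step_action(one_net, obs, noise)

print(multi_net.nfe, one_net.nfe)  # 100 1
```

With equal per-call cost, the per-action compute ratio in this sketch is simply the number of refinement steps. Measured wall-clock speedups on real hardware depend on the actual networks and samplers involved.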
