Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control

Yuxuan Gao; Yedong Shen; Shiqi Zhang; Wenhao Yu; Yifan Duan; Jia Pan; Jiajia Wu; Jiajun Deng; Yanyong Zhang

arXiv:2604.03540 · cs.RO · April 22, 2026

TL;DR

This paper introduces a native one-step generative policy framework for online robot control that shifts iterative refinement from inference to training, enabling high-frequency control with performance comparable to multi-step policies.

Contribution

The authors propose the Drift-Based Policy (DBP), a one-step generative backbone, and Drift-Based Policy Optimization (DBPO), an online RL framework, for fast, stable, and multimodal policy learning in real-time robotic applications.

Findings

01. DBP achieves up to 100x faster inference than multi-step diffusion policies.

02. DBP matches or exceeds the performance of multi-step policies on manipulation benchmarks.

03. DBPO enables stable online policy improvement in real-world robot experiments.

Abstract

Although multi-step generative policies achieve strong performance in robotic manipulation by modeling multimodal action distributions, they require multi-step iterative denoising at inference time. Each action therefore needs tens to hundreds of network function evaluations (NFEs), making them costly for high-frequency closed-loop control and online reinforcement learning (RL). To address this limitation, we propose a two-stage framework for native one-step generative policies that shifts refinement from inference to training. First, we introduce the Drift-Based Policy (DBP), which leverages fixed-point drifting objectives to internalize iterative refinement into the model parameters, yielding a one-step generative backbone by design while preserving multimodal action modeling capacity. Second, we develop Drift-Based Policy Optimization (DBPO), an online RL framework that equips the…
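To make the NFE argument concrete, here is a minimal sketch that counts network calls per action for an iterative sampler versus a one-step generator. It is not the authors' DBP or DBPO implementation: `CountingNet`, `multi_step_sample`, `one_step_sample`, the toy refinement rule, and the choice of 50 steps are all assumptions made purely for illustration.

```python
import numpy as np


class CountingNet:
    """Hypothetical toy policy network that counts its forward passes (NFEs)."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_obs = rng.normal(scale=0.1, size=(obs_dim, act_dim))
        self.W_act = rng.normal(scale=0.1, size=(act_dim, act_dim))
        self.nfe = 0  # number of network function evaluations so far

    def __call__(self, obs, x):
        self.nfe += 1
        return np.tanh(obs @ self.W_obs + x @ self.W_act)


def multi_step_sample(net, obs, noise, num_steps):
    """Toy iterative refinement (not a real diffusion sampler):
    one network call per refinement step."""
    x = noise
    for _ in range(num_steps):
        x = x + (net(obs, x) - x) / num_steps
    return x


def one_step_sample(net, obs, noise):
    """One-step generation: a single network call maps noise to an action."""
    return net(obs, noise)


obs = np.ones(4)                                  # illustrative observation
noise = np.random.default_rng(1).normal(size=2)   # illustrative action noise

multi_net = CountingNet(obs_dim=4, act_dim=2)
one_net = CountingNet(obs_dim=4, act_dim=2)

action_multi = multi_step_sample(multi_net, obs, noise, num_steps=50)
action_one = one_step_sample(one_net, obs, noise)

print(multi_net.nfe, one_net.nfe)  # 50 1
```

In this toy setup the per-action cost scales linearly with the number of refinement steps, which is why a policy that needs only one evaluation per action is attractive for high-frequency closed-loop control. The two samplers here do not produce equivalent actions; the sketch illustrates only the difference in inference cost.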
