Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
Juil Koo, Mingue Park, Jiwon Choi, Yunhong Min, Minhyuk Sung

TL;DR
The paper introduces Drifting Field Policy (DFP), a non-ODE, one-step generative policy whose update is framed as a Wasserstein gradient flow. It reports state-of-the-art results on several manipulation tasks across Robomimic and OGBench.
Contribution
It presents a non-ODE, one-step generative policy framework that casts the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, together with a simple, tractable surrogate for the otherwise intractable update loss.
Findings
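With one-step inference, DFP outperforms ODE-based policies on several manipulation tasks across Robomimic and OGBench.
The surrogate loss replaces the intractable update loss with an objective akin to behavior cloning on top-K critic-selected actions.
Empirically, this mechanism uniquely benefits the drifting backbone, which the authors attribute to its non-ODE parameterization.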
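The abstract describes the surrogate loss as akin to behavior cloning on top-K critic-selected actions. The selection step alone might look like the minimal NumPy sketch below. This is an illustration, not the authors' implementation: the function name `select_top_k_actions`, the array shapes, and the toy critic are all assumptions made here, and the drifting-model training objective itself is not reproduced.

```python
import numpy as np


def select_top_k_actions(candidates, q_values, k):
    """Keep the k candidate actions with the highest critic value per state.

    Hypothetical helper, not from the paper.
    candidates: (B, N, A) array, N candidate actions of dimension A per state.
    q_values:   (B, N) array, a critic score for each candidate.
    Returns a (B, k, A) array ordered from highest to lowest score.
    """
    candidates = np.asarray(candidates, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    if not 1 <= k <= q_values.shape[1]:
        raise ValueError("k must be between 1 and the number of candidates")
    # Indices of the k largest critic values per state, best first.
    top_idx = np.argsort(-q_values, axis=1)[:, :k]
    # Gather the matching actions along the candidate axis.
    return np.take_along_axis(candidates, top_idx[:, :, None], axis=1)


# Toy usage: 2 states, 8 candidate actions each, 3-dimensional actions.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(2, 8, 3))
# Stand-in critic (an assumption): prefers actions close to the origin.
q_values = -np.sum(candidates ** 2, axis=-1)
targets = select_top_k_actions(candidates, q_values, k=2)
print(targets.shape)  # (2, 2, 3)
```

In a sketch like this, the selected `targets` would serve as regression targets for the one-step policy, in place of actions taken directly from a dataset as in plain behavior cloning.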
Abstract
We propose Drifting Field Policy (DFP), a non-ODE one-step generative policy built on the drifting model paradigm. We frame the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, so that each DFP update corresponds to a gradient step in probability space. By construction, this gradient is decomposed into an ascent toward higher action-value regions and a score matching with the anchor policy as a trust region. We further derive a simple, tractable surrogate of the otherwise intractable update loss, akin to behavior cloning on top-K critic-selected actions. We find empirically that this mechanism uniquely benefits the drifting backbone owing to its non-ODE parameterization. With one-step inference, DFP achieves state-of-the-art performance on several manipulation tasks across Robomimic and OGBench, outperforming ODE-based policies.
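The decomposition mentioned in the abstract can be illustrated with a standard calculation. The exact form of the soft target is not given here, so the sketch below assumes the common choice of an anchor policy reweighted by the exponentiated action value. The symbols $\pi_\beta$ (anchor policy), $\alpha > 0$ (temperature), $Z(s)$ (normalizer), and $v$ (velocity field) are notation introduced for this illustration and may differ from the paper's.

```latex
% Assumed soft target policy (illustrative form, not confirmed by the source):
\pi^{*}(a \mid s) = \frac{1}{Z(s)}\, \pi_\beta(a \mid s)\, \exp\!\big(Q(s,a)/\alpha\big)

% Reverse-KL objective and its first variation with respect to \pi:
\mathcal{F}[\pi] = \mathrm{KL}\big(\pi \,\|\, \pi^{*}\big)
                 = \int \pi(a \mid s)\, \log \frac{\pi(a \mid s)}{\pi^{*}(a \mid s)}\, da,
\qquad
\frac{\delta \mathcal{F}}{\delta \pi} = \log \pi - \log \pi^{*} + 1

% Wasserstein-2 gradient flow: continuity equation with velocity field v
\partial_t \pi = -\nabla_a \cdot (\pi\, v),
\qquad
v = -\nabla_a \frac{\delta \mathcal{F}}{\delta \pi}
  = \underbrace{\tfrac{1}{\alpha}\, \nabla_a Q(s,a)}_{\text{ascent on the action value}}
  + \underbrace{\nabla_a \log \pi_\beta(a \mid s) - \nabla_a \log \pi(a \mid s)}_{\text{score matching with the anchor}}
```

Under this assumption, $\log Z(s)$ does not depend on $a$ and drops out of the gradient, leaving the two terms: a step toward higher action values and a score difference that pulls the policy toward the anchor, which plays the role of the trust region.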
