WARPD: World model Assisted Reactive Policy Diffusion
Shashank Hegde, Satyajeet Das, Gautam Salhotra, Gaurav S. Sukhatme

TL;DR
WARPD introduces a novel approach to robotic control by generating closed-loop policies directly in parameter space, enabling longer horizons, robustness, and reduced inference costs compared to traditional diffusion-based methods.
Contribution
WARPD is the first method to learn behavioral distributions in parameter space for robotic policies, improving long-horizon robustness and efficiency over existing diffusion policies.
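To make "policies in parameter space" concrete, here is a minimal sketch of how one flat parameter vector might be unpacked into a small MLP policy that maps each observation to an action. It is an illustration only, not the authors' architecture: the layer sizes, the `tanh` activation, and the helper names `num_params`, `unpack_mlp`, and `policy` are assumptions, and a random vector stands in for a sample from a learned generative model.

```python
import math
import random


def num_params(sizes):
    """Number of weights plus biases in an MLP with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def unpack_mlp(theta, sizes):
    """Split a flat parameter vector into (W, b) pairs, one per layer."""
    layers, i = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # Each row of W holds the incoming weights of one output unit.
        W = [theta[i + r * n_in: i + (r + 1) * n_in] for r in range(n_out)]
        i += n_in * n_out
        b = theta[i: i + n_out]
        i += n_out
        layers.append((W, b))
    if i != len(theta):
        raise ValueError("parameter vector length does not match layer sizes")
    return layers


def policy(layers, obs):
    """Closed-loop policy: one cheap forward pass per observation."""
    h = obs
    for k, (W, b) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
        if k < len(layers) - 1:  # hidden layers only; the output layer is linear
            h = [math.tanh(v) for v in h]
    return h


# Hypothetical sizes: 3-dim observation, 8 hidden units, 2-dim action.
sizes = [3, 8, 2]
rng = random.Random(0)
# Stand-in for a parameter vector sampled from a generative model.
theta = [rng.gauss(0.0, 0.1) for _ in range(num_params(sizes))]
layers = unpack_mlp(theta, sizes)
action = policy(layers, [0.1, -0.2, 0.3])
```

Under this assumption the expensive sampling step would run once per generated policy, while each control step costs only a small forward pass.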
Findings
WARPD outperforms Diffusion Policy (DP) in long-horizon tasks.
WARPD achieves multitask performance comparable to DP.
WARPD requires only about 1/45th of DP's inference FLOPs per step.
Abstract
With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: fewer diffusion queries lead to larger trajectory chunks, which in turn accumulate tracking errors. To overcome these challenges, we introduce WARPD (World model Assisted Reactive Policy Diffusion), a method that generates closed-loop policies (weights for neural…
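The trade-off between action horizon and tracking error described above can be sketched on a toy problem. The example below is not DP or WARPD: it assumes a 1D integrator with a constant unmodeled disturbance, and a hypothetical planner `plan_chunk` that plans actions with a disturbance-free model. Executing longer chunks open-loop needs fewer planner queries but lets the disturbance accumulate, while replanning every step (horizon 1) behaves like a closed-loop controller.

```python
def plan_chunk(x, horizon, gain=0.5):
    """Plan `horizon` actions toward 0 using a nominal, disturbance-free model."""
    actions, x_pred = [], x
    for _ in range(horizon):
        u = -gain * x_pred
        actions.append(u)
        x_pred = x_pred + u  # nominal prediction ignores the disturbance
    return actions


def rollout(horizon, steps=64, disturbance=0.05, x0=1.0):
    """Execute planned chunks open-loop; return (mean |error|, planner queries)."""
    x, total_err, queries, t = x0, 0.0, 0, 0
    while t < steps:
        actions = plan_chunk(x, horizon)  # the only point where the state is observed
        queries += 1
        for u in actions[: steps - t]:
            x = x + u + disturbance  # true dynamics include the disturbance
            total_err += abs(x)
            t += 1
    return total_err / steps, queries


err_closed, q_closed = rollout(horizon=1)   # replan at every step
err_chunk, q_chunk = rollout(horizon=16)    # 16-step open-loop chunks
print(f"horizon=1:  mean error {err_closed:.3f}, queries {q_closed}")
print(f"horizon=16: mean error {err_chunk:.3f}, queries {q_chunk}")
```

In this toy setting the 16-step chunks use 16 times fewer planner queries but incur a larger mean tracking error than step-by-step replanning.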
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Diffusion
