Reparameterization Flow Policy Optimization
Hai Zhong, Zhuoran Li, Xun Wang, and Longbo Huang

TL;DR
This paper introduces Reparameterization Flow Policy Optimization (RFO), a method that integrates flow policies into the Reparameterization Policy Gradient (RPG) framework to improve sample efficiency and performance on reinforcement learning tasks with complex dynamics.
Contribution
The paper establishes the connection between flow policies and RPG, proposes RFO with regularization for stability and exploration, and demonstrates its effectiveness on diverse tasks.
Findings
RFO achieves nearly 2x the reward of the baseline on a soft-body quadruped task.
Flow policies can be effectively integrated into RPG without intractable likelihood computations.
RFO outperforms existing methods in various locomotion and manipulation tasks.
Abstract
Reparameterization Policy Gradient (RPG) has emerged as a powerful paradigm for model-based reinforcement learning, enabling high sample efficiency by backpropagating gradients through differentiable dynamics. However, prior RPG approaches have been predominantly restricted to Gaussian policies, limiting their performance and failing to leverage recent advances in generative models. In this work, we identify that flow policies, which generate actions via differentiable ODE integration, naturally align with the RPG framework, a connection not established in prior work. However, naively exploiting this synergy proves ineffective, often suffering from training instability and a lack of exploration. We propose Reparameterization Flow Policy Optimization (RFO). RFO computes policy gradients by backpropagating jointly through the flow generation process and system dynamics, unlocking high…
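The abstract's central mechanism samples an action by integrating a flow ODE from noise, then differentiates the return through both the integration steps and the dynamics. A toy sketch can illustrate it. Everything below is an illustrative assumption, not the authors' RFO implementation. The linear velocity field `theta * s - x`, the dynamics `s' = s + 0.1 a`, the reward `-s'^2`, the Euler step count, and the helper names `flow_action`, `rollout_return`, and `policy_gradient` are all hypothetical. The sketch uses forward-mode dual numbers instead of reverse-mode backpropagation, which gives the same gradient for a single scalar parameter. It also omits RFO's regularization for stability and exploration.

```python
import random
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode dual number: a value and its derivative w.r.t. theta."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(o):
        # Treat plain floats as constants (zero derivative).
        return o if isinstance(o, Dual) else Dual(float(o))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def flow_action(theta, s, z, steps=10):
    """Hypothetical flow policy: Euler-integrate dx/dt = v(x, s) from x0 = z.

    v(x, s) = theta * s - x is a toy stand-in for a learned velocity
    network; the action is the ODE state at t = 1.
    """
    dt = 1.0 / steps
    x = z
    for _ in range(steps):
        x = x + dt * (theta * s - x)
    return x


def rollout_return(theta, s0=1.0, horizon=5, seed=0):
    """Roll out the flow policy through toy differentiable dynamics.

    Base noise z is drawn from a fixed seed, so the return is a
    deterministic, differentiable function of theta (reparameterization).
    """
    rng = random.Random(seed)
    s = Dual(s0)
    total = Dual(0.0)
    for _ in range(horizon):
        z = rng.gauss(0.0, 1.0)       # reparameterized base noise
        a = flow_action(theta, s, z)  # differentiable in theta and s
        s = s + 0.1 * a               # hypothetical dynamics s' = s + 0.1 a
        total = total - s * s         # hypothetical reward r = -s'^2
    return total


def policy_gradient(theta):
    """Return (J(theta), dJ/dtheta) by seeding theta's derivative with 1."""
    out = rollout_return(Dual(theta, 1.0))
    return out.val, out.der


if __name__ == "__main__":
    theta = 0.5
    for it in range(3):
        ret, grad = policy_gradient(theta)
        print(f"iter {it}: return={ret:.4f} grad={grad:.4f}")
        theta += 0.05 * grad  # gradient ascent on the return
```

Because the seed fixes the base noise, the return is a deterministic function of `theta`, so the gradient can be checked against finite differences. At no point does the sketch evaluate the action's log-likelihood, which is why a flow policy fits the RPG setting without likelihood computations.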
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Human Motion and Animation
