OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine, Abhishek Gupta, Hongkai Da, Paarth Shah, Max Simchowitz

TL;DR
OGPO is a novel, sample-efficient off-policy algorithm that fine-tunes generative control policies for robot learning, achieving state-of-the-art results with minimal hyperparameter tuning.
Contribution
Introduces OGPO, a new off-policy finetuning method for GCPs that outperforms existing approaches and can improve poorly-initialized policies without expert data.
Findings
OGPO achieves state-of-the-art performance on manipulation tasks.
OGPO can fine-tune policies with no expert data in the replay buffer.
Proposed stabilizers improve training stability across settings.
Abstract
Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagates policy gradients through the full generative process of the policy via a modified PPO objective, using the critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly initialized behavior cloning policies to near-full task success with no expert data in the online replay buffer, and it does so with little task-specific hyperparameter tuning. Through extensive empirical…
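To make the core idea in the abstract more concrete, here is a minimal sketch of a PPO-style clipped surrogate applied across the denoising steps of one sampled action, with a critic value acting as the terminal reward. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify how the PPO objective is modified, so the standard clipped surrogate is used instead, and the scalar Gaussian denoising transitions, the `baseline` term, and all function names (`gaussian_logp`, `clipped_surrogate`, `chain_objective`) are hypothetical.

```python
import math


def gaussian_logp(x, mean, std):
    """Log-density of a scalar Gaussian N(mean, std^2) evaluated at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for a single (denoising) step.

    Assumption: this is the textbook PPO surrogate, not the modified
    objective the abstract refers to.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


def chain_objective(chain, new_means, old_means, std,
                    q_value, baseline, clip_eps=0.2):
    """Average clipped surrogate over the denoising chain of one action.

    chain:      [a_0, ..., a_K], intermediate samples of the generative process
    new_means:  per-step predicted mean of a_{k+1} under the current policy
    old_means:  the same quantity under the policy that generated the chain
    q_value:    critic estimate for the final action a_K (terminal reward)
    baseline:   hypothetical value baseline subtracted to form an advantage
    """
    advantage = q_value - baseline
    total = 0.0
    for k in range(len(chain) - 1):
        nxt = chain[k + 1]
        lp_new = gaussian_logp(nxt, new_means[k], std)
        lp_old = gaussian_logp(nxt, old_means[k], std)
        # The same terminal advantage is credited to every denoising step.
        total += clipped_surrogate(lp_new, lp_old, advantage, clip_eps)
    return total / (len(chain) - 1)
```

When the current and data-generating policies agree, every probability ratio is 1 and the objective reduces to the critic-derived advantage. In a real training loop the means would come from the policy network and the objective would be maximized by gradient ascent through every step of the chain.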
