Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning

Yuhui Chen; Haoran Li; Zhennan Jiang; Yuxing Qin; Yuxuan Wan; Weiheng Liu; Dongbin Zhao

arXiv:2604.01860·cs.RO·April 3, 2026

TL;DR

POCO is an RL framework that makes fine-tuning of generative policies more stable and sample-efficient by formulating policy improvement as a posterior inference problem. It scales to large VLA models and to real-world tasks.

Contribution

The paper introduces POCO, a posterior-inference-based RL method with an offline-to-online paradigm that enables stable, efficient fine-tuning of expressive generative policies for robotics.

Findings

01 POCO prevents catastrophic policy collapse in complex tasks.

02 POCO outperforms state-of-the-art baselines across benchmarks.

03 POCO achieves a 96.7% success rate on real-world contact-rich tasks.

Abstract

Expressive generative models have advanced robotic manipulation by capturing complex, multi-modal action distributions over temporally extended trajectories. However, fine-tuning these policies via RL remains challenging due to instability and sample inefficiency. We introduce Posterior Optimization with Clipped Objective (POCO), a principled RL framework that formulates policy improvement as a posterior inference problem tailored for temporal action chunks. Through an Expectation-Maximization procedure, POCO distills a reward-weighted implicit posterior into the policy without likelihood estimation. Furthermore, POCO adopts an offline-to-online paradigm that anchors online exploration to pre-trained priors, and its model-agnostic design scales to fine-tune large VLA models without architectural modifications. Evaluations across 7 simulation benchmarks and 4 contact-rich real-world…
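
To make the Expectation-Maximization idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the E-step samples candidate action chunks from the current policy and weights them by an exponentiated, clipped value estimate, and that the M-step fits the policy to those weighted samples. The names `posterior_weights`, `e_step`, `m_step_loss`, `sample_chunks`, `q_fn`, `temperature`, and `clip_max` are hypothetical, and the weighted squared error stands in for whatever generative training loss (for example a denoising or flow-matching loss) the actual method uses. Where exactly POCO applies its clipping is not stated in this summary, so clipping the weights is an assumption.

```python
import numpy as np


def posterior_weights(q_values, temperature=1.0, clip_max=5.0):
    """Self-normalized weights proportional to min(exp(A / temperature), clip_max).

    A is the value estimate centered on the batch mean. Clipping the raw
    weight is an illustrative assumption, not the paper's stated objective.
    """
    q = np.asarray(q_values, dtype=np.float64)
    advantages = q - q.mean()
    raw = np.exp(advantages / temperature)
    clipped = np.minimum(raw, clip_max)
    return clipped / clipped.sum()


def e_step(sample_chunks, q_fn, obs, num_samples=16, temperature=1.0, clip_max=5.0):
    """Draw action chunks from the current policy and weight them by value.

    sample_chunks(obs, n) -> array of shape (n, horizon, action_dim)
    q_fn(obs, chunk)      -> scalar value estimate for one chunk
    Both callables are hypothetical placeholders.
    """
    chunks = np.asarray(sample_chunks(obs, num_samples))
    q_values = np.array([q_fn(obs, chunk) for chunk in chunks])
    weights = posterior_weights(q_values, temperature, clip_max)
    return chunks, weights


def m_step_loss(predicted, targets, weights):
    """Weighted regression loss that pulls the policy toward high-weight chunks.

    Uses only samples and weights, so no policy likelihood is evaluated.
    """
    per_sample = ((predicted - targets) ** 2).mean(axis=(1, 2))
    return float((weights * per_sample).sum())
```

In this sketch, a smaller `temperature` concentrates the weights on the highest-value chunks, while `clip_max` bounds how much any single sample can dominate the update, which is one plausible way to trade off efficiency against stability.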

Code & Models

Repositories

https://cccedric.github.io/poco
github
