Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning

Haidong Huang; Haiyue Zhu; Jiayu Song; Xixin Zhao; Yaohua Zhou; Jiayi Zhang; Yuze Zhai; Xiaocong Li

arXiv:2511.10087·cs.RO·November 14, 2025

TL;DR

This paper introduces UEPO, a unified generative framework for offline-to-online reinforcement learning (O2O-RL) in robotics. It uses diffusion-based techniques to address two problems: limited coverage of multimodal behaviors and distributional shifts during online adaptation.

Contribution

The paper presents three components for robust policy optimization in robot learning: a multi-seed dynamics-aware diffusion policy, a dynamic divergence regularization mechanism, and a diffusion-based data augmentation module.

Findings

01. Achieves a +5.9% absolute improvement over Uni-O4 on D4RL locomotion tasks

02. Achieves a +12.4% absolute improvement over Uni-O4 on D4RL dexterous manipulation tasks

03. Demonstrates strong generalization and scalability

Abstract

Offline-to-online reinforcement learning (O2O-RL) has emerged as a promising paradigm for safe and efficient robotic policy deployment but suffers from two fundamental challenges: limited coverage of multimodal behaviors and distributional shifts during online adaptation. We propose UEPO, a unified generative framework inspired by large language model pretraining and fine-tuning strategies. Our contributions are threefold: (1) a multi-seed dynamics-aware diffusion policy that efficiently captures diverse modalities without training multiple models; (2) a dynamic divergence regularization mechanism that enforces physically meaningful policy diversity; and (3) a diffusion-based data augmentation module that enhances dynamics model generalization. On the D4RL benchmark, UEPO achieves +5.9% absolute improvement over Uni-O4 on locomotion tasks and +12.4% on dexterous manipulation,…
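
The idea behind contribution (1), drawing diverse behaviors from one generative model by varying only the sampling seed, can be illustrated with a toy sketch. This is not the UEPO implementation: the 1-D bimodal action target, the analytic score function, the annealed Langevin sampler, and every name below are assumptions chosen so the example runs with NumPy alone. The pairwise-distance measure is likewise only a stand-in for a diversity signal, not the paper's divergence regularizer.

```python
import numpy as np

# Hypothetical bimodal action target: two equally good actions at -1 and +1.
MODES = np.array([-1.0, 1.0])


def score(a, sigma):
    """Gradient of the log-density of an equal-weight Gaussian mixture
    centered at MODES with standard deviation sigma (analytic toy 'model')."""
    logits = -((a - MODES) ** 2) / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())  # numerically stable responsibilities
    w /= w.sum()
    return float(np.sum(w * (MODES - a)) / sigma ** 2)


def sample_action(seed, n_levels=10, steps_per_level=20):
    """Annealed Langevin sampling from one shared score model.
    Only the seed differs between calls; no extra models are trained."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 2.0)  # start from broad noise
    for sigma in np.geomspace(1.0, 0.05, n_levels):  # decreasing noise levels
        step = 0.1 * sigma ** 2
        for _ in range(steps_per_level):
            a = a + step * score(a, sigma) + np.sqrt(2.0 * step) * rng.normal()
    return a


def sample_candidates(seeds):
    """One action candidate per seed, all from the same model."""
    return np.array([sample_action(s) for s in seeds])


def mean_pairwise_distance(actions):
    """Average absolute distance over all ordered pairs of distinct actions."""
    diffs = np.abs(actions[:, None] - actions[None, :])
    n = len(actions)
    return diffs.sum() / (n * (n - 1))


if __name__ == "__main__":
    candidates = sample_candidates(range(16))
    print(np.round(candidates, 2))
    print("diversity:", mean_pairwise_distance(candidates))
```

Each candidate should settle close to one of the two modes, and which mode it reaches depends only on the seed. In this toy setting, a larger mean pairwise distance indicates that the seeds cover both behaviors rather than collapsing onto one.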

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning