REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Zhaoyuan Gu; Yipu Chen; Zimeng Chai; Alfred Cueva; Thong Nguyen; Yifan Wu; Huishu Xue; Minji Kim; Isaac Legene; Fukang Liu; Matthew Kim; Ayan Barula; Yongxin Chen; Ye Zhao

arXiv:2603.13707 · cs.RO · March 19, 2026

Open Access

TL;DR

REFINE-DP introduces a hierarchical reinforcement learning framework that jointly fine-tunes a diffusion-policy planner and a low-level controller, substantially improving humanoid loco-manipulation success rates on complex tasks in simulation and in the real world.

Contribution

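The paper presents REFINE-DP, a novel hierarchical RL approach that jointly optimizes a diffusion-policy high-level planner and an RL-based low-level controller for humanoid robots. This addresses the distribution mismatch between the offline-trained planner and the controller that executes its commands, and improves task success.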

Findings

1. Achieves a success rate above 90% on loco-manipulation tasks in simulation.
2. Enables smooth autonomous operation in dynamic real-world environments.
3. Significantly outperforms the pre-trained diffusion policy baselines.

Abstract

Humanoid loco-manipulation requires coordinating high-level motion planning with stable low-level whole-body execution under complex robot-environment dynamics and over long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: the motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common approach of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address these challenges, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to…
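
The abstract says the DP is fine-tuned with a PPO-based diffusion policy gradient, but it is cut off before the details. One common formulation in the diffusion-RL literature, which is assumed here and not confirmed as the paper's exact method, treats every denoising step as a Gaussian "action". Under that view, PPO's clipped surrogate applies to the log-probabilities of the sampled denoising chain, and the environment-level advantage of the executed action is shared across all denoising steps. The function names, array shapes, the fixed per-step noise schedule, and `clip_eps=0.2` are all hypothetical choices made for this minimal NumPy sketch.

```python
import numpy as np


def gaussian_log_prob(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over the last (action) axis."""
    return np.sum(
        -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi),
        axis=-1,
    )


def denoising_ppo_loss(chain, means_new, means_old, stds, advantages, clip_eps=0.2):
    """Clipped PPO surrogate over a diffusion denoising chain (hypothetical sketch).

    chain:      (B, K+1, A) sampled chain x_K, ..., x_0; x_0 is the executed action
    means_new:  (B, K, A) current policy's mean for x_{k-1} given x_k
    means_old:  (B, K, A) rollout-time (behavior) policy's mean for the same steps
    stds:       (K,) per-step noise std, assumed to follow a fixed schedule
    advantages: (B,) environment-level advantage of the executed action
    Returns a scalar loss to minimize (the negated surrogate objective).
    """
    targets = chain[:, 1:, :]          # x_{k-1}: the sample produced at each step
    std = stds[None, :, None]          # broadcast to (1, K, 1)
    logp_new = gaussian_log_prob(targets, means_new, std)  # (B, K)
    logp_old = gaussian_log_prob(targets, means_old, std)  # (B, K)
    ratio = np.exp(logp_new - logp_old)
    adv = advantages[:, None]          # share one advantage across all K steps
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # PPO takes the pessimistic (element-wise minimum) of the two surrogates.
    return -np.mean(np.minimum(unclipped, clipped))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, K, A = 2, 5, 3                  # batch, denoising steps, action dim (toy sizes)
    chain = rng.normal(size=(B, K + 1, A))
    means = rng.normal(size=(B, K, A))
    stds = np.linspace(0.5, 0.1, K)    # assumed fixed noise schedule
    adv = np.array([1.0, 3.0])
    # Identical old/new means give ratio 1, so the loss reduces to -mean(adv).
    print(denoising_ppo_loss(chain, means, means, stds, adv))
```

Sharing one advantage across the whole chain keeps the sketch simple. A real implementation would compute `means_new` from a neural denoiser applied to `chain[:, :-1]` and might also discount or weight individual denoising steps.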

Taxonomy

Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Robot Manipulation and Learning