Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport

Mingyang Sun; Pengxiang Ding; Weinan Zhang; Donglin Wang

arXiv:2502.12631·cs.LG·February 24, 2025

TL;DR

This paper introduces OTPR, a method that uses optimal transport to combine diffusion policies with reinforcement learning. Fine-tuning through online interaction with the environment improves robustness and performance on complex tasks.

Contribution

The paper proposes OTPR, which integrates diffusion policies with RL via optimal transport, and adds masked optimal transport and a compatibility-based resampling strategy for stable fine-tuning.

Findings

01. OTPR outperforms existing methods in simulation tasks.

02. Enhanced robustness in complex and sparse-reward environments.

03. Effective combination of imitation learning and reinforcement learning.

Abstract

Diffusion policies have shown promise in learning complex behaviors from demonstrations, particularly for tasks requiring precise control and long-term planning. However, they face challenges in robustness when encountering distribution shifts. This paper explores improving diffusion-based imitation learning models through online interactions with the environment. We propose OTPR (Optimal Transport-guided score-based diffusion Policy for Reinforcement learning fine-tuning), a novel method that integrates diffusion policies with RL using optimal transport theory. OTPR leverages the Q-function as a transport cost and views the policy as an optimal transport map, enabling efficient and stable fine-tuning. Moreover, we introduce masked optimal transport to guide state-action matching using expert keypoints and a compatibility-based resampling strategy to enhance training stability.…
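To make the idea of "Q-function as a transport cost" concrete, here is a minimal toy sketch. It is not the OTPR algorithm: the abstract does not specify the formulation, so everything below is an assumption for illustration. The function `toy_q` is a hypothetical stand-in for a learned critic, the states and actions are made-up scalars, and the entropy-regularized Sinkhorn iteration is a standard discrete solver swapped in place of whatever the paper uses. The sketch sets the cost of pairing a state with an action to the negative Q-value, computes a transport plan between uniform marginals, and reads off the action that receives the most mass for each state.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularized OT plan between two uniform marginals.

    Standard Sinkhorn scaling; used here as a stand-in solver (assumption).
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform mass over states
    b = np.full(m, 1.0 / m)      # uniform mass over candidate actions
    K = np.exp(-cost / eps)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match a
        v = b / (K.T @ u)        # rescale columns to match b
    return u[:, None] * K * v[None, :]


def toy_q(states, actions):
    """Hypothetical Q-function: highest when the action equals the state."""
    return -(states[:, None] - actions[None, :]) ** 2


# Made-up 1-D states and candidate actions (illustrative only).
states = np.array([0.0, 0.5, 1.0])
actions = np.array([1.0, 0.0, 0.5])

# Transport cost = negative Q-value, so high-value pairs are cheap to match.
cost = -toy_q(states, actions)
plan = sinkhorn_plan(cost)

# For each state, pick the action that receives the most transported mass.
matched = actions[plan.argmax(axis=1)]
print(matched)
```

In this toy setting the plan concentrates its mass on the state-action pairs with the highest Q-value, which is the sense in which a transport map can act as a policy. A masked variant would restrict which pairs are allowed to exchange mass, for example by assigning a very large cost to disallowed pairs; how OTPR builds its mask from expert keypoints is not described in the text above.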

Code & Models

Repositories

sunmmyy/otpr (PyTorch, official)

Taxonomy

Topics: Traffic control and management

Methods: Diffusion