Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

Huayu Chen; Kaiwen Zheng; Hang Su; Jun Zhu

arXiv:2407.09024·cs.LG·October 31, 2024

TL;DR

This paper introduces an offline reinforcement learning approach for continuous control that pretrains a diffusion behavior policy on reward-free data and then fine-tunes it to align with Q-values, retaining most of its performance when only a small fraction of the data is Q-labeled.

Contribution

The paper proposes Efficient Diffusion Alignment (EDA), a method that combines diffusion behavior models with Q-function alignment for offline continuous control.

Findings

01. EDA outperforms all baselines on the D4RL benchmark.

02. EDA maintains 95% of its performance with only 1% of the Q-labeled data.

03. Diffusion policies enable effective behavior modeling and adaptation.

Abstract

Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: first pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations such as Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them…
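
The idea of representing a diffusion policy as the derivative of a scalar network with respect to the action can be illustrated with a small sketch. This is not the authors' implementation: the one-hidden-layer tanh network, the names `scalar_net`, `score_from_scalar`, and `init_params`, and the input layout (state, action, diffusion time) are all assumptions made for illustration. A real implementation would most likely use an autodiff framework rather than the hand-derived gradient shown here.

```python
import math
import random


def init_params(state_dim, action_dim, hidden_dim, seed=0):
    """Random weights for a hypothetical one-hidden-layer scalar network."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim + 1  # state, action, diffusion time
    W = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b = [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]
    v = [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]
    return W, b, v


def scalar_net(params, state, action, t):
    """Scalar output f(state, action, t) = v . tanh(W x + b)."""
    W, b, v = params
    x = list(state) + list(action) + [t]
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def score_from_scalar(params, state, action, t):
    """Analytic gradient of scalar_net with respect to the action inputs only.

    This vector field plays the role of the diffusion model's output in the
    sketch; it has the same dimension as the action.
    """
    W, b, v = params
    x = list(state) + list(action) + [t]
    offset = len(state)  # action entries start after the state entries
    grad = [0.0] * len(action)
    for row, bi, vi in zip(W, b, v):
        h = math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        dh = vi * (1.0 - h * h)  # chain rule through tanh
        for j in range(len(action)):
            grad[j] += dh * row[offset + j]
    return grad


params = init_params(state_dim=3, action_dim=2, hidden_dim=8)
state, action, t = [0.1, -0.4, 0.7], [0.3, -0.2], 0.5
score = score_from_scalar(params, state, action, t)
```

Because the vector field is defined as the gradient of a single scalar function, the scalar itself remains available as a quantity that can be evaluated directly, which is the property the abstract highlights. How EDA uses that scalar during alignment is not specified in the excerpt above, so the sketch stops at the gradient.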

Code & Models

Repositories

thu-ml/efficient-diffusion-alignment
PyTorch · Official

Taxonomy

Topics: Advanced Control Systems Optimization

Methods: ALIGN · Diffusion