Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

TL;DR
This paper introduces a novel offline reinforcement learning approach that leverages diffusion models and Q-function alignment to improve continuous control, achieving superior performance with minimal labeled data.
Contribution
It proposes Efficient Diffusion Alignment (EDA), a new method combining diffusion models with Q-function alignment for improved offline continuous control.
Findings
EDA outperforms all baselines on the D4RL benchmark.
EDA retains 95% of its performance when only 1% of the data is Q-labeled.
Diffusion policies enable effective behavior modeling and adaptation.
Abstract
Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them…
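The abstract's key representational idea, a diffusion policy expressed as the derivative of a scalar network with respect to the action input, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-hidden-layer tanh network, the names `make_params`, `scalar_net`, and `policy_output`, and the hand-derived gradient (a real implementation would more likely use automatic differentiation). It shows only the gradient relationship, not the density calculation or the alignment objective.

```python
import math
import random


def make_params(state_dim, action_dim, hidden=8, seed=0):
    """Random weights for a hypothetical one-hidden-layer scalar network."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim + 1  # state, noisy action, diffusion time t
    W1 = [[rng.gauss(0, 1) / math.sqrt(in_dim) for _ in range(in_dim)]
          for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]
    return W1, b1, w2


def _pre_activations(params, state, action, t):
    """Hidden-layer pre-activations for the concatenated input [s, a, t]."""
    W1, b1, _ = params
    x = list(state) + list(action) + [t]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def scalar_net(params, state, action, t):
    """f(s, a, t): a scalar-valued network of state, action, and time."""
    _, _, w2 = params
    pre = _pre_activations(params, state, action, t)
    return sum(v * math.tanh(z) for v, z in zip(w2, pre))


def policy_output(params, state, action, t):
    """Gradient of f with respect to the action, derived by hand via the chain rule."""
    W1, _, w2 = params
    pre = _pre_activations(params, state, action, t)
    offset = len(state)  # action entries follow the state entries in the input
    return [
        sum(v * (1.0 - math.tanh(z) ** 2) * row[offset + i]
            for v, z, row in zip(w2, pre, W1))
        for i in range(len(action))
    ]


params = make_params(state_dim=3, action_dim=2)
state = [1.0, 0.5, -0.5]
action = [0.1, -0.2]
out = policy_output(params, state, action, 0.5)  # one entry per action dimension
```

Because the vector-valued output is tied to a single scalar function, the scalar itself stays available for evaluation, which is the property the abstract points to when it mentions direct density calculation.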
Taxonomy
Topics: Advanced Control Systems Optimization
Methods: ALIGN · Diffusion
