TL;DR
DyDiff is a diffusion-based method for long-horizon trajectory rollout in offline reinforcement learning. It iteratively injects information from the learning policy into the diffusion model, so that rollouts stay accurate over long horizons while remaining consistent with that policy.
Contribution
The paper proposes DyDiff (Dynamics Diffusion), which isolates the dynamics-modeling ability of diffusion models in fully offline settings so that the learning policy, rather than the behavior policy implicit in the dataset, rolls out trajectories.
Findings
DyDiff achieves superior long-horizon rollout accuracy.
It maintains policy consistency during rollouts.
Theoretical analysis shows advantages over traditional models.
Abstract
With the great success of diffusion models (DMs) in generating realistic synthetic vision data, many researchers have investigated their potential in decision-making and control. Most of these works utilized DMs to sample directly from the trajectory space, where DMs can be viewed as a combination of dynamics models and policies. In this work, we explore how to decouple DMs' ability as dynamics models in fully offline settings, allowing the learning policy to roll out trajectories. As DMs learn the data distribution from the dataset, their intrinsic policy is actually the behavior policy induced from the dataset, which results in a mismatch between the behavior policy and the learning policy. We propose Dynamics Diffusion, DyDiff for short, which can inject information from the learning policy into DMs iteratively. DyDiff ensures long-horizon rollout accuracy while maintaining policy…
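The abstract is cut off above, so the following is only a toy illustration of the iterative idea it describes, not the paper's algorithm. It assumes a hypothetical action-conditioned state generator (`generate_states`, here a deterministic stand-in for a trained diffusion model with dynamics s' = s + a) and a hypothetical linear `learning_policy`. Starting from actions that do not come from the learning policy, it alternates between regenerating states from the current actions and relabeling actions with the learning policy.

```python
import numpy as np

def learning_policy(states):
    # Hypothetical learning policy: simple linear feedback, a = -0.5 * s.
    return -0.5 * states

def generate_states(s0, actions):
    # Stand-in for an action-conditioned trajectory generator.
    # A real system would sample states from a trained diffusion model;
    # this toy assumes the known dynamics s' = s + a.
    return s0 + np.concatenate(([0.0], np.cumsum(actions)))

def iterative_rollout(s0, init_actions, num_iters):
    # Alternate between generating states for the current actions and
    # relabeling the actions with the learning policy.
    actions = np.asarray(init_actions, dtype=float)
    for _ in range(num_iters):
        states = generate_states(s0, actions)
        actions = learning_policy(states[:-1])
    return generate_states(s0, actions), actions

def sequential_rollout(s0, horizon):
    # Reference: step-by-step rollout of the learning policy.
    states = [s0]
    for _ in range(horizon):
        states.append(states[-1] + learning_policy(states[-1]))
    return np.array(states)

horizon = 5
behavior_actions = np.zeros(horizon)  # placeholder for dataset (behavior-policy) actions
states, actions = iterative_rollout(1.0, behavior_actions, num_iters=horizon)
```

In this deterministic toy, each iteration makes one more action consistent with the learning policy, so `horizon` iterations reproduce the step-by-step rollout exactly. With a learned, stochastic generator no such guarantee holds; the sketch only shows how policy information can be fed back into trajectory generation.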
Taxonomy
Topics: Traffic control and management · Autonomous Vehicle Technology and Safety · Vehicle Dynamics and Control Systems
Methods: Diffusion
