ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Qianlan Yang, Yu-Xiong Wang

TL;DR
ATraDiff introduces a diffusion-based generative model that augments the data used in online reinforcement learning. It improves data efficiency and performance, especially in complex environments, by handling varying trajectory lengths and distribution shifts.
Contribution
The paper presents ATraDiff, an adaptive diffusion model that is learned from offline data and generates synthetic trajectories to enhance online RL, addressing the limited knowledge that can be extracted from a fixed offline dataset.
Findings
Achieves state-of-the-art results across various environments.
Effectively handles varying trajectory lengths and distribution shifts.
Significantly improves performance in complex RL settings.
Abstract
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to…
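The data-augmentation idea in the abstract can be illustrated with a minimal sketch: an online RL agent samples training batches from a replay buffer that mixes real environment transitions with synthetic ones. Everything named here is a hypothetical illustration and not the authors' implementation: `MixedReplayBuffer`, `synthetic_ratio`, and `placeholder_generator` are assumptions, and the generator is a random-walk stand-in, not a diffusion model.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done); a toy layout assumed for illustration.
Transition = Tuple[float, int, float, float, bool]


class MixedReplayBuffer:
    """Hypothetical buffer that stores real and synthetic transitions separately."""

    def __init__(self, capacity: int, synthetic_ratio: float) -> None:
        if not 0.0 <= synthetic_ratio <= 1.0:
            raise ValueError("synthetic_ratio must be in [0, 1]")
        self.real: Deque[Transition] = deque(maxlen=capacity)
        self.synthetic: Deque[Transition] = deque(maxlen=capacity)
        self.synthetic_ratio = synthetic_ratio

    def add_real(self, transition: Transition) -> None:
        self.real.append(transition)

    def add_synthetic_trajectory(self, trajectory: List[Transition]) -> None:
        self.synthetic.extend(trajectory)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Take the requested share from synthetic data, capped by what is available,
        # and fill the rest of the batch from real data.
        n_syn = min(int(round(batch_size * self.synthetic_ratio)), len(self.synthetic))
        n_real = min(batch_size - n_syn, len(self.real))
        batch = rng.sample(list(self.synthetic), n_syn) + rng.sample(list(self.real), n_real)
        rng.shuffle(batch)
        return batch


def placeholder_generator(start_state: float, length: int, rng: random.Random) -> List[Transition]:
    """Stand-in for a learned trajectory generator (assumption: NOT a diffusion model).

    Produces a random walk with zero reward, only to exercise the buffer.
    """
    trajectory: List[Transition] = []
    state = start_state
    for step in range(length):
        action = rng.randrange(2)
        next_state = state + (1.0 if action == 1 else -1.0)
        trajectory.append((state, action, 0.0, next_state, step == length - 1))
        state = next_state
    return trajectory
```

In this sketch the synthetic share of each batch is a fixed ratio; that is a design choice made for simplicity here, and the summary above does not say how the paper balances real and generated data.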
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Modular Robots and Swarm Intelligence
Methods: Diffusion
