Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models
Zeyu Fang, Tian Lan

TL;DR
This paper introduces a novel offline reinforcement learning method that uses importance-sampled diffusion models for iterative policy evaluation and world model adaptation, leading to improved performance especially with limited demonstration data.
Contribution
It presents a new approach combining guided diffusion world models with importance sampling for adaptive offline RL, addressing limitations of existing static or interaction-dependent models.
Findings
Significant performance improvements over state-of-the-art baselines.
Effective with only random or medium-expertise demonstrations.
Provides theoretical analysis of return gap bounds.
Abstract
Generative models such as diffusion models have been employed as world models in offline reinforcement learning to generate synthetic data for more effective learning. Existing work either trains the diffusion model once, prior to policy training, or requires additional interaction data to update it. In this paper, we propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation. It iteratively leverages a guided diffusion world model to directly evaluate the offline target policy with actions drawn from it, and then performs an importance-sampled world model update to adaptively align the world model with the updated policy. We analyze the performance of the proposed method and provide an upper bound on the return gap between our method and the real environment under an optimal policy. The result sheds light on various factors affecting…
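The importance-sampled update mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: it only shows the standard idea of reweighting offline transitions by the ratio of target-policy to behavior-policy action probabilities, so that a world-model training loss emphasizes data the updated policy would actually visit. The function names, the clipping threshold, and the toy numbers are all assumptions made for illustration.

```python
import math

def importance_weights(logp_target, logp_behavior, clip=10.0):
    """Self-normalized, clipped importance weights pi_target / pi_behavior.

    Hypothetical helper: `clip` caps each ratio to limit variance (an assumption,
    not something the abstract specifies).
    """
    ratios = [min(math.exp(lt - lb), clip)
              for lt, lb in zip(logp_target, logp_behavior)]
    total = sum(ratios)
    return [r / total for r in ratios]

def weighted_loss(per_sample_losses, weights):
    """Importance-weighted average of per-transition world-model losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# Toy batch of three offline transitions (made-up probabilities and losses).
logp_target = [math.log(0.8), math.log(0.1), math.log(0.5)]
logp_behavior = [math.log(0.5), math.log(0.5), math.log(0.5)]
losses = [0.2, 1.0, 0.4]  # stand-ins for per-sample model losses

weights = importance_weights(logp_target, logp_behavior)
loss = weighted_loss(losses, weights)
```

Transitions whose actions the target policy favors receive larger weights, so the weighted loss pulls the model toward the state-action region the current policy cares about. Self-normalizing the weights trades a small bias for lower variance, which is a common choice when the behavior data is far from the target policy.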
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Evolutionary Algorithms and Applications
Methods: ALIGN · Diffusion
