Learning to Reach Goals via Diffusion
Vineet Jain, Siamak Ravanbakhsh

TL;DR
This paper introduces Merlin, a diffusion-based goal-conditioned reinforcement learning method that efficiently reaches goals without value functions, demonstrating superior performance and computational efficiency in offline tasks.
Contribution
Merlin is the first RL approach to perform diffusion in the state space, and it requires only one denoising step per environment step, improving both efficiency and effectiveness.
Findings
Outperforms state-of-the-art offline goal-reaching methods
Requires only one denoising iteration per environment step
Significantly improves computational efficiency over previous diffusion-based RL methods
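The single denoising call per environment step can be illustrated with a toy rollout loop. This is a minimal sketch, not the authors' code. The point-mass dynamics, the hand-written `policy` (a stand-in for a learned goal- and horizon-conditioned policy π(a | s, g, h)), and the per-step denoising count `K` used for comparison are all illustrative assumptions.

```python
import numpy as np

def policy(s, g, h):
    """Hypothetical stand-in for a learned pi(a | s, g, h):
    move a 1/h fraction of the remaining distance toward the goal."""
    return (g - s) / h

def rollout(s0, g, horizon):
    """Goal-reaching rollout that calls the policy once per environment step."""
    s, calls = np.asarray(s0, dtype=float), 0
    for t in range(horizon):
        a = policy(s, g, horizon - t)  # one "denoising" call per environment step
        calls += 1
        s = s + a                      # assumed toy dynamics: s_{t+1} = s_t + a_t
    return s, calls

final, calls = rollout([0.0, 0.0], np.array([3.0, -4.0]), horizon=8)
K = 20  # hypothetical denoising iterations per step for a multi-step diffusion planner
print(final, calls, calls * K)
```

Here the agent makes exactly one policy call per step (8 in total). A planner that instead ran K denoising iterations at every step would need K times as many network evaluations; the value K = 20 is arbitrary and only serves the comparison.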
Abstract
We present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of denoising diffusion models. Analogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy to reverse these deviations, analogous to the score function. This approach, which we call Merlin, can reach specified goals from arbitrary initial states without learning a separate value function. In contrast to recent works utilizing diffusion models in offline RL, Merlin stands out as the first method to perform diffusion in the state space, requiring only one "denoising" iteration per environment step. We experimentally validate our approach in various offline goal-reaching tasks, demonstrating substantial…
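Read as a training recipe, the idea amounts to a GCSL-style policy conditioned on state, goal, and horizon. The following is a minimal sketch under several assumptions: offline trajectories are relabeled in hindsight, so the state reached h steps later serves as the goal. The policy is linear and fitted by least squares, which is equivalent to maximum likelihood for a fixed-variance Gaussian policy. The point-mass data, the `features` map, and all function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def relabel(trajectories, max_horizon):
    """Hindsight relabeling: pair each (s_t, a_t) with goal g = s_{t+h} and horizon h."""
    S, G, H, A = [], [], [], []
    for obs, acts in trajectories:  # obs: (T+1, d) states, acts: (T, d) actions
        T = len(acts)
        for t in range(T):
            for h in range(1, min(max_horizon, T - t) + 1):
                S.append(obs[t])
                G.append(obs[t + h])
                H.append(h)
                A.append(acts[t])
    return np.array(S), np.array(G), np.array(H, dtype=float), np.array(A)

def features(s, g, h):
    """Hypothetical feature map: displacement to the goal divided by the horizon."""
    return (g - s) / h[:, None]

def fit_linear_policy(S, G, H, A):
    """Least-squares fit of a = features(s, g, h) @ W
    (maximum likelihood for a Gaussian policy with fixed variance)."""
    X = features(S, G, H)
    W, *_ = np.linalg.lstsq(X, A, rcond=None)
    return W

# Toy offline data: a point mass s_{t+1} = s_t + a_t driven by random Gaussian actions.
rng = np.random.default_rng(0)
trajectories = []
for _ in range(300):
    acts = rng.normal(size=(20, 2))
    obs = np.vstack([np.zeros((1, 2)), np.cumsum(acts, axis=0)])
    trajectories.append((obs, acts))

S, G, H, A = relabel(trajectories, max_horizon=10)
W = fit_linear_policy(S, G, H, A)
print(W.round(2))  # close to the 2x2 identity matrix
```

Because the toy data come from a random walk with i.i.d. zero-mean steps, the expected action given the displacement D = s_{t+h} - s_t is D/h, so the fitted W lands near the identity. The learned policy therefore steps a 1/h fraction of the way toward the goal, which loosely mirrors reversing the noising walk described in the abstract.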
Peer Reviews
Decision: ICML 2024 Poster
Interesting dataset augmentation technique that might improve performance on some control tasks.
The trajectory stitching method is only usable if the distance between two states can be defined. What if states are observed via images? Additionally, what if the distance between states is not indicative of their relation to one another in a sequential process? What if there are discontinuities in states? The transition from 3.2 to 4 is abrupt. There is no additional information on issues with offline reinforcement learning. GCSL seems to be a very important concept which is used as a baseline algorithm in this p
- The problem setting is interesting
- The figures are nice and intuitively explain the ideas presented in the paper
- The analogies to behavior cloning are interesting
- Reduction in need for denoising steps is beneficial
- The introduction could be improved by making the exact problem setting clearer from the beginning.
- The nearest-neighbor-based approach assumes that close states are connected, i.e. can be accessed from each other; this should be discussed. It could also be evaluated by designing a more complex toy environment based on the environment in Figure 2.
- The related work description of Janner et al. is not exactly correct, as it is not full trajectories that are noised but just trajector
1. Novel perspective of framing goal-reaching as a diffusion process.
2. Trajectory stitching technique seems useful for generating diverse state-goal pairs from offline data.
3. Strong empirical results on offline goal-reaching tasks compared to prior methods.
1. Although the paper seems to describe a feasible diffusion-like process for modeling the GCRL problem, I think Merlin is essentially a variant of constrained GCSL. From this perspective, Merlin has only limited novelty. Starting from the simplest version, Merlin builds its policy on $s, g, h$ instead of the $s, g$ used by GCSL. Although Merlin shows better results in the motivating example, I think this is because of the inclusion of a more stable time guide.
2. I observe that Merlin-NP and Merlin-P show better re
1. I like the high-level idea of this work, which is inspired by diffusion models: constructing a simple forward process that enlarges the training set by injecting noise, and learning the reverse process. Specifically, they use Nearest-neighbor Trajectory Stitching to generate more data. The algorithm is somewhat novel and might work well on some tasks.
2. Competitive results: The authors validate their approach on offline goal-reaching tasks and show competitive results with state-of-the-a
1. Weak theoretical justification: diffusion models enjoy strong theoretical foundations, since the forward and backward processes are proven to share the same marginal distributions. However, it is not clear to me whether the backward process of Nearest-neighbor Trajectory Stitching still has similar theoretical guarantees.
2. Limited range of applications: Nearest-neighbor Trajectory Stitching seems to be designed for some specific applications, and its generalizability remains unclear.
3. Misleadin
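Two reviews discuss Merlin's nearest-neighbor trajectory stitching and its implicit assumption that nearby states are connected. One simple way to realize the idea is sketched here, assuming Euclidean distance between state vectors and a hypothetical `radius` threshold. The jump rule, which switches to a random neighbor from another trajectory whenever one lies within `radius`, is an illustrative choice and not the paper's exact procedure.

```python
import numpy as np

def stitch_backward(trajs, goal_traj, length, radius, rng):
    """Walk backward from the last state of trajs[goal_traj]. Whenever a state from
    another trajectory lies within `radius` of the current state, jump to it and keep
    walking backward along that trajectory. Returns states in forward order."""
    # Every state that has a predecessor, tagged with (trajectory id, time step).
    index = [(j, t) for j, obs in enumerate(trajs) for t in range(1, len(obs))]
    points = np.array([trajs[j][t] for j, t in index])

    i, t = goal_traj, len(trajs[goal_traj]) - 1
    path = [trajs[i][t]]
    while len(path) < length and t > 0:
        dist = np.linalg.norm(points - trajs[i][t], axis=1)
        # Assumption: states closer than `radius` are treated as the same state.
        neighbors = [k for k in np.flatnonzero(dist < radius) if index[k][0] != i]
        if neighbors:
            i, t = index[rng.choice(neighbors)]
        t -= 1
        path.append(trajs[i][t])
    return np.array(path[::-1])

# Two toy trajectories that cross at (2, 0): one moves right, one moves up.
right = np.array([[x, 0.0] for x in range(5)])
up = np.array([[2.0, y] for y in range(-2, 3)])
rng = np.random.default_rng(0)
path = stitch_backward([right, up], goal_traj=1, length=10, radius=0.5, rng=rng)
print(path)  # starts on `right` and ends at the goal (2, 2) of `up`
```

The stitched path pairs a start state from one trajectory with a goal from the other, a state-goal pair absent from the raw data. The `radius` threshold is where the reviewers' concern applies: if nearby states are not actually reachable from each other (for example across a wall, or with image observations where Euclidean distance is not meaningful), the stitched transitions are invalid.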
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks
Methods: Diffusion
