TL;DR
This paper introduces FoG, a skill discovery method that uses foundation models to incorporate human preferences, enabling reinforcement learning agents to avoid undesirable behaviors and focus on desirable skills.
Contribution
FoG is the first approach to integrate foundation model-based human preferences into skill discovery, improving safety and alignment in reinforcement learning.
Findings
FoG effectively eliminates undesirable behaviors in robotic skills.
FoG successfully incorporates human preferences into skill learning.
FoG discovers complex behaviors aligned with human intentions.
Abstract
Learning diverse skills without hand-crafted reward functions could accelerate reinforcement learning in downstream tasks. However, existing skill discovery methods focus solely on maximizing the diversity of skills without considering human preferences, which leads to undesirable behaviors and possibly dangerous skills. For instance, a cheetah robot trained using previous methods learns to roll in all directions to maximize skill diversity, whereas we would prefer it to run without flipping or entering hazardous areas. In this work, we propose a Foundation model Guided (FoG) skill discovery method, which incorporates human intentions into skill discovery through foundation models. Specifically, FoG extracts a score function from foundation models to evaluate states based on human intentions, assigning higher values to desirable states and lower to undesirable ones. These scores are…
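To make the idea of a score function concrete, here is a minimal sketch, not the paper's implementation. It assumes a foundation model has already produced, for each state, one similarity value against a description of the desired behavior and one against a description of the undesired behavior. The function name `score_states`, the binary `low`/`high` values, and the similarity numbers are all hypothetical stand-ins.

```python
# Hypothetical sketch: turn per-state similarity values into scores.
# The similarity values stand in for foundation-model outputs; no real
# model is called here.

def score_states(sim_desired, sim_undesired, low=0.0, high=1.0):
    """Return `high` for states that look more like the desired description
    than the undesired one, and `low` otherwise (ties count as `low`)."""
    return [
        high if d > u else low
        for d, u in zip(sim_desired, sim_undesired)
    ]

# Made-up similarities for three states of a cheetah-like robot.
sim_run = [0.8, 0.3, 0.6]   # similarity to "running upright" (assumed)
sim_flip = [0.2, 0.7, 0.6]  # similarity to "flipped over" (assumed)

scores = score_states(sim_run, sim_flip)
print(scores)  # [1.0, 0.0, 0.0]
```

Only the first state scores high: the second looks more like the undesired description, and the third is a tie, which this sketch treats conservatively as undesirable.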
Peer Reviews
Decision·Submitted to ICLR 2025
Generally, the paper is well written and the underlying idea is fairly straightforward. The authors adopt the METRA framework and can thus handle image inputs.
One of the claimed novelties is that FoG can learn behaviors that are challenging to define. I do not believe this is a new phenomenon; it is fairly well documented in preference-based RL work. In the same vein, I believe the work has overlooked some very relevant work on safe/guided skill discovery, which could be used as baselines or at least discussed in detail.
The paper is well written and easy to follow. The experiments show that the proposed method achieves its stated goal, learning diverse skills while avoiding undesirable behavior, in both state space and image space.
Although learning diverse skills while avoiding undesirable behaviors is a valid motivation, using foundation models simply as an undesirable-behavior recognizer seems an odd use case. If we are already using foundation models, it seems to me that a more natural approach is to let them propose tasks and safety rewards (if applicable); examples include but are not limited to [1] and [2]. This allows skill discovery to directly target learning more semantically-meaningful sk
The paper is generally well-organized, and the description of the FoG approach and its score function is detailed.
**The methodology is the same as DoDont.** The core methodology of FoG closely resembles the DoDont approach, as both employ a score function derived from human-provided preferences, especially the form $\max \sum p(s,s')\,\|\phi(s') - \phi(s)\|\, z \quad \text{s.t.} \quad \|\phi(s') - \phi(s)\| \le 1$. Although FoG uses $f(s')$ in its objective, i.e., $\sum f(s')\,\|\phi(s') - \phi(s)\|\, z \quad \text{s.t.} \quad \|\phi(s') - \phi(s)\| \le 1$, $f(s')$ is just a special case of $p(s,s')$. The similarity to DoDont weake
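The reviewer's special-case argument can be illustrated with a small sketch. It assumes the weighted term is the inner product $(\phi(s') - \phi(s))^\top z$, as in METRA, and it does not enforce the norm constraint beyond choosing example values that satisfy it. The functions `f` and `p`, the `flipped_after` flag, and all numbers are hypothetical; neither function is taken from FoG or DoDont.

```python
# Hypothetical sketch: a next-state score f(s') is a transition weight
# p(s, s') that happens to ignore s, so both give the same weighted reward.

def weighted_skill_reward(phi_s, phi_next, z, weight):
    """weight * (phi(s') - phi(s))^T z for a single transition."""
    inner = sum((b - a) * c for a, b, c in zip(phi_s, phi_next, z))
    return weight * inner

def f(flipped_after):
    """State-only score (assumed): 0 if the next state is flipped, else 1."""
    return 0.0 if flipped_after else 1.0

def p(phi_s, flipped_after):
    """Transition weight that ignores the current state entirely."""
    return f(flipped_after)

# Each entry: (phi(s), phi(s'), skill vector z, whether s' is flipped).
transitions = [
    ([0.0, 0.0], [0.5, 0.0], [1.0, 0.0], False),
    ([0.5, 0.0], [0.5, 0.4], [0.0, 1.0], True),
]

r_state = [weighted_skill_reward(a, b, z, f(fl)) for a, b, z, fl in transitions]
r_pair = [weighted_skill_reward(a, b, z, p(a, fl)) for a, b, z, fl in transitions]

print(r_state)            # [0.5, 0.0]
print(r_state == r_pair)  # True
```

The two reward lists coincide because `p` never reads its first argument. A weight that genuinely depends on both $s$ and $s'$ can express preferences over transitions that a state-only score cannot.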
