Guiding Skill Discovery with Foundation Models

Zhao Yang; Thomas M. Moerland; Mike Preuss; Aske Plaat; Vincent François-Lavet; Edward S. Hu

arXiv:2510.23167·cs.AI·October 28, 2025

3 Reviews

TL;DR

This paper introduces FoG, a skill discovery method that uses foundation models to incorporate human preferences, enabling reinforcement learning agents to avoid undesirable behaviors and focus on desirable skills.

Contribution

FoG is the first approach to integrate foundation model-based human preferences into skill discovery, improving safety and alignment in reinforcement learning.

Findings

01

FoG effectively eliminates undesirable behaviors in robotic skills.

02

FoG successfully incorporates human preferences into skill learning.

03

FoG discovers complex behaviors aligned with human intentions.

Abstract

Learning diverse skills without hand-crafted reward functions could accelerate reinforcement learning in downstream tasks. However, existing skill discovery methods focus solely on maximizing the diversity of skills without considering human preferences, which leads to undesirable behaviors and possibly dangerous skills. For instance, a cheetah robot trained using previous methods learns to roll in all directions to maximize skill diversity, whereas we would prefer it to run without flipping or entering hazardous areas. In this work, we propose a Foundation model Guided (FoG) skill discovery method, which incorporates human intentions into skill discovery through foundation models. Specifically, FoG extracts a score function from foundation models to evaluate states based on human intentions, assigning higher values to desirable states and lower to undesirable ones. These scores are…

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

Generally, the paper is well written and the underlying idea is fairly straightforward. The authors have adopted the METRA framework and can thus handle image inputs.

Weaknesses

In terms of the claimed novelties, one of them is that FoG can learn behaviors that are challenging to define. I do not believe this is a new phenomenon – it is fairly well documented in preference-based RL works. In the same light, I believe the work has overlooked some very relevant works on safe/guided skill discovery, which could be used as baselines or at least discussed in detail.

Reviewer 02 · Rating 5 · Confidence 3

Strengths

The paper is well written and easy to follow. The experiments show that the proposed method successfully achieves its stated goal, learning diverse skills while avoiding undesirable behavior, in both object-state and image spaces.

Weaknesses

Though learning diverse skills while avoiding undesirable behaviors is a valid motivation, using foundation models simply as an undesirable-behavior recognizer seems to be a weird use case. Given that we already use foundation models, it seems to me that a more natural way is to let foundation models propose some tasks and safety rewards (if applicable); examples include but are not limited to [1] and [2]. This allows skill discovery to directly target learning more semantically-meaningful sk

Reviewer 03 · Rating 3 · Confidence 5

Strengths

The paper is generally well-organized, and the description of the FoG approach and its score function is detailed.

Weaknesses

**The methodology is the same as DoDont.** The core methodology of FoG closely resembles the DoDont approach, as both employ a score function derived from human-provided preferences, especially the form $\max\sum p(s,s') \|\phi(s') - \phi(s)\| z, \quad\text{s.t.} \|\phi(s')-\phi(s)\|\le1 $. Although FoG uses $f(s')$ in its objective, i.e., $\sum f(s') \|\phi(s') - \phi(s)\| z, \quad\text{s.t.} \|\phi(s')-\phi(s)\|\le1$, $f(s')$ is just a special case of $p(s,s')$. The similarity to DoDont weake
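
Reviewer 03's point that a state-only score $f(s')$ is a special case of a transition weight $p(s,s')$ can be illustrated with a minimal sketch. It is not taken from the paper or from DoDont: it assumes the METRA-style inner-product reward $(\phi(s') - \phi(s))^\top z$ rather than the norm as written in the review, and the identity embedding, the toy score `toy_score`, and all function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_skill_reward(weight: float, phi_s: Vector, phi_next: Vector, z: Vector) -> float:
    """Return weight * (phi(s') - phi(s))^T z (assumed METRA-style form)."""
    diff = [b - a for a, b in zip(phi_s, phi_next)]
    return weight * dot(diff, z)


def pair_weight_from_state_score(
    f: Callable[[Vector], float],
) -> Callable[[Vector, Vector], float]:
    """Lift a state score f(s') to a pair weight p(s, s') that ignores s."""
    def p(s: Vector, s_next: Vector) -> float:
        return f(s_next)
    return p


def toy_score(state: Vector) -> float:
    """Hypothetical score: 0 if the second coordinate is negative, else 1."""
    return 0.0 if state[1] < 0 else 1.0


# Toy setup: states are already embeddings (phi is the identity), skill z points along x.
s, s_good, s_bad, z = (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (1.0, 0.0)
p = pair_weight_from_state_score(toy_score)

r_state = weighted_skill_reward(toy_score(s_good), s, s_good, z)  # weight from f(s')
r_pair = weighted_skill_reward(p(s, s_good), s, s_good, z)        # weight from p(s, s')
r_bad = weighted_skill_reward(toy_score(s_bad), s, s_bad, z)      # undesirable next state
```

In this toy case `r_state` and `r_pair` coincide because `p` only looks at the next state, and the transition into the undesirable state earns zero reward. The converse does not hold in general: a weight that depends on both `s` and `s'` cannot always be written as a function of `s'` alone.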
