GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov; Shimon Whiteson

arXiv:2506.16396 · cs.LG · December 15, 2025

TL;DR

GoalLadder introduces a novel approach leveraging vision-language models to incrementally discover goal states from natural language instructions, enabling reinforcement learning in visual environments with minimal feedback and improved success rates.

Contribution

The paper presents GoalLadder, a new method that uses VLMs and an ELO-based ranking system to incrementally identify goal states from language instructions, reducing reliance on large feedback datasets.
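The ranking idea can be illustrated with a standard Elo update driven by pairwise comparisons. This is a minimal sketch, not the paper's implementation: the state names, the comparison list, the starting rating of 1500, the K-factor of 32, and the scale of 400 are all assumptions chosen for illustration, and the VLM query is replaced by a hard-coded list of (winner, loser) verdicts that includes one deliberately wrong answer.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo probability that A is preferred over B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - exp_a)
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical candidate goal states, all starting at the same rating.
ratings = {"state_0": 1500.0, "state_1": 1500.0, "state_2": 1500.0}

# Hypothetical (winner, loser) verdicts standing in for VLM answers.
# The fourth verdict is a deliberately noisy one: state_0 "beats" state_2.
comparisons = [
    ("state_2", "state_1"),
    ("state_2", "state_0"),
    ("state_1", "state_0"),
    ("state_0", "state_2"),  # noisy verdict
    ("state_2", "state_1"),
]

for winner, loser in comparisons:
    ratings[winner], ratings[loser] = elo_update(
        ratings[winner], ratings[loser], a_wins=True
    )

# The highest-rated state is the current best goal candidate.
best = max(ratings, key=ratings.get)
print(best, ratings)
```

Because each comparison shifts ratings by a bounded amount, a single wrong verdict lowers the rating of the better state without overturning the ranking built from the other comparisons. In this toy run, `state_2` remains the top-rated candidate.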

Findings

1. Achieves ~95% success rate on control and robotic tasks.

2. Outperforms existing methods by a significant margin.

3. Effectively handles noisy VLM feedback through ranking.

Abstract

Natural language can offer a concise and human-interpretable means of specifying reinforcement learning (RL) tasks. The ability to extract rewards from a language instruction can enable the development of robotic systems that can learn from human guidance; however, it remains a challenging problem, especially in visual environments. Existing approaches that employ large, pretrained language models either rely on non-visual environment representations, require prohibitively large amounts of feedback, or generate noisy, ill-shaped reward functions. In this paper, we propose a novel method, GoalLadder, that leverages vision-language models (VLMs) to train RL agents from a single language instruction in visual environments. GoalLadder works by incrementally discovering states that bring the agent closer to completing a task specified in natural language. To do so, it queries a VLM to…

Videos

GoalLadder: Incremental Goal Discovery with Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning