GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov, Shimon Whiteson

TL;DR
GoalLadder introduces a novel approach leveraging vision-language models to incrementally discover goal states from natural language instructions, enabling reinforcement learning in visual environments with minimal feedback and improved success rates.
Contribution
The paper presents GoalLadder, a method that uses VLMs and an Elo-based ranking system to incrementally identify goal states from a language instruction, reducing reliance on large feedback datasets.
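The Contribution names Elo-based ranking as the mechanism for ordering candidate goal states. The sketch below is a minimal illustration of that general idea. It abstracts the VLM as a hypothetical pairwise comparator `prefer(i, j)` that sometimes answers wrongly, and uses the standard Elo update to aggregate such noisy comparisons into ratings. The function names, the K-factor of 32, the starting rating of 1500, the random match scheduling, and the simulated 80%-accurate oracle are all illustrative assumptions. None of them come from GoalLadder itself.

```python
import random
from typing import Callable, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one comparison; the update is zero-sum."""
    delta = k * ((1.0 if a_wins else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def rank_candidates(n: int, prefer: Callable[[int, int], bool],
                    num_matches: int, k: float = 32.0, seed: int = 0) -> List[float]:
    """Rate n candidate states from random pairwise comparisons.

    `prefer(i, j)` is a hypothetical stand-in for a VLM query that returns
    True if candidate i is judged closer to completing the task than j.
    """
    rng = random.Random(seed)
    ratings = [1500.0] * n
    for _ in range(num_matches):
        i, j = rng.sample(range(n), 2)  # pick two distinct candidates
        ratings[i], ratings[j] = elo_update(ratings[i], ratings[j], prefer(i, j), k)
    return ratings


def make_noisy_oracle(progress: List[float], accuracy: float,
                      seed: int = 1) -> Callable[[int, int], bool]:
    """Simulated judge: answers correctly with probability `accuracy`."""
    rng = random.Random(seed)

    def prefer(i: int, j: int) -> bool:
        correct = progress[i] > progress[j]
        return correct if rng.random() < accuracy else not correct

    return prefer


if __name__ == "__main__":
    progress = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical true task progress
    ratings = rank_candidates(len(progress), make_noisy_oracle(progress, 0.8), 2000)
    best = max(range(len(ratings)), key=ratings.__getitem__)
    print("top-rated candidate:", best, [round(r) for r in ratings])
```

Ranking by accumulated ratings, rather than trusting any single comparison, lets occasional wrong answers be outvoted over many matches. This is one plausible reading of how ranking can "handle noisy VLM feedback". It is not a description of the paper's exact procedure.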
Findings
Achieves a ~95% success rate on control and robotic tasks.
Outperforms existing methods by a significant margin.
Effectively handles noisy VLM feedback through ranking.
Abstract
Natural language can offer a concise and human-interpretable means of specifying reinforcement learning (RL) tasks. The ability to extract rewards from a language instruction can enable the development of robotic systems that can learn from human guidance; however, it remains a challenging problem, especially in visual environments. Existing approaches that employ large, pretrained language models either rely on non-visual environment representations, require prohibitively large amounts of feedback, or generate noisy, ill-shaped reward functions. In this paper, we propose a novel method, GoalLadder, that leverages vision-language models (VLMs) to train RL agents from a single language instruction in visual environments. GoalLadder works by incrementally discovering states that bring the agent closer to completing a task specified in natural language. To do so, it queries a VLM to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
