Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Juan Rocamonde; Victoriano Montesinos; Elvis Nava; Ethan Perez; David Lindner

arXiv:2310.12921·cs.LG·March 15, 2024·6 cites

Open Access 1 Repo

TL;DR

This paper demonstrates that large pretrained vision-language models can serve as effective zero-shot reward models for reinforcement learning, enabling complex task learning from minimal natural language prompts without manual reward engineering.

Contribution

Introducing VLM-RMs, a novel method using pretrained VLMs as zero-shot reward models for RL, reducing the need for manual reward specification and extensive human feedback.

Findings

01 VLM-RMs successfully train agents for complex tasks with minimal prompts.

02 Larger VLMs improve reward modeling performance.

03 Performance scales with model size and training data.

Abstract

Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve…
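One way to picture a CLIP-based zero-shot reward is as the cosine similarity between the embedding of a rendered frame and the embedding of the task prompt. The sketch below illustrates that idea only. It is not the authors' implementation: the function name `cosine_similarity_reward` is hypothetical, and the hand-written 3-dimensional vectors stand in for embeddings that a real setup would obtain from a CLIP image encoder and text encoder.

```python
import numpy as np


def cosine_similarity_reward(image_embedding, text_embedding):
    """Return the cosine similarity between two embedding vectors.

    Hypothetical helper: in a real pipeline the inputs would come from a
    pretrained vision-language model; here they are plain arrays.
    """
    img = np.asarray(image_embedding, dtype=np.float64)
    txt = np.asarray(text_embedding, dtype=np.float64)
    denom = np.linalg.norm(img) * np.linalg.norm(txt)
    if denom == 0.0:
        # Degenerate all-zero embedding: treat it as carrying no signal.
        return 0.0
    return float(np.dot(img, txt) / denom)


# Toy stand-in embeddings (assumed values, not real CLIP outputs).
prompt_embedding = np.array([1.0, 0.0, 0.0])  # e.g. a single-sentence task prompt
close_frame = np.array([0.9, 0.1, 0.0])       # frame resembling the described pose
far_frame = np.array([0.0, 1.0, 0.0])         # frame unrelated to the prompt

reward_close = cosine_similarity_reward(close_frame, prompt_embedding)
reward_far = cosine_similarity_reward(far_frame, prompt_embedding)
```

In an RL loop, a scalar of this kind would take the place of a hand-written reward at each step, with the text embedding computed once per task and reused for every frame.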

Code & Models

Repositories

alignmentresearch/vlmrm
PyTorch, Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Contrastive Language-Image Pre-training