Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto; Sami Nur Islam; Martin Klissarov; Doina Precup; Sherry Yang; Ankit Anand

arXiv:2402.04764·cs.LG·February 8, 2024·1 cites

Open Access

TL;DR

This paper introduces VLM-CaR, a framework that leverages pre-trained Vision-Language Models to generate dense, accurate rewards for reinforcement learning, improving training efficiency and effectiveness across various environments.

Contribution

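The paper proposes a method for using VLMs to generate rewards in RL: instead of querying the VLM directly for each reward, the VLM's output is turned into code that computes dense rewards, which reduces the computational cost of VLM inference during training.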

Findings

1. VLM-CaR produces accurate dense rewards across diverse environments.
2. Dense rewards from VLM-CaR outperform sparse environment rewards in training RL policies.
3. The approach significantly reduces the computational burden of using VLMs in RL.

Abstract

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated…
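To make the idea of a reward expressed as code concrete, here is a minimal sketch of what such a reward program might look like. It is not the paper's implementation: the grid encoding, the two sub-tasks, and every name in it (`has_picked_up_key`, `has_opened_door`, `dense_reward`) are hypothetical, chosen only to illustrate how sub-task checks written as ordinary functions can yield a dense reward without calling a VLM at each step.

```python
from typing import Callable, List, Sequence

# Hypothetical observation format: a 2D grid of integer cell codes.
Grid = Sequence[Sequence[int]]

# Hypothetical cell codes, assumed for this illustration only.
EMPTY, AGENT, KEY, DOOR_OPEN = 0, 1, 2, 3


def has_picked_up_key(obs: Grid) -> bool:
    """Sub-task 1: the key no longer appears anywhere in the grid."""
    return not any(KEY in row for row in obs)


def has_opened_door(obs: Grid) -> bool:
    """Sub-task 2: an open-door cell appears somewhere in the grid."""
    return any(DOOR_OPEN in row for row in obs)


# Ordered sub-task checks. In a code-as-reward setup, functions like these
# would be written by a model rather than by hand (assumption of this sketch).
SUBTASK_CHECKS: List[Callable[[Grid], bool]] = [
    has_picked_up_key,
    has_opened_door,
]


def dense_reward(obs: Grid) -> float:
    """Return the fraction of sub-tasks completed in order.

    Counting stops at the first unmet sub-task, so later sub-tasks only
    contribute once the earlier ones are satisfied.
    """
    completed = 0
    for check in SUBTASK_CHECKS:
        if not check(obs):
            break
        completed += 1
    return completed / len(SUBTASK_CHECKS)
```

Because `dense_reward` is plain Python, it can be evaluated on every observation during training at negligible cost, which is the kind of saving the abstract attributes to generating code instead of querying the VLM directly.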

Taxonomy

Topics: Elevator Systems and Control · Software Engineering Research

Methods: Sparse Evolutionary Training