FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu; Haichao Zhang; Di Wu; Wei Xu; Benoit Boulet

arXiv:2406.00645 · cs.LG · June 6, 2024 · 1 citation

TL;DR

This paper introduces FuRL, a method that fine-tunes pre-trained visual-language models to serve as fuzzy rewards in reinforcement learning, improving performance on sparse reward tasks.

Contribution

We propose a lightweight fine-tuning approach that lets pre-trained VLMs serve as reward signals in RL, addressing reward misalignment and improving the performance of SAC/DrQ baseline agents.

Findings

01. Improved SAC/DrQ agent performance on sparse reward tasks.

02. Effective fine-tuning of VLMs for reward alignment.

03. Successful application on Meta-world benchmark tasks.

Abstract

In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying a VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
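To make the idea of a VLM acting as a fuzzy reward concrete, the following is a minimal sketch, not the authors' implementation. It assumes the VLM score is the cosine similarity between an embedding of the current observation image and an embedding of the textual task description, and that this score is added to the sparse task reward with a small weight. The function names, the `weight` parameter, and the plain-list embeddings are hypothetical stand-ins; the reward alignment (fine-tuning) and relay RL components described in the abstract are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shaped_reward(sparse_reward, image_embedding, text_embedding, weight=0.1):
    """Sparse task reward plus a weighted VLM similarity term.

    Assumption: the embeddings come from a pre-trained VLM's image and text
    encoders; here they are ordinary Python lists used for illustration.
    """
    fuzzy = cosine_similarity(image_embedding, text_embedding)
    return sparse_reward + weight * fuzzy


# Hypothetical embeddings for one observation and one task description.
image_embedding = [0.6, 0.8, 0.0]
text_embedding = [0.6, 0.8, 0.0]
reward = shaped_reward(0.0, image_embedding, text_embedding, weight=0.1)
```

In this sketch the similarity term gives a dense signal even when the sparse reward is zero. If that signal is misaligned with true task progress, it can steer the agent toward the wrong states, which is the reward misalignment problem the abstract identifies.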

Code & Models

Repositories

fuyw/furl (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Focus