VIRAL: Vision-grounded Integration for Reward design And Learning

Valentin Cuzin-Rambaud; Emilien Komlenovic; Alexandre Faure; Bruno Yun

arXiv:2505.22092 · cs.AI · October 29, 2025

TL;DR

VIRAL is a novel pipeline that uses multi-modal large language models to generate and refine reward functions for reinforcement learning, improving alignment with human intent and accelerating learning in various environments.

Contribution

The paper introduces VIRAL, a method that uses multi-modal LLMs to create and refine reward functions autonomously, with the aim of improving alignment with human intent and learning efficiency.

Findings

01. VIRAL accelerates learning of new behaviors in Gym environments.

02. VIRAL improves alignment with user intent through interactive reward refinement.

03. VIRAL demonstrates effective use of multi-modal LLMs for reward generation.

Abstract

The alignment between humans and machines is a critical challenge in artificial intelligence today. Reinforcement learning, which aims to maximize a reward function, is particularly vulnerable to the risks associated with poorly designed reward functions. Recent advances have shown that Large Language Models (LLMs) used for reward generation can outperform humans in this context. We introduce VIRAL, a pipeline for generating and refining reward functions through the use of multi-modal LLMs. VIRAL autonomously creates and interactively improves reward functions based on a given environment and a goal prompt or annotated image. The refinement process can incorporate human feedback or be guided by a description, generated by a video LLM, of the agent's policy as shown in a video. We evaluated VIRAL in five Gymnasium environments, demonstrating that it accelerates the learning…
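
The generate-and-refine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `refine_reward`, `stub_propose`, and `stub_evaluate` are hypothetical names, the proposer is a hard-coded stand-in for an LLM call, and the evaluator is a stand-in for training a policy and collecting feedback (human or video-LLM). The loop simply keeps the best-scoring candidate reward function.

```python
from typing import Callable, List, Optional, Tuple

# A reward function maps (observation, action) to a scalar reward.
RewardFn = Callable[[List[float], int], float]


def refine_reward(
    propose: Callable[[str, Optional[str]], RewardFn],
    evaluate: Callable[[RewardFn], Tuple[float, str]],
    goal: str,
    iterations: int = 3,
) -> Tuple[Optional[RewardFn], float]:
    """Run a few propose/evaluate rounds and keep the best candidate."""
    best_fn: Optional[RewardFn] = None
    best_score = float("-inf")
    feedback: Optional[str] = None
    for _ in range(iterations):
        candidate = propose(goal, feedback)      # assumed: an LLM call in practice
        score, feedback = evaluate(candidate)    # assumed: training plus feedback
        if score > best_score:
            best_fn, best_score = candidate, score
    return best_fn, best_score


def stub_propose(goal: str, feedback: Optional[str]) -> RewardFn:
    # Hypothetical stand-in for the LLM: a constant reward on the first
    # round, then an angle penalty once any feedback has been received.
    if feedback is None:
        return lambda obs, action: 1.0
    return lambda obs, action: 1.0 - abs(obs[0])


def stub_evaluate(reward_fn: RewardFn) -> Tuple[float, str]:
    # Hypothetical stand-in for training: score how strongly the reward
    # prefers an upright pole (angle 0.0) over a tilted one (angle 0.5).
    score = reward_fn([0.0], 0) - reward_fn([0.5], 0)
    feedback = "ok" if score > 0 else "penalize the pole angle"
    return score, feedback


best_fn, best_score = refine_reward(
    stub_propose, stub_evaluate, goal="keep the pole upright"
)
```

In a real setting the evaluation step would be the expensive part, since each candidate reward requires training an agent before any feedback is available, which is why the sketch limits the number of rounds.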

Code & Models

Repositories

viral-ucbl1/viral (official)

Taxonomy

Topics: Online and Blended Learning · Educational Research and Analysis