LLM-Driven Intrinsic Motivation for Sparse Reward Reinforcement Learning

André Quadros; Cassio Silva; Ronnie Alves

arXiv:2508.18420·cs.LG·August 27, 2025

TL;DR

This paper combines Variational State as Intrinsic Reward (VSIMR) with Large Language Model (LLM)-based rewards to improve reinforcement learning in sparse-reward environments, and reports improved performance on the MiniGrid DoorKey benchmark.

Contribution

The paper presents a new integrated intrinsic motivation framework using VAEs and LLMs, significantly improving RL efficiency in sparse reward settings.

Findings

01. Combined approach outperforms individual strategies and standard A2C.

02. Significant increase in learning efficiency and success rate.

03. Effective complementarity between exploration and exploitation mechanisms.

Abstract

This paper explores the combination of two intrinsic motivation strategies to improve the efficiency of reinforcement learning (RL) agents in environments with extremely sparse rewards, where traditional learning struggles because positive feedback is infrequent. We propose integrating Variational State as Intrinsic Reward (VSIMR), which uses Variational AutoEncoders (VAEs) to reward state novelty, with an intrinsic reward approach derived from Large Language Models (LLMs). The LLMs leverage their pre-trained knowledge to generate reward signals based on environment and goal descriptions, guiding the agent. We implemented this combined approach with an Advantage Actor-Critic (A2C) agent in the MiniGrid DoorKey environment, a benchmark for sparse rewards. Our empirical results show that this combined strategy significantly increases agent performance and sampling efficiency compared to using each…
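
The abstract describes two intrinsic signals, a VAE-based novelty reward and an LLM-derived reward, but does not say how they are combined. The following is a minimal sketch of one plausible shaping scheme, not the authors' implementation. It assumes the novelty signal is the KL divergence between a diagonal-Gaussian VAE posterior and a standard normal prior, that the LLM reward arrives as a precomputed number, and that the two are mixed by a weighted sum. The function names and the weights `w_novelty` and `w_llm` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl_to_standard_normal(
    mu: Sequence[float], log_var: Sequence[float]
) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) for a diagonal Gaussian.

    Assumption: this KL term stands in for the VAE novelty signal.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def combined_reward(
    extrinsic: float,
    novelty: float,
    llm_score: float,
    w_novelty: float = 0.1,  # hypothetical weight, not taken from the paper
    w_llm: float = 0.1,      # hypothetical weight, not taken from the paper
) -> float:
    """Weighted sum of the sparse environment reward and two intrinsic terms.

    Assumption: additive mixing; llm_score is a scalar produced elsewhere,
    for example by prompting an LLM with environment and goal descriptions.
    """
    return extrinsic + w_novelty * novelty + w_llm * llm_score


# Usage: a step with no environment reward still yields a learning signal.
novelty = gaussian_kl_to_standard_normal(mu=[1.0, 0.0], log_var=[0.0, 0.0])
shaped = combined_reward(extrinsic=0.0, novelty=novelty, llm_score=1.0)
```

In a sketch like this, the shaped value would replace the raw environment reward in the A2C update, so the agent receives feedback even on steps where the environment itself returns zero.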
