Black Box Meta-Learning Intrinsic Rewards

Octavio Pappalardo; Rodrigo Ramele; Juan Miguel Santos

arXiv:2407.21546 · cs.LG · March 5, 2026

TL;DR

This paper proposes a novel meta-learning approach to automatically learn intrinsic rewards for reinforcement learning agents, improving data efficiency and exploration in sparse-reward environments without relying on meta-gradient computations.

Contribution

It introduces a black-box meta-learning method for intrinsic reward optimization that bypasses traditional meta-gradient calculations, enhancing RL training in complex tasks.

Findings

01

Agents trained with meta-learned intrinsic rewards outperform agents trained only with extrinsic rewards in sparse-reward environments.

02

The approach is effective across continuous control tasks with parametric and non-parametric variations.

03

The learned intrinsic rewards remain effective when only sparse rewards are available during evaluation, which indicates their robustness.

Abstract

The broader application of reinforcement learning (RL) is limited by challenges including data efficiency, generalization capability, and ability to learn in sparse-reward environments. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. We introduce a method to learn intrinsic rewards within a reinforcement learning framework that bypasses the typical computation of meta-gradients through an optimization process by treating policy updates as black boxes. We validate our approach against training with extrinsic rewards, demonstrating its…
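The idea of treating policy updates as black boxes can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the five-armed bandit, the per-arm bonus vector `eta`, and the function names are all hypothetical. The outer loop here is plain random-search hill climbing, swapped in for brevity; the paper instead learns the intrinsic rewards within a reinforcement learning framework. The point the sketch shares with the abstract is that the outer loop only calls the inner policy update and observes the resulting extrinsic return, and never differentiates through it.

```python
import math
import random

N_ARMS = 5
GOAL = 3  # hypothetical: the only arm that yields extrinsic reward


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def extrinsic_return(probs):
    # Expected sparse extrinsic reward: 1 for the goal arm, 0 otherwise.
    return probs[GOAL]


def inner_update(logits, rewards, lr=1.0):
    # One exact policy-gradient step on the expected reward of a softmax policy.
    # The outer loop treats this function as a black box.
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - baseline)
            for l, p, r in zip(logits, probs, rewards)]


def train_agent(eta, steps=20):
    # The agent is trained on the intrinsic reward eta only.
    logits = [0.0] * N_ARMS
    for _ in range(steps):
        logits = inner_update(logits, eta)
    return softmax(logits)


def meta_train(iterations=200, sigma=0.5, seed=0):
    # Outer loop: perturb eta, retrain the agent, and keep the candidate
    # if the extrinsic return improves. No meta-gradients are computed.
    rng = random.Random(seed)
    eta = [0.0] * N_ARMS
    best = extrinsic_return(train_agent(eta))
    for _ in range(iterations):
        candidate = [e + rng.gauss(0.0, sigma) for e in eta]
        score = extrinsic_return(train_agent(candidate))
        if score > best:
            eta, best = candidate, score
    return eta, best
```

Calling `meta_train()` returns the learned bonus vector and the extrinsic return of an agent trained on it. With an all-zero `eta` the agent receives no training signal and stays at the uniform policy, which is the baseline the learned bonuses are compared against.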

Code & Models

Repositories

Octavio-Pappalardo/Meta-learning-rewards
PyTorch · Official

Taxonomy

Topics: Machine Learning in Healthcare

Methods: Focus