Intrinsic Reward Policy Optimization for Sparse-Reward Environments

Minjae Cho; Huy Trong Tran

arXiv:2601.21391 · cs.LG · January 30, 2026

TL;DR

This paper introduces IRPO, a new reinforcement learning algorithm that effectively uses multiple intrinsic rewards to improve exploration and policy optimization in environments with sparse rewards, enhancing performance and sample efficiency.

Contribution

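The paper proposes IRPO, a policy optimization framework that uses multiple intrinsic rewards to directly optimize a policy for the extrinsic reward, without pretraining subpolicies. It targets two weaknesses of existing approaches: the unstable credit assignment that arises when intrinsic and extrinsic rewards are combined, and the sample inefficiency and sub-optimality of hierarchical learning with subpolicies.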

Findings

01. IRPO outperforms baselines in discrete and continuous environments.

02. IRPO improves sample efficiency in sparse-reward tasks.

03. The paper provides a formal analysis of IRPO's optimization problem.

Abstract

Exploration is essential in reinforcement learning because an agent relies on trial and error to learn an optimal policy. However, when rewards are sparse, naive exploration strategies, such as noise injection, are often insufficient. Intrinsic rewards can provide principled guidance for exploration, for example when they are combined with extrinsic rewards to optimize a policy or used to train subpolicies for hierarchical learning. However, the former approach suffers from unstable credit assignment, while the latter exhibits sample inefficiency and sub-optimality. We propose a policy optimization framework that leverages multiple intrinsic rewards to directly optimize a policy for an extrinsic reward without pretraining subpolicies. Our algorithm, intrinsic reward policy optimization (IRPO), achieves this by using a surrogate policy gradient that provides a more informative…
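For readers unfamiliar with the first baseline approach the abstract mentions (combining intrinsic and extrinsic rewards), the following is a minimal sketch of what such reward mixing might look like. It is not IRPO and is not taken from the paper: the count-based bonus, the names `CountBonus` and `mixed_reward`, and the weight `beta` are all illustrative assumptions.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based intrinsic reward: 1 / sqrt(N(s)).

    Illustrative only; the paper's intrinsic rewards may differ.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, state):
        # Count the visit first, then reward rarely seen states more.
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])


def mixed_reward(extrinsic, intrinsic, beta=0.1):
    """Weighted sum of extrinsic and intrinsic reward (beta is assumed)."""
    return extrinsic + beta * intrinsic


# Toy sparse-reward trajectory: only the final state pays an extrinsic reward.
bonus = CountBonus()
trajectory = [("s0", 0.0), ("s1", 0.0), ("s0", 0.0), ("goal", 1.0)]
rewards = [mixed_reward(r, bonus(s)) for s, r in trajectory]
```

In this toy setup the bonus for `s0` shrinks on the second visit, so the training signal the policy sees changes over time even though the extrinsic reward does not. That shifting mixture is one informal way to picture the credit assignment difficulty the abstract attributes to this approach.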

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)