Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Viet Bac Nguyen; Phuong Thai Nguyen

arXiv:2602.24081 · cs.LG · March 2, 2026

TL;DR

This paper introduces ACWI, a novel adaptive intrinsic reward scaling framework for reinforcement learning that dynamically balances exploration and exploitation, leading to improved sample efficiency and stability in sparse-reward environments.
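To make "intrinsic reward scaling" concrete, the sketch below contrasts the conventional fixed coefficient with a state-dependent one. It assumes the common additive form r_t = r_ext_t + beta(s_t) * r_int_t, which the summary implies but does not spell out. The names `mix_rewards`, `fixed_beta`, and `toy_state_beta` are hypothetical, and the toy rule in `toy_state_beta` is only a placeholder for a learned predictor. It is not ACWI's model.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def mix_rewards(
    extrinsic: Sequence[float],
    intrinsic: Sequence[float],
    states: Sequence[State],
    beta_fn: Callable[[State], float],
) -> List[float]:
    """Per-step training reward r_t = r_ext_t + beta(s_t) * r_int_t (assumed additive form)."""
    return [r_e + beta_fn(s) * r_i for r_e, r_i, s in zip(extrinsic, intrinsic, states)]


def fixed_beta(state: State) -> float:
    # Conventional baseline: one hand-tuned scalar for every state (value is illustrative).
    return 0.01


def toy_state_beta(state: State) -> float:
    # Hypothetical state-dependent rule standing in for a learned weight predictor:
    # larger exploration bonus in one region of state space, smaller elsewhere.
    return 0.5 if state[0] < 0.5 else 0.05


if __name__ == "__main__":
    ext = [0.0, 1.0]
    intr = [0.2, 0.4]
    states = [[0.0], [1.0]]
    print(mix_rewards(ext, intr, states, fixed_beta))
    print(mix_rewards(ext, intr, states, toy_state_beta))
```

With a fixed coefficient, the same exploration bonus applies everywhere. A state-dependent weight can shrink the bonus where it no longer helps, which is the motivation for learning it.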

Contribution

ACWI is the first method to learn a state-dependent intrinsic reward weight online using a Beta Network, enhancing exploration without manual tuning.

Findings

1. ACWI outperforms fixed intrinsic reward baselines in MiniGrid environments.
2. ACWI improves sample efficiency and training stability.
3. ACWI maintains computational efficiency with minimal overhead.

Abstract

We propose ACWI (Adaptive Correlation-Weighted Intrinsic), an adaptive intrinsic reward scaling framework designed to dynamically balance intrinsic and extrinsic rewards for improved exploration in sparse-reward reinforcement learning. Unlike conventional approaches that rely on manually tuned scalar coefficients, which often result in unstable or suboptimal performance across tasks, ACWI learns a state-dependent scaling coefficient online. Specifically, ACWI introduces a lightweight Beta Network that predicts the intrinsic reward weight directly from the agent state through an encoder-based architecture. The scaling mechanism is optimized using a correlation-based objective that encourages alignment between the weighted intrinsic rewards and discounted future extrinsic returns. This formulation enables task-adaptive exploration incentives while preserving computational efficiency and…
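A correlation-based objective of the kind described can be illustrated with a minimal sketch. It rests on three assumptions that the abstract does not confirm. First, "correlation" is taken to be the Pearson correlation over one episode. Second, the discounted future extrinsic return G_t includes the current reward r_t. Third, the per-step weights beta(s_t) are already available. The function names and the `gamma` default are hypothetical. The loss is the negative correlation, so minimizing it pushes the weighted intrinsic rewards to track the returns.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def pearson(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    """Pearson correlation; eps guards against zero variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (math.sqrt(vx * vy) + eps)


def correlation_loss(
    betas: Sequence[float],
    intrinsic: Sequence[float],
    extrinsic: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Negative correlation between beta(s_t) * r_int_t and the discounted extrinsic return G_t."""
    weighted = [b * r for b, r in zip(betas, intrinsic)]
    returns = discounted_returns(extrinsic, gamma)
    return -pearson(weighted, returns)


if __name__ == "__main__":
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))
    print(correlation_loss([1.0, 1.0, 1.0], [0.25, 0.5, 1.0], [0.0, 0.0, 1.0], gamma=0.5))
```

In practice, the same computation would be written in an autodiff framework so that the gradient of this loss reaches the weight predictor's parameters. Batching across episodes and handling episode boundaries are also omitted here.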

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research