Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

Aravind Venugopal; Jiayu Chen; Xudong Wu; Chongyi Zheng; Benjamin Eysenbach; Jeff Schneider

arXiv:2604.20627 · cs.LG · April 23, 2026

TL;DR

This paper introduces Occupancy Reward Shaping (ORS), a method that uses learned world models and optimal transport to improve credit assignment in offline goal-conditioned reinforcement learning, particularly in sparse-reward settings.

Contribution

The paper formalizes how the temporal information stored in world models encodes the geometry of the environment, and uses that geometry to shape rewards without altering the optimal policy.

Findings

01 Empirically improves performance by 2.2x on 13 tasks.

02 Effectively mitigates credit assignment issues in sparse reward settings.

03 Successfully applied to real-world nuclear fusion control tasks.

Abstract

The temporal lag between actions and their long-term consequences makes credit assignment a challenge when learning goal-directed behaviors from data. Generative world models capture the distribution of future states an agent may visit, indicating that they have captured temporal information. How can that temporal information be extracted to perform credit assignment? In this paper, we formalize how the temporal information stored in world models encodes the underlying geometry of the world. Leveraging optimal transport, we extract this geometry from a learned model of the occupancy measure into a reward function that captures goal-reaching information. Our resulting method, Occupancy Reward Shaping, largely mitigates the problem of credit assignment in sparse reward settings. ORS provably does not alter the optimal policy, yet empirically improves performance by 2.2x across 13 diverse…
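The claim that a shaped reward can leave the optimal policy unchanged can be illustrated with classic potential-based shaping, r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s), under which the optimal Q-values shift by −Φ(s) and the greedy action in each state stays the same. The sketch below is not the authors' implementation and this page does not spell out the exact ORS construction: the chain environment, the helper `q_value_iteration`, and the hand-picked potential `phi` (negative distance to the goal) are all assumptions made for illustration, standing in for the geometry that ORS extracts from a learned occupancy model.

```python
import numpy as np

def q_value_iteration(next_state, reward, gamma, iters=500):
    """Tabular Q-value iteration for a deterministic MDP (hypothetical helper)."""
    q = np.zeros_like(reward, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)                   # greedy state values
        q = reward + gamma * v[next_state]  # Bellman optimality backup
    return q

# Toy 6-state chain (assumed for illustration): action 0 = left, 1 = right.
n_states, goal, gamma = 6, 5, 0.9
states = np.arange(n_states)
next_state = np.stack(
    [np.maximum(states - 1, 0), np.minimum(states + 1, goal)], axis=1
)
next_state[goal] = goal  # the goal state is absorbing

# Sparse reward: 1 only when the next state is the goal.
sparse_reward = (next_state == goal).astype(float)

# Hand-picked potential (an assumption, NOT the ORS potential):
# negative number of steps remaining to the goal.
phi = -(goal - states).astype(float)

# Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
shaped_reward = sparse_reward + gamma * phi[next_state] - phi[:, None]

q_sparse = q_value_iteration(next_state, sparse_reward, gamma)
q_shaped = q_value_iteration(next_state, shaped_reward, gamma)

# Greedy actions in non-goal states (both actions tie at the absorbing goal).
policy_sparse = q_sparse[:goal].argmax(axis=1)
policy_shaped = q_shaped[:goal].argmax(axis=1)
```

In this toy setting `q_shaped + phi[:, None]` matches `q_sparse` up to numerical tolerance, so `policy_sparse` and `policy_shaped` agree, while the shaped reward gives a nonzero signal on every step instead of only at the goal.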

Code & Models

Repositories

aravindvenu7/occupancy_reward_shaping (GitHub)

Videos

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning (SlidesLive)