Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach
Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy

TL;DR
This paper introduces a causal, interpretable approach to reward redistribution in reinforcement learning, enabling better credit assignment for delayed rewards and improving policy optimization.
Contribution
It proposes the Generative Return Decomposition framework that models causal relations to identify unobservable rewards and enhance interpretability and performance in delayed reward scenarios.
Findings
Outperforms state-of-the-art reward redistribution methods
Provides theoretical guarantees for causal structure identifiability
Demonstrates interpretability through visualization
Abstract
A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credits for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution and preserving policy invariance. In this paper, we start by studying the role of causal generative models in reward redistribution by characterizing the generation of Markovian rewards and trajectory-wise long-term return and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the…
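To make the idea of return decomposition concrete, here is a minimal sketch of the setting the abstract describes: only a trajectory-wise return is observed, and per-step Markovian rewards must be inferred so that they sum back to that return. This is not the GRD method itself. It assumes a linear reward over hypothetical state-action features and fits it with ordinary least squares, with no causal structure learning. The names `fit_reward_weights`, `redistribute`, `features`, and `true_w` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fit_reward_weights(features, returns):
    """Fit linear reward weights from trajectory-level returns only.

    features: array of shape (n_traj, T, d), per-step state-action features
    returns:  array of shape (n_traj,), the observed episodic returns
    """
    # If r_t = w . phi_t, then the return is w . (sum_t phi_t),
    # so regressing returns on summed features recovers w.
    summed = features.sum(axis=1)  # (n_traj, d)
    w, *_ = np.linalg.lstsq(summed, returns, rcond=None)
    return w


def redistribute(features, w):
    """Assign a proxy reward to every time step of every trajectory."""
    return features @ w  # (n_traj, T)


# Synthetic data (assumption): two of the four feature dimensions
# have no effect on the reward at all.
rng = np.random.default_rng(0)
n_traj, T, d = 200, 10, 4
features = rng.normal(size=(n_traj, T, d))
true_w = np.array([1.5, 0.0, -2.0, 0.0])
returns = (features @ true_w).sum(axis=1)  # only this is "observed"

w_hat = fit_reward_weights(features, returns)
proxy_rewards = redistribute(features, w_hat)
```

In this toy setup the fitted weights show which feature dimensions contribute to the reward, and the per-step proxy rewards sum to the observed return of each trajectory. A policy could then be trained on `proxy_rewards` instead of the single delayed signal.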
Taxonomy
Topics: Mental Health Research Topics · Functional Brain Connectivity Studies · Neural and Behavioral Psychology Studies
