Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators

Mark Towers; Yali Du; Christopher Freeman; Timothy J. Norman

arXiv:2408.08230 · cs.AI · August 16, 2024

TL;DR

This paper introduces Temporal Reward Decomposition (TRD), a method that predicts an agent's upcoming individual rewards, providing clearer explanations of future expectations and decision-making in reinforcement learning agents.

Contribution

The paper proposes TRD, a novel approach to decompose future reward estimations into individual expected rewards, enhancing interpretability and enabling new insights into agent behavior.
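The decomposition idea can be illustrated on a single sampled reward sequence. The sketch below is not the authors' implementation: the function names, the choice to discount each per-step term, and the extra tail term that collects everything beyond the first N steps are assumptions made for illustration. It shows only that a scalar discounted return can be split into per-step components that sum back to the same number; a TRD estimator would predict the expectations of such components rather than compute them from a known trajectory.

```python
def discounted_return(rewards, gamma):
    """Scalar quantity a conventional estimator targets: sum_i gamma^i * r_i."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def temporal_decomposition(rewards, gamma, n):
    """Split the same return into n per-step terms plus one tail term.

    Hypothetical helper: the tail term (rewards from step n onward) is an
    assumption of this sketch so that the parts sum to the scalar return.
    """
    head = [(gamma ** i) * r for i, r in enumerate(rewards[:n])]
    head += [0.0] * (n - len(head))  # pad if the episode ends before n steps
    tail = sum((gamma ** i) * r for i, r in enumerate(rewards) if i >= n)
    return head + [tail]


# Made-up reward sequence: rewards arrive at steps 2 and 5.
rewards = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
gamma = 0.9
parts = temporal_decomposition(rewards, gamma, n=3)
total = discounted_return(rewards, gamma)
```

Here `parts` has four entries (three per-step terms and one tail), and the nonzero third entry shows when the first reward arrives, which the scalar `total` alone cannot reveal.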

Findings

01 TRD accurately predicts individual future rewards.

02 TRD enables estimation of reward timing, value, and confidence.

03 TRD can be integrated into DQN agents with minimal performance loss.

Abstract

Future reward estimation is a core component of reinforcement learning agents; i.e., Q-value and state-value functions, predicting an agent's sum of future rewards. Their scalar output, however, obfuscates when or what individual future rewards an agent may expect to receive. We address this by modifying an agent's future reward estimator to predict their next N expected rewards, referred to as Temporal Reward Decomposition (TRD). This unlocks novel explanations of agent behaviour. Through TRD we can: estimate when an agent may expect to receive a reward, the value of the reward and the agent's confidence in receiving it; measure an input feature's temporal importance to the agent's action decisions; and predict the influence of different actions on future rewards. Furthermore, we show that DQN agents trained on Atari environments can be efficiently retrained to incorporate TRD with…
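One of the explanations listed above, predicting the influence of different actions on future rewards, can be sketched as an element-wise comparison of two decomposed estimates. Everything in this example is hypothetical: the two vectors are made-up numbers standing in for an estimator's per-step outputs for two actions, and the helper names are not taken from the paper or its repository.

```python
def action_contrast(trd_a, trd_b):
    """Per-step difference between two decomposed reward estimates."""
    return [a - b for a, b in zip(trd_a, trd_b)]


def first_divergence(diff, tol=1e-6):
    """Index of the first future step where the two estimates differ, or None."""
    for step, d in enumerate(diff):
        if abs(d) > tol:
            return step
    return None


# Made-up decomposed estimates for two actions (three steps plus a tail term).
trd_left = [0.0, 0.0, 0.81, 0.30]
trd_right = [0.0, 0.9, 0.0, 0.30]

diff = action_contrast(trd_left, trd_right)
step = first_divergence(diff)
```

With these numbers the two actions differ by only about 0.09 in their summed value, yet the per-step view shows that one action is expected to pay off a step earlier than the other, a distinction a scalar Q-value comparison would hide.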

Code & Models

Repositories

pseudo-rnd-thoughts/temporal-reward-decomposition (JAX, official)

Taxonomy

Topics: Decision-Making and Behavioral Economics

Methods: Q-Learning · Convolution · Dense Connections · Deep Q-Network