Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Alexander Meulemans; Simon Schug; Seijin Kobayashi; Nathaniel Daw; Gregory Wayne

arXiv:2306.16803·cs.LG·November 1, 2023

Open Access

TL;DR

This paper introduces Counterfactual Contribution Analysis (COCOA), a model-based credit assignment method that improves sample efficiency in reinforcement learning by accurately measuring actions' influence on future rewards through counterfactual reasoning.

Contribution

The paper proposes COCOA, a novel family of algorithms that measure action contributions relative to rewards or learned representations, reducing bias and variance compared to previous methods like HCA.

Findings

1. COCOA achieves lower bias and variance in policy gradient estimates.

2. Experimental results show improved performance on long-term credit assignment tasks.

3. Modeling contributions to rewarding outcomes enhances sample efficiency.

Abstract

To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: 'Would the agent still have reached this reward if it had taken another action?'. We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects,…
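
As a rough illustration of the counterfactual query in the abstract, the sketch below compares how likely a reward is after a given action with how likely it is under the policy on average. The function name, the tabular setup, and the ratio-minus-one form of the coefficient are assumptions made for this example; the page itself does not state the exact estimator, so this should be read as one plausible reading rather than the paper's implementation.

```python
import numpy as np

def contribution_coefficients(policy, p_reward_given_action):
    """Hypothetical helper: score each action's contribution to a reward.

    policy[a]                 -- probability of taking action a in the current state
    p_reward_given_action[a]  -- assumed probability of later reaching the reward
                                 if action a is taken now
    Returns p(reward | a) / p(reward) - 1 for every action (an assumed form).
    """
    policy = np.asarray(policy, dtype=float)
    p = np.asarray(p_reward_given_action, dtype=float)
    # Probability of the reward when the action is drawn from the policy.
    p_marginal = float(policy @ p)
    if p_marginal == 0.0:
        # The reward is unreachable, so no action contributes to it.
        return np.zeros_like(p)
    return p / p_marginal - 1.0

# Two actions, chosen with equal probability; the first makes the reward likelier.
policy = np.array([0.5, 0.5])
p_reward = np.array([0.8, 0.2])
w = contribution_coefficients(policy, p_reward)
print(w)  # positive for the first action, negative for the second
```

Under this reading, an action that does not change the chance of reaching the reward gets a coefficient of zero, and the coefficients average to zero under the policy, which is the sense in which the query asks whether another action would have led to the same reward.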

Taxonomy

Topics: Reinforcement Learning in Robotics