Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
Alexander Meulemans, Simon Schug, Seijin Kobayashi, Nathaniel Daw, Gregory Wayne

TL;DR
This paper introduces Counterfactual Contribution Analysis (COCOA), a model-based credit assignment method that improves sample efficiency in reinforcement learning by accurately measuring actions' influence on future rewards through counterfactual reasoning.
Contribution
The paper proposes COCOA, a novel family of algorithms that measure action contributions relative to rewards or learned representations, reducing bias and variance compared to previous methods like HCA.
Findings
COCOA achieves lower bias and variance in policy gradient estimates.
Experimental results show improved performance on long-term credit assignment tasks.
Modeling contributions to rewarding outcomes enhances sample efficiency.
Abstract
To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: 'Would the agent still have reached this reward if it had taken another action?'. We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects,…
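The counterfactual query in the abstract can be illustrated with a small sketch. It assumes a ratio-style contribution coefficient, w(a, u') = p(u' | s, a) / p(u' | s) - 1, restricted to a single state and a single future rewarding outcome u'. The function name, the dictionary layout, and the toy probabilities are hypothetical and are not taken from the authors' code.

```python
def contribution_coefficients(policy, outcome_given_action):
    """Toy contribution coefficients for one state and one rewarding outcome.

    policy: dict mapping action -> pi(action | state)
    outcome_given_action: dict mapping action -> P(outcome | state, action)
    Both inputs are illustrative assumptions, not quantities from the paper.
    """
    # Marginal probability of reaching the outcome under the current policy.
    p_outcome = sum(policy[a] * outcome_given_action[a] for a in policy)
    if p_outcome == 0.0:
        # The outcome is unreachable, so no action can be credited for it.
        return {a: 0.0 for a in policy}
    # Ratio form: how much more (or less) likely the outcome becomes
    # when the agent commits to action a instead of sampling from the policy.
    return {a: outcome_given_action[a] / p_outcome - 1.0 for a in policy}


policy = {"left": 0.5, "right": 0.5}

# The action strongly changes the chance of reaching the reward.
decisive = contribution_coefficients(policy, {"left": 0.9, "right": 0.1})

# The reward is reached equally often whatever the agent does.
irrelevant = contribution_coefficients(policy, {"left": 0.7, "right": 0.7})
```

In the second case every coefficient is zero: the agent would have reached the reward anyway, so no action receives credit. In both cases the coefficients average to zero under the policy.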
Taxonomy
Topics: Reinforcement Learning in Robotics
