Towards Practical Credit Assignment for Deep Reinforcement Learning
Vyacheslav Alipov, Riley Simmons-Edler, Nikita Putintsev, Pavel Kalinin, Dmitry Vetrov

TL;DR
This paper advances credit assignment in deep reinforcement learning by adapting Hindsight Credit Assignment (HCA), improving performance on complex tasks through theoretical modifications and empirical evaluation on the ALE benchmark.
Contribution
It introduces theoretically justified modifications to HCA for deep RL, enabling practical application and improved performance on complex environments.
Findings
Enhanced credit assignment improves game scores
HCA-based methods outperform A2C in certain ALE games
Proposed modifications enable better estimation of hindsight probabilities
Abstract
Credit assignment is a fundamental problem in reinforcement learning, the problem of measuring an action's influence on future rewards. Explicit credit assignment methods have the potential to boost the performance of RL algorithms on many tasks, but thus far remain impractical for general use. Recently, a family of methods called Hindsight Credit Assignment (HCA) was proposed, which explicitly assign credit to actions in hindsight based on the probability of the action having led to an observed outcome. This approach has appealing properties, but remains a largely theoretical idea applicable to a limited set of tabular RL tasks. Moreover, it is unclear how to extend HCA to deep RL environments. In this work, we explore the use of HCA-style credit in a deep RL context. We first describe the limitations of existing HCA algorithms in deep RL that lead to their poor performance or complete…
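To make the idea of assigning credit "based on the probability of the action having led to an observed outcome" concrete, here is a minimal sketch on a toy one-step problem. It assumes the return-conditional form from the original HCA formulation, where the hindsight probability h(a | z) is obtained by Bayes' rule and the ratio h(a | z) / π(a) reweights observed returns. It is not the modified deep RL method this paper proposes. The function names, the policy, and the outcome probabilities are all hypothetical and chosen only for illustration.

```python
# Toy, exactly enumerated illustration of hindsight credit (assumed setup, not the paper's code).

def hindsight_probs(pi, p_z_given_a):
    """Bayes' rule: h(a | z) = pi(a) * P(z | a) / sum_b pi(b) * P(z | b)."""
    n_actions = len(pi)
    n_outcomes = len(p_z_given_a[0])
    h = [[0.0] * n_outcomes for _ in range(n_actions)]
    for z in range(n_outcomes):
        p_z = sum(pi[b] * p_z_given_a[b][z] for b in range(n_actions))
        for a in range(n_actions):
            h[a][z] = pi[a] * p_z_given_a[a][z] / p_z
    return h

def hca_q_values(pi, p_z_given_a, z_values):
    """Q(a) = E_{z ~ on-policy}[ h(a | z) / pi(a) * z ], computed by enumeration."""
    n_actions = len(pi)
    n_outcomes = len(z_values)
    h = hindsight_probs(pi, p_z_given_a)
    q = []
    for a in range(n_actions):
        total = 0.0
        for z in range(n_outcomes):
            # Probability of observing outcome z when acting with the policy.
            p_z = sum(pi[b] * p_z_given_a[b][z] for b in range(n_actions))
            total += p_z * (h[a][z] / pi[a]) * z_values[z]
        q.append(total)
    return q

# Hypothetical problem: two actions, two possible returns (0 and 1).
pi = [0.7, 0.3]                       # policy probabilities (assumed)
p_z_given_a = [[0.9, 0.1],            # P(z | a=0)
               [0.2, 0.8]]            # P(z | a=1)
z_values = [0.0, 1.0]

q_hca = hca_q_values(pi, p_z_given_a, z_values)
# Direct computation for comparison: Q(a) = sum_z P(z | a) * z.
q_direct = [sum(p * z for p, z in zip(row, z_values)) for row in p_z_given_a]
value = sum(p * q for p, q in zip(pi, q_hca))
advantage = [q - value for q in q_hca]
print(q_hca, q_direct, advantage)
```

In this toy case the hindsight-weighted estimate matches the directly computed action values, so the second action receives positive credit and the first negative credit. In practice h(a | z) is not known and has to be learned, which is where the estimation difficulties the abstract mentions would arise.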
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
Methods: A2C
