Variance Reduced Advantage Estimation with $\delta$ Hindsight Credit Assignment
Kenny Young

TL;DR
This paper introduces a variance-reduced Hindsight Credit Assignment (HCA) algorithm for reinforcement learning and proves that it has lower variance than the conventional Monte-Carlo estimator when the necessary functions can be estimated exactly.
Contribution
The paper proposes a new HCA algorithm with provably lower variance than the conventional Monte-Carlo estimator when the necessary functions can be estimated exactly, giving a theoretical basis for how HCA could be broadly useful.
Findings
The proposed method has lower variance than the conventional Monte-Carlo estimator.
The variance reduction is proved theoretically for the case where the necessary functions can be estimated exactly.
Empirical demonstrations of improved credit assignment efficiency.
Abstract
Hindsight Credit Assignment (HCA) refers to a recently proposed family of methods for producing more efficient credit assignment in reinforcement learning. These methods work by explicitly estimating the probability that certain actions were taken in the past given present information. Prior work has studied the properties of such methods and demonstrated their behaviour empirically. We extend this work by introducing a particular HCA algorithm which has provably lower variance than the conventional Monte-Carlo estimator when the necessary functions can be estimated exactly. This result provides a strong theoretical basis for how HCA could be broadly useful.
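As a rough illustration of what "estimating the probability that certain actions were taken in the past given present information" can mean, the sketch below computes a hindsight action distribution by Bayes' rule in a one-step problem and checks that reweighting returns by it recovers the action values. This is not the algorithm introduced in the paper; it only illustrates the general hindsight-probability idea. The problem, the numbers, and all names (`pi`, `returns`, `p_z_given_a`, `hindsight_probs`) are hypothetical and chosen for the example.

```python
import numpy as np

# Hypothetical one-step problem: 2 actions, 3 possible return values.
pi = np.array([0.7, 0.3])                  # policy pi(a)
returns = np.array([0.0, 1.0, 2.0])        # possible returns z
# p_z_given_a[a, k] = P(Z = returns[k] | A = a)
p_z_given_a = np.array([[0.5, 0.4, 0.1],
                        [0.1, 0.3, 0.6]])


def hindsight_probs(pi, p_z_given_a):
    """Return h[a, k] = P(A = a | Z = returns[k]) and the marginal P(Z)."""
    joint = pi[:, None] * p_z_given_a      # P(A = a, Z = z)
    p_z = joint.sum(axis=0)                # P(Z = z) under the policy
    return joint / p_z, p_z


h, p_z = hindsight_probs(pi, p_z_given_a)

# Direct action values: Q(a) = sum_z P(z | a) * z
q_direct = p_z_given_a @ returns

# Hindsight form: Q(a) = E_Z[ h(a | Z) / pi(a) * Z ],
# with the expectation taken over the on-policy return distribution P(Z).
q_hindsight = ((h / pi[:, None]) * returns * p_z).sum(axis=1)

print(q_direct, q_hindsight)
```

The two quantities agree because h(a | z) / pi(a) equals P(z | a) / P(z) by Bayes' rule. In practice the hindsight distribution would have to be learned, which is why the exact-estimation condition in the abstract matters.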
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Reinforcement Learning in Robotics · Risk and Portfolio Optimization
