Hindsight Experience Replay with Kronecker Product Approximate Curvature
Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar

TL;DR
This paper introduces a novel reinforcement learning algorithm combining Hindsight Experience Replay with Kronecker-Factored Approximate Curvature and TD3 to improve sample efficiency, convergence speed, and success rates in sparse reward environments.
Contribution
It proposes integrating natural gradient methods and Kronecker-factored curvature into HER with TD3, enhancing performance and training efficiency in reinforcement learning tasks.
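For orientation, the standard Kronecker-factored approximation for a single fully connected layer can be sketched as below. This is the textbook K-FAC form, not necessarily the exact variant used in the paper. The symbols are notation assumed here: $W$ (layer weight matrix), $a$ (layer inputs), $g$ (gradient of the loss with respect to the layer's pre-activations), $\mathcal{L}$ (loss), and $\eta$ (step size).

```latex
% Fisher block of one layer approximated by a Kronecker product of two small matrices
F \;\approx\; A \otimes G,
\qquad A = \mathbb{E}\!\left[a\,a^{\top}\right],
\qquad G = \mathbb{E}\!\left[g\,g^{\top}\right]

% The inverse of a Kronecker product factorizes, so the natural gradient is cheap to form
(A \otimes G)^{-1}\,\operatorname{vec}\!\left(\nabla_W \mathcal{L}\right)
  \;=\; \operatorname{vec}\!\left(G^{-1}\,\nabla_W \mathcal{L}\;A^{-1}\right)

% Resulting natural-gradient step for the layer
W \;\leftarrow\; W \;-\; \eta\, G^{-1}\,\nabla_W \mathcal{L}\;A^{-1}
```

Only the two factors $A$ and $G$ are inverted, each the size of one side of the layer, rather than the full Fisher matrix over all of the layer's weights. In practice a damping term is usually added to both factors before inversion.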
Findings
Improved sample efficiency over standard HER.
Faster convergence in MuJoCo environments.
Higher success rates in sparse reward tasks.
Abstract
Hindsight Experience Replay (HER) is an efficient algorithm for solving Reinforcement Learning tasks in sparse-reward environments. However, its limited sample efficiency and slow convergence keep HER from performing effectively. Natural gradients address these challenges by helping the model parameters converge better and by avoiding bad actions that collapse training performance. However, updating neural network parameters this way requires expensive computation and thus increases training time. Our proposed method addresses the above challenges, achieving better sample efficiency, faster convergence, and a higher success rate. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then breaks the policy because it exploits the errors in the Q-function. We solve this issue by including Twin Delayed…
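The overestimation issue described above is what the standard TD3 bootstrap target is designed to counter. The following is a minimal scalar-action sketch of that target, combining clipped double Q-learning with target policy smoothing. The function name `td3_target`, the callables `target_actor`, `target_q1`, `target_q2`, and all default hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import random


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Sketch of the TD3 bootstrap target for a one-dimensional action.

    All names and default values are hypothetical, chosen for illustration.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = target_actor(next_state) + noise
    next_action = max(-max_action, min(max_action, next_action))

    # Clipped double Q-learning: bootstrap from the smaller of the two critics,
    # which limits how far an overestimated Q-value can propagate.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))

    # No bootstrapping past a terminal transition (done == 1.0).
    return reward + gamma * (1.0 - done) * q_min


# Usage with toy stand-ins for the target networks (assumed for illustration).
toy_actor = lambda state: 0.5
optimistic_critic = lambda state, action: 10.0
pessimistic_critic = lambda state, action: 4.0

y = td3_target(1.0, 0.0, None, toy_actor, optimistic_critic,
               pessimistic_critic, gamma=0.9, noise_std=0.0)
```

With the toy critics above, the target follows the pessimistic critic rather than the optimistic one. The third TD3 ingredient, delayed policy updates, concerns how often the actor is updated and is not shown here.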
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Evacuation and Crowd Dynamics
Methods: Batch Normalization · Weight Decay · Adam · Convolution · Dense Connections · Clipped Double Q-learning · Deep Deterministic Policy Gradient · Target Policy Smoothing · Experience Replay
