Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, Suleyman Serdar Kozat

TL;DR
This paper introduces a parameter-free method to reduce estimation bias in deep reinforcement learning, specifically targeting deterministic policy gradients, leading to improved performance on continuous control tasks.
Contribution
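A novel, parameter-free deep Q-learning variant that reduces underestimation bias in deterministic policy gradients. It samples the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, so the Q-value update is not affected by the variance of the rewards.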
Findings
Significantly outperforms existing methods on MuJoCo and Box2D tasks.
Reduces underestimation bias effectively in high-variance environments.
Enhances state-of-the-art performance in deterministic policy gradient methods.
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance…
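The update rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the interval or the exact form of the combination, so everything below is an assumption: the function name `weighted_critic_target`, the convex combination of the minimum and maximum of the two critic estimates, the uniform sampling, and the placeholder bounds `beta_low` and `beta_high`. The paper's method is parameter-free and derives its own interval; the bounds are exposed here only to make the idea concrete.

```python
import numpy as np


def weighted_critic_target(reward, done, q1_next, q2_next, gamma=0.99,
                           beta_low=0.5, beta_high=1.0, rng=None):
    """Hypothetical TD target from a randomly weighted pair of critics.

    ASSUMPTION: the interval [beta_low, beta_high] and the uniform
    sampling are placeholders, not the interval derived in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    q1_next = np.asarray(q1_next, dtype=float)
    q2_next = np.asarray(q2_next, dtype=float)

    # Pessimistic and optimistic next-state estimates from the two critics.
    q_min = np.minimum(q1_next, q2_next)
    q_max = np.maximum(q1_next, q2_next)

    # One sampled weight per transition; beta = 1 recovers the
    # Clipped Double Q-learning target (pure minimum).
    beta = rng.uniform(beta_low, beta_high, size=q_min.shape)
    q_mix = beta * q_min + (1.0 - beta) * q_max

    # Standard one-step bootstrapped target, masked at terminal states.
    return np.asarray(reward) + gamma * (1.0 - np.asarray(done)) * q_mix


# Usage with a toy batch of two transitions (second one is terminal).
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
targets = weighted_critic_target(rewards, dones,
                                 q1_next=[2.0, 5.0], q2_next=[4.0, 3.0],
                                 gamma=0.5, rng=np.random.default_rng(0))
```

Under these assumptions, each target lies between the Clipped Double Q-learning target (minimum of the two critics) and the target built from their maximum, which is the sense in which a weighted combination can trade off underestimation against overestimation.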
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Test · Clipped Double Q-learning · Q-Learning · Double Q-learning
