AWD3: Dynamic Reduction of the Estimation Bias
Dogan C. Cicek, Enes Duran, Baturay Saglam, Kagan Kaya, Furkan B. Mutlu, Suleyman S. Kozat

TL;DR
AWD3 is a novel method that adaptively reduces estimation bias in off-policy continuous control RL algorithms. It improves performance and robustness by learning the weighting hyper-parameter β of WD3 during training rather than fixing it by hand.
Contribution
The paper introduces AWD3, a technique that adaptively learns the weighting hyper-parameter to eliminate estimation bias in off-policy continuous control algorithms.
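The weighting hyper-parameter controls how the twin critics' next-state estimates are combined into the TD target. Below is a minimal sketch for a single transition. It assumes the WD3 form, which is a convex combination of the pessimistic minimum and the average of the two critics. The function name `weighted_td_target` and the default γ are illustrative and do not come from the paper.

```python
def weighted_td_target(reward, done, q1_next, q2_next, beta, gamma=0.99):
    """Assumed WD3-style TD target for one transition (illustrative sketch).

    beta weights the pessimistic min of the twin critics against their mean:
    beta = 1 gives the TD3 clipped double-Q target, beta = 0 the plain average.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    pessimistic = min(q1_next, q2_next)
    average = 0.5 * (q1_next + q2_next)
    mixed = beta * pessimistic + (1.0 - beta) * average
    # Terminal transitions (done = True) do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else 1.0) * mixed
```

For example, `weighted_td_target(1.0, False, 10.0, 12.0, beta=0.5, gamma=0.9)` mixes min = 10 and mean = 11 into 10.5, giving a target of about 10.45. Because the minimum never exceeds the mean, raising β can only lower the target. This is what makes β a dial between overestimation and underestimation.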
Findings
AWD3 matches or outperforms state-of-the-art off-policy algorithms in OpenAI Gym continuous control environments.
The adaptive weighting mechanism effectively reduces estimation bias.
AWD3 improves robustness and performance in continuous control tasks.
Abstract
Value-based deep Reinforcement Learning (RL) algorithms suffer from estimation bias, primarily caused by function approximation and temporal difference (TD) learning. This bias induces faulty state-action value estimates and therefore harms the performance and robustness of learning algorithms. Although several techniques have been proposed to tackle it, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter β in the Weighted Twin Delayed Deep Deterministic Policy Gradient (WD3) algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI Gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning…
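The abstract says β is learned adaptively using experience replay, but it does not give the update rule. The sketch below is a hypothetical illustration and not the authors' update.

It compares the critic's estimates along a finished episode with the discounted Monte Carlo returns actually observed. When the critic overestimates, it nudges β toward the pessimistic minimum; when the critic underestimates, it nudges β away from it.

The sketch assumes the WD3 target form β·min(Q1, Q2) + (1 - β)·mean(Q1, Q2), so a larger β means a lower target. The names `update_beta` and `discounted_returns`, the `step_size` parameter, and the sign-based step are all illustrative choices.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for each step of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def update_beta(beta, q_estimates, rewards, gamma=0.99, step_size=0.01):
    """Hypothetical adaptive beta rule (illustrative, not the AWD3 paper's update).

    q_estimates[t] is assumed to be the critic's value for the state-action pair
    visited at step t of a finished episode, e.g. one stored in the replay buffer.
    Returns the new beta and the measured average bias.
    """
    if len(q_estimates) != len(rewards):
        raise ValueError("need one Q estimate per reward")
    if not rewards:
        return beta, 0.0
    returns = discounted_returns(rewards, gamma)
    # Positive bias = overestimation, negative bias = underestimation.
    bias = sum(q - g for q, g in zip(q_estimates, returns)) / len(returns)
    if bias > 0:
        beta += step_size  # lean on the pessimistic min(Q1, Q2)
    elif bias < 0:
        beta -= step_size  # lean on the average of the twin critics
    # Keep beta a valid convex-combination weight.
    return min(1.0, max(0.0, beta)), bias
```

Consider `update_beta(0.5, [2.0, 2.0, 1.5], [1.0, 1.0, 1.0], gamma=0.5)`. The observed returns are [1.75, 1.5, 1.0], so the average bias is about 0.417 (overestimation), and β moves from 0.5 to 0.51.

A fixed, sign-based step keeps β from jumping when the bias estimate is noisy. A step proportional to the measured bias would be an equally plausible design.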
Taxonomy
Methods: Experience Replay
