Action Candidate Driven Clipped Double Q-learning for Discrete and Continuous Action Tasks
Haobo Jiang, Jin Xie, and Jian Yang

TL;DR
This paper introduces an action candidate-based clipped Double Q-learning method that reduces underestimation bias and improves performance in both discrete and continuous action reinforcement learning tasks.
Contribution
It proposes a novel action candidate-based clipped double estimator that adaptively balances overestimation and underestimation bias. It also extends clipped Double Q-learning to continuous action tasks and demonstrates improved estimation accuracy and performance.
Findings
Underestimation bias decreases as the number of action candidates decreases
Achieves more accurate value estimation in toy environments
Yields superior performance on benchmark problems
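The first finding can be illustrated with a toy Monte Carlo experiment. This is our own setup, not one of the paper's environments. Every action has true value 0, and the two sets of estimators are simulated as independent Gaussian noise around that value. The true maximum expected action value is therefore 0, so each averaged estimate is directly the bias for that number of candidates k. The noise model, the action count, and the names `estimates_for_all_k` and `mean_bias_per_k` are hypothetical choices made for illustration.

```python
import random


def estimates_for_all_k(q1, q2):
    """Hypothetical helper: action candidate-based clipped estimate for k = 1..n.

    Elite candidates are the top-k actions under q1. The chosen action is the
    q2-argmax among them, and its q1 value is clipped by max(q2).
    """
    # Rank actions once by q1; the elite set for k is the first k entries.
    order = sorted(range(len(q1)), key=lambda a: q1[a], reverse=True)
    q2_max = max(q2)
    out = []
    for k in range(1, len(q1) + 1):
        chosen = max(order[:k], key=lambda a: q2[a])
        out.append(min(q1[chosen], q2_max))
    return out


def mean_bias_per_k(n_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Average estimate per k when all true action values are 0 (so average = bias)."""
    rng = random.Random(seed)
    totals = [0.0] * n_actions
    for _ in range(trials):
        # Two independent noisy estimates of the same all-zero action values.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # The same samples are reused for every k, so the comparison across k is paired.
        for i, value in enumerate(estimates_for_all_k(q1, q2)):
            totals[i] += value
    return {k: totals[k - 1] / trials for k in range(1, n_actions + 1)}


if __name__ == "__main__":
    for k, bias in mean_bias_per_k().items():
        print(f"k={k:2d}  bias={bias:+.3f}")
```

The elite set for a smaller k is a prefix of the elite set for a larger k. As a result, the value returned for each sample can only stay the same or grow as k shrinks. At k = n the chosen action is the argmax of q2, so the estimate reduces to a clipped double estimate, min(q1(a*), q2(a*)), and in this toy setup its average should come out slightly negative (underestimation). At k = 1 the estimate becomes min(max q1, max q2), whose average should be positive. This illustrates that shrinking k trades underestimation for possible overestimation.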
Abstract
Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped Double Q-learning, as an effective variant of Double Q-learning, employs the clipped double estimator to approximate the maximum expected action value. Due to the underestimation bias of the clipped double estimator, the performance of clipped Double Q-learning may be degraded in some stochastic environments. In this paper, in order to reduce the underestimation bias, we propose an action candidate-based clipped double estimator for Double Q-learning. Specifically, we first select a set of elite action candidates with high action values from one set of estimators. Then, among these candidates, we choose the highest valued action from the other set of estimators. Finally, we use the maximum value in the second set of estimators to clip the action value of the chosen action…
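The three steps in the abstract can be sketched for a single state with a finite action set. This is a minimal sketch, not the authors' implementation. It assumes each "set of estimators" is summarized by one list of action values, and it reads "the second set" as the set used in the second (choosing) step. Because the abstract is truncated, it also assumes that the returned clipped value is what approximates the maximum expected action value. The function name `action_candidate_clipped_estimate` and the parameter `k` (number of elite candidates) are hypothetical.

```python
from typing import Sequence


def action_candidate_clipped_estimate(q1: Sequence[float], q2: Sequence[float], k: int) -> float:
    """Hypothetical helper: action candidate-based clipped double estimate for one state.

    q1: action values from the first set of estimators (selects the elite candidates)
    q2: action values from the second set of estimators (chooses among candidates)
    k:  number of elite action candidates, 1 <= k <= number of actions
    """
    n = len(q1)
    if len(q2) != n or not 1 <= k <= n:
        raise ValueError("q1 and q2 need equal length and 1 <= k <= len(q1)")
    # Step 1: the k actions with the highest values under q1 are the elite candidates.
    candidates = sorted(range(n), key=lambda a: q1[a], reverse=True)[:k]
    # Step 2: among those candidates, choose the action that q2 values highest.
    chosen = max(candidates, key=lambda a: q2[a])
    # Step 3: clip q1's value of the chosen action by the maximum value under q2.
    return min(q1[chosen], max(q2))


q1 = [1.0, 3.0, 2.0]
q2 = [2.5, 0.5, 2.0]
for k in (3, 2, 1):
    print(k, action_candidate_clipped_estimate(q1, q2, k))  # "3 1.0", then "2 2.0", then "1 2.5"
```

With k equal to the number of actions, the chosen action is the overall argmax of q2, so the result is min(q1(a*), q2(a*)), a clipped double estimate. Smaller k restricts the choice to actions that q1 already rates highly, which is why the example's output rises as k decreases.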
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Software Reliability and Analysis Research
Methods: Q-Learning · Clipped Double Q-learning · Double Q-learning
