Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

Haobo Jiang; Jin Xie; Jian Yang

arXiv:2105.00704 · cs.LG · May 4, 2021 · 1 citation

TL;DR

This paper introduces an action candidate based clipped double estimator that improves the accuracy of maximum expected action value estimation in clipped Double Q-learning, reducing its underestimation bias and enhancing performance in both discrete and continuous action tasks.

Contribution

It proposes a novel action candidate based clipped double estimator that reduces the underestimation bias of clipped Double Q-learning and extends it to continuous action tasks, improving over standard clipped Double Q-learning.
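The summary does not say how elite candidates are obtained when the action space is continuous, so the following is only a hypothetical sketch of one way the discrete recipe could carry over. It assumes (none of this is from the paper) a 1-D action, two critics called as `q(state, action)`, and a candidate generator that perturbs the policy action `mu` with Gaussian noise of scale `sigma` and clips to `[low, high]`. The function name `ac_clipped_double_continuous`, its parameters, and the toy quadratic critics are all illustrative.

```python
import random


def ac_clipped_double_continuous(q_a, q_b, state, mu, low, high,
                                 n_samples=32, k=4, sigma=0.2, seed=0):
    """Hypothetical continuous-action variant of the action-candidate estimate.

    q_a, q_b: two critics, called as q(state, action) -> float (1-D actions)
    mu:       action proposed by the policy in `state`
    Returns (estimate, chosen_action).
    """
    rng = random.Random(seed)
    # ASSUMED candidate generator: Gaussian perturbations of mu, clipped to bounds.
    samples = [mu] + [min(high, max(low, mu + rng.gauss(0.0, sigma)))
                      for _ in range(n_samples)]
    # Elite candidates: the k sampled actions that critic q_a rates highest.
    candidates = sorted(samples, key=lambda a: q_a(state, a), reverse=True)[:k]
    # Among the candidates, take the action that critic q_b rates highest.
    a_star = max(candidates, key=lambda a: q_b(state, a))
    # Clip with q_b's maximum over the samples (stand-in for a max over all actions).
    clip = max(q_b(state, a) for a in samples)
    return min(q_a(state, a_star), clip), a_star


# Toy quadratic critics that disagree slightly about the best action.
def q_a(state, action):
    return -(action - 0.3) ** 2


def q_b(state, action):
    return -(action - 0.5) ** 2 + 0.05


value, action = ac_clipped_double_continuous(q_a, q_b, None, mu=0.4, low=-1.0, high=1.0)
print(f"target value {value:.4f} at action {action:.3f}")
```

Because the same `seed` reproduces the same samples, shrinking `k` can only keep or raise the estimate, mirroring the role the number of candidates plays in the discrete case.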

Findings

01. More accurate maximum expected action value estimation in toy environments.
02. Better performance on benchmark problems.
03. Bias control via the number of action candidates.
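To make the third finding concrete, here is a small Monte Carlo sketch under assumptions of our own, not the paper's toy environments: ten actions with identical true value 0, and two independent estimator sets that each add unit Gaussian noise. It averages the estimation error of the action candidate estimate for every candidate count `K`. For any single draw, removing the lowest-ranked candidate either leaves the chosen action unchanged or moves it to an action the first set values at least as highly, so the average bias can only stay the same or grow as `K` shrinks. At `K = 1` the estimate is `min(max q_a, max q_b)`, which overestimates in this setup. The helper names `ac_clipped_double` and `average_bias` are illustrative.

```python
import random


def ac_clipped_double(q_a, q_b, k):
    # Top-k actions under q_a, then q_b's favourite among them, clipped by max(q_b).
    candidates = sorted(range(len(q_a)), key=lambda a: q_a[a], reverse=True)[:k]
    a_star = max(candidates, key=lambda a: q_b[a])
    return min(q_a[a_star], max(q_b))


def average_bias(true_means, sigma=1.0, trials=2000, seed=0):
    """Mean of (estimate - max true mean) for every k over Gaussian-noise draws."""
    rng = random.Random(seed)
    n = len(true_means)
    target = max(true_means)
    totals = {k: 0.0 for k in range(1, n + 1)}
    for _ in range(trials):
        # Two independent noisy estimates of each action's expected value.
        q_a = [m + rng.gauss(0.0, sigma) for m in true_means]
        q_b = [m + rng.gauss(0.0, sigma) for m in true_means]
        for k in totals:
            totals[k] += ac_clipped_double(q_a, q_b, k) - target
    return {k: total / trials for k, total in totals.items()}


bias = average_bias([0.0] * 10)  # ten equally good actions: max expected value is 0
for k, b in bias.items():
    print(f"K={k:2d}  average bias {b:+.3f}")
```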

Abstract

Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped Double Q-learning, as an effective variant of Double Q-learning, employs the clipped double estimator to approximate the maximum expected action value. Due to the underestimation bias of the clipped double estimator, the performance of clipped Double Q-learning may be degraded in some stochastic environments. In this paper, in order to reduce the underestimation bias, we propose an action candidate based clipped double estimator for Double Q-learning. Specifically, we first select a set of elite action candidates with high action values from one set of estimators. Then, among these candidates, we choose the highest valued action from the other set of estimators. Finally, we use the maximum value in the second set of estimators to clip the action value of the chosen action…
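The three steps in the abstract can be sketched for a single state with discrete actions as below. This is our reading of the (truncated) abstract, not the authors' code: `q_a` stands for the values from the first set of estimators, `q_b` for the other set, and `k` for the number of elite candidates; the function name is hypothetical.

```python
def ac_clipped_double(q_a, q_b, k):
    """Action-candidate clipped double estimate of max_a E[Q(a)] (sketch).

    q_a: action values from the first set of estimators ("one set")
    q_b: action values from the second set ("the other set")
    k:   number of elite action candidates, 1 <= k <= len(q_a)
    """
    # Step 1: the k actions with the highest values under q_a are the candidates.
    candidates = sorted(range(len(q_a)), key=lambda a: q_a[a], reverse=True)[:k]
    # Step 2: among the candidates, choose the action that q_b values most.
    a_star = max(candidates, key=lambda a: q_b[a])
    # Step 3: clip q_a's value of that action with the maximum of q_b.
    return min(q_a[a_star], max(q_b))


print([ac_clipped_double([3.0, 2.0, 1.0], [0.0, 5.0, 4.0], k) for k in (1, 2, 3)])
# → [3.0, 2.0, 2.0]
print([ac_clipped_double([9.0, 1.0], [2.0, 3.0], k) for k in (1, 2)])
# → [3.0, 1.0]
```

The two extremes show why the candidate count acts as a bias knob. With `k = len(q_a)` every action is a candidate, so `a_star = argmax q_b` and the estimate is `min(q_a[a_star], q_b[a_star])`, the usual clipped double estimate with `q_b` selecting and `q_a` evaluating. With `k = 1` it becomes `min(max(q_a), max(q_b))`, which is never lower. The second example shows the clip binding: at `k = 1`, `q_a`'s value of 9.0 is capped at `max(q_b) = 3.0`.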

Code & Models

Repositories

Jiang-HB/AC_CDQ
PyTorch · Official

Videos

Action Candidate Based Clipped Double Q-Learning for Discrete and Continuous Action Tasks · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques

Methods: Clipped Double Q-learning · Q-Learning · Double Q-learning