Action Candidate Driven Clipped Double Q-learning for Discrete and Continuous Action Tasks

Haobo Jiang, Jin Xie, and Jian Yang

arXiv:2203.11526 · cs.LG · March 23, 2022

TL;DR

This paper introduces an action candidate-based clipped Double Q-learning method that reduces underestimation bias and improves performance in both discrete and continuous action reinforcement learning tasks.

Contribution

The paper proposes an action candidate-based clipped double estimator in which the number of action candidates controls the underestimation bias, extends the estimator from discrete to continuous action tasks, and reports improved value-estimation accuracy and task performance.

Findings

01 Underestimation bias shrinks as the number of action candidates decreases

02 Achieves more accurate value estimation in toy environments

03 Yields superior performance on benchmark problems

Abstract

Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped Double Q-learning, as an effective variant of Double Q-learning, employs the clipped double estimator to approximate the maximum expected action value. Due to the underestimation bias of the clipped double estimator, the performance of clipped Double Q-learning may be degraded in some stochastic environments. In this paper, in order to reduce the underestimation bias, we propose an action candidate-based clipped double estimator for Double Q-learning. Specifically, we first select a set of elite action candidates with high action values from one set of estimators. Then, among these candidates, we choose the highest valued action from the other set of estimators. Finally, we use the maximum value in the second set of estimators to clip the action value of the chosen action…
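
The three steps described in the abstract can be sketched for a single state with a finite action set. This is a minimal illustration, not the authors' implementation: the function name, the argument names, and the use of plain Python lists are assumptions made here, and because the abstract is cut off before it says which estimator supplies the value being clipped, the sketch assumes it is the first set (the one used to pick the candidates).

```python
def ac_clipped_double_estimate(q_first, q_second, num_candidates):
    """Action candidate-based clipped double estimate for one state.

    q_first, q_second: per-action value estimates from two independent
    estimator sets (hypothetical names, one entry per action).
    num_candidates: how many elite actions to keep (1 <= K <= #actions).
    """
    if len(q_first) != len(q_second):
        raise ValueError("both estimator sets must cover the same actions")
    if not 1 <= num_candidates <= len(q_first):
        raise ValueError("num_candidates must be between 1 and the number of actions")

    # Step 1: keep the K actions with the highest values under the first set.
    ranked = sorted(range(len(q_first)), key=lambda a: q_first[a], reverse=True)
    candidates = ranked[:num_candidates]

    # Step 2: among those candidates, pick the action the second set values most.
    chosen = max(candidates, key=lambda a: q_second[a])

    # Step 3: clip the chosen action's value with the maximum of the second set.
    # Assumption: the clipped value comes from the first set.
    return min(q_first[chosen], max(q_second))


if __name__ == "__main__":
    q_first = [1.0, 3.0, 2.0, 0.5]
    q_second = [2.5, 1.0, 2.0, 4.0]
    for k in (1, 2, 4):
        print(k, ac_clipped_double_estimate(q_first, q_second, k))
```

With these made-up values the estimate is 3.0 for one candidate, 2.0 for two, and 0.5 when every action is a candidate. Under the assumptions above, the last case reduces to the ordinary clipped double estimator, so the example shows the estimate rising as the candidate set shrinks.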

Code & Models

Repositories

Jiang-HB/AC_CDQ
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Software Reliability and Analysis Research

Methods: Q-Learning · Clipped Double Q-learning · Double Q-learning