Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

Jiechao Xiong; Qing Wang; Zhuoran Yang; Peng Sun; Lei Han; Yang Zheng; Haobo Fu; Tong Zhang; Ji Liu; Han Liu

arXiv:1810.06394 · cs.LG · October 16, 2018 · 152 cites

TL;DR

This paper introduces a novel Parametrized Deep Q-Network (P-DQN) that effectively handles hybrid discrete-continuous action spaces in reinforcement learning without approximation or relaxation, demonstrated through simulation and game experiments.

Contribution

The paper presents a new P-DQN framework that seamlessly integrates DQN and DDPG for hybrid action spaces, avoiding approximation or relaxation methods.

Findings

01

Applied to scoring a goal in simulated RoboCup soccer

02

Applied to the solo mode of the game King of Glory (KOG)

03

Empirical results on a simulation example and on both games validate the method's efficiency and effectiveness

Abstract

Most existing deep reinforcement learning (DRL) frameworks consider only a discrete action space or only a continuous action space. Motivated by applications in computer games, we consider the scenario with a discrete-continuous hybrid action space. To handle a hybrid action space, previous works either approximate it by discretization or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirit of both DQN (dealing with discrete action spaces) and DDPG (dealing with continuous action spaces) by seamlessly integrating them. Empirical results on a simulation example, on scoring a goal in simulated RoboCup soccer, and on the solo mode of the game King of Glory (KOG) validate the efficiency and effectiveness of our method.
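
As a rough illustration of what combining DQN and DDPG on a hybrid action space can look like, the sketch below picks an action in two steps: a DDPG-like function proposes one continuous parameter vector per discrete action, and a DQN-like function scores each (discrete action, parameter) pair so that the best discrete action can be chosen by argmax. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy linear maps stand in for deep networks, and all names and sizes (STATE_DIM, NUM_DISCRETE, PARAM_DIM, continuous_params, q_values, select_action) are hypothetical. Training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM = 4      # size of the state vector
NUM_DISCRETE = 3   # number of discrete actions
PARAM_DIM = 2      # size of the continuous parameter attached to each discrete action

# Toy linear maps standing in for the deep networks (assumption, not from the paper).
W_param = rng.normal(size=(NUM_DISCRETE, PARAM_DIM, STATE_DIM))
W_q = rng.normal(size=(NUM_DISCRETE, STATE_DIM + PARAM_DIM))


def continuous_params(state):
    """DDPG-like part: one bounded continuous parameter vector per discrete action."""
    return np.tanh(W_param @ state)  # shape (NUM_DISCRETE, PARAM_DIM)


def q_values(state, params):
    """DQN-like part: one Q-value per discrete action, given that action's parameter."""
    inputs = np.concatenate([np.tile(state, (NUM_DISCRETE, 1)), params], axis=1)
    return np.sum(W_q * inputs, axis=1)  # shape (NUM_DISCRETE,)


def select_action(state):
    """Greedy hybrid action: argmax over discrete actions, each paired with its parameter."""
    params = continuous_params(state)
    q = q_values(state, params)
    k = int(np.argmax(q))
    return k, params[k]


state = rng.normal(size=STATE_DIM)
k, x_k = select_action(state)
print("discrete action:", k, "continuous parameter:", x_k)
```

The point of the design is that the continuous part is never discretized and the discrete part is never relaxed: the chosen action is the pair (k, x_k), with k an integer index and x_k a real-valued vector.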

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning

Methods: Experience Replay · Weight Decay · Q-Learning · Adam · Batch Normalization · Deep Deterministic Policy Gradient · Dense Connections · Convolution · Deep Q-Network