Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, and Han Liu

TL;DR
This paper introduces a novel Parametrized Deep Q-Network (P-DQN) that effectively handles hybrid discrete-continuous action spaces in reinforcement learning without approximation or relaxation, demonstrated through simulation and game experiments.
Contribution
The paper presents a new P-DQN framework that seamlessly integrates DQN and DDPG for hybrid action spaces, avoiding approximation or relaxation methods.
Findings
Successfully applied to scoring a goal in simulated RoboCup soccer
Successfully applied to the solo mode of the game King of Glory (KOG)
Validated efficiency and effectiveness through experiments
Abstract
Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely. Motivated by applications in computer games, we consider the scenario with discrete-continuous hybrid action space. To handle hybrid action space, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirits of both DQN (dealing with discrete action space) and DDPG (dealing with continuous action space) by seamlessly integrating them. Empirical results on a simulation example, scoring a goal in simulated RoboCup soccer and the solo mode in game King of Glory (KOG) validate the efficiency and effectiveness of our method.
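To make the idea of a discrete-continuous hybrid action concrete, the sketch below shows one plausible way to pick such an action: a DDPG-like step proposes one continuous parameter per discrete action, and a DQN-like step scores each (discrete action, parameter) pair and takes the best. This is an illustration under stated assumptions, not the authors' implementation. The names `select_hybrid_action`, `toy_params`, and `toy_q` are hypothetical, and the toy functions stand in for learned networks.

```python
import math
from typing import Callable, List, Tuple

State = List[float]


def select_hybrid_action(
    state: State,
    param_fns: List[Callable[[State], float]],
    q_fn: Callable[[State, int, float], float],
) -> Tuple[int, float]:
    """Return (k, x_k): the best discrete action and its continuous parameter."""
    # Continuous side (DDPG-like): one deterministic parameter per discrete action.
    params = [param_fn(state) for param_fn in param_fns]
    # Discrete side (DQN-like): score every (k, x_k) pair, then take the argmax.
    q_values = [q_fn(state, k, x) for k, x in enumerate(params)]
    best_k = max(range(len(q_values)), key=q_values.__getitem__)
    return best_k, params[best_k]


# Hypothetical stand-ins for learned networks (assumptions for illustration only).
toy_params = [
    lambda s: math.tanh(s[0]),  # parameter proposal for discrete action 0
    lambda s: -1.0,             # parameter proposal for discrete action 1
]


def toy_q(state: State, k: int, x: float) -> float:
    """Toy value: action 0 is favored when state[0] > 0, action 1 otherwise."""
    targets = [0.5, -1.0]
    sign = 1.0 if k == 0 else -1.0
    return -(x - targets[k]) ** 2 + sign * state[0]


if __name__ == "__main__":
    print(select_hybrid_action([0.0], toy_params, toy_q))
    print(select_hybrid_action([2.0], toy_params, toy_q))
```

Because the continuous parameter is chosen per discrete action before the argmax, the hybrid space is neither discretized nor flattened into a single continuous vector in this sketch.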
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
Methods: Experience Replay · Weight Decay · Q-Learning · Adam · Batch Normalization · Deep Deterministic Policy Gradient · Dense Connections · Convolution · Deep Q-Network
