Deep Reinforcement Learning in Parameterized Action Space

Matthew Hausknecht; Peter Stone

arXiv:1511.04143·cs.AI·May 6, 2024·ICLR·54 cites

TL;DR

This paper extends deep reinforcement learning to parameterized action spaces, applying it to simulated RoboCup soccer, where the best learned agent scores goals more reliably than the 2012 RoboCup champion agent.

Contribution

It applies deep RL to structured (parameterized) continuous action spaces, which to the authors' knowledge no previous work using deep neural networks had handled successfully, and demonstrates the approach in simulated RoboCup soccer.

Findings

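01. The best learned agent scores goals more reliably than the 2012 RoboCup champion agent.

02. To the authors' knowledge, this is the first successful use of deep neural networks for reinforcement learning in parameterized action spaces.

03. The result extends deep reinforcement learning to the class of parameterized action space MDPs.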

Abstract

Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.
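To make the idea of a parameterized action space concrete, the following is a minimal sketch of one way to represent "a small set of discrete action types, each of which is parameterized with continuous variables". The action names, parameter names, and the rule of picking the highest-scoring action type are assumptions made for illustration; the abstract does not specify them, and this is not presented as the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical action types and their continuous parameters
# (assumed for illustration; not taken from the abstract).
ACTION_SPEC: Dict[str, Tuple[str, ...]] = {
    "dash": ("power", "direction"),
    "turn": ("direction",),
    "tackle": ("direction",),
    "kick": ("power", "direction"),
}


@dataclass(frozen=True)
class ParameterizedAction:
    kind: str                 # discrete action type
    params: Dict[str, float]  # continuous parameters for that type


def select_action(scores: Sequence[float],
                  flat_params: Sequence[float]) -> ParameterizedAction:
    """Pick the highest-scoring action type and attach its own parameters.

    `scores` holds one value per action type; `flat_params` holds the
    parameters of all action types concatenated in ACTION_SPEC order.
    """
    kinds = list(ACTION_SPEC)
    total = sum(len(names) for names in ACTION_SPEC.values())
    if len(scores) != len(kinds) or len(flat_params) != total:
        raise ValueError("unexpected output size")

    best = max(range(len(kinds)), key=lambda i: scores[i])
    # Skip the parameter slots that belong to earlier action types.
    offset = sum(len(ACTION_SPEC[k]) for k in kinds[:best])
    names = ACTION_SPEC[kinds[best]]
    values = flat_params[offset:offset + len(names)]
    return ParameterizedAction(kinds[best], dict(zip(names, values)))


if __name__ == "__main__":
    action = select_action([0.1, 0.9, 0.2, 0.3],
                           [50.0, 10.0, -30.0, 45.0, 80.0, 0.0])
    print(action)
```

In this sketch a policy network would emit both vectors at once, and only the parameters belonging to the chosen action type are used; the remaining parameter slots are ignored for that step.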

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research