Discrete Action On-Policy Learning with Action-Value Critic

Yuguang Yue; Yunhao Tang; Mingzhang Yin; Mingyuan Zhou

arXiv:2002.03534 · stat.ML · February 24, 2020 · 1 citation



TL;DR

This paper introduces a new on-policy reinforcement learning algorithm for discrete action spaces that uses an action-value critic to improve variance control and enhance performance, especially in high-dimensional settings.

Contribution

It develops a critic-based method for discrete on-policy RL that effectively manages variance and improves learning efficiency in multidimensional action spaces.
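The abstract does not spell out the paper's estimator, so the following is only a minimal sketch of the general idea that an action-value critic can reduce gradient variance. It uses a swapped-in, simpler technique, the all-action (expected) policy gradient for a single softmax policy over C actions, not the paper's correlated-action construction. Given critic values Q(a) for every action, the exact gradient of E_pi[Q] with respect to the logits is pi_i * (Q_i - V) with V = sum_a pi_a * Q_a, whereas REINFORCE estimates the same quantity from a single sampled action. The function names and the critic values are hypothetical, chosen for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_grad(logits, q_values, rng):
    """One-sample score-function (REINFORCE) estimate of d E_pi[Q] / d logits."""
    probs = softmax(logits)
    a = rng.choices(range(len(probs)), weights=probs)[0]
    # The gradient of log pi(a) w.r.t. the logits is one_hot(a) - probs.
    return [q_values[a] * ((1.0 if i == a else 0.0) - p) for i, p in enumerate(probs)]

def all_action_grad(logits, q_values):
    """Exact d E_pi[Q] / d logits using critic values for every action:
    pi_i * (Q_i - V), where V = sum_a pi_a * Q_a."""
    probs = softmax(logits)
    v = sum(p * q for p, q in zip(probs, q_values))
    return [p * (q - v) for p, q in zip(probs, q_values)]

if __name__ == "__main__":
    rng = random.Random(0)
    logits, q = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]  # hypothetical critic values
    print(all_action_grad(logits, q))
    n = 20000
    samples = [reinforce_grad(logits, q, rng) for _ in range(n)]
    print([sum(s[i] for s in samples) / n for i in range(3)])
```

Both estimators target the same expectation: averaging many REINFORCE samples approaches the all-action gradient, but each individual REINFORCE sample is noisy, while the all-action gradient has no sampling noise given the critic (its accuracy then rests entirely on the critic).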

Findings

01

On benchmark tasks, empirically outperforms related on-policy algorithms that rely on variance-control techniques.

02

Demonstrates the benefits of discretizing the action space for exploration and convergence.

03

Provides a statistically grounded procedure for generating and combining correlated actions and for sparsifying gradients.
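Finding 02 concerns discretizing a continuous action space. The page does not say how the paper builds its grid, so the sketch below assumes the simplest option: each continuous dimension bounded in [low, high] is replaced by num_bins evenly spaced atomic actions, and a discrete policy picks one bin index per dimension. The names `bin_centers` and `to_continuous` are hypothetical.

```python
def bin_centers(low, high, num_bins):
    """Evenly spaced atomic actions covering [low, high], endpoints included."""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    step = (high - low) / (num_bins - 1)
    return [low + k * step for k in range(num_bins)]

def to_continuous(indices, lows, highs, num_bins):
    """Map one bin index per action dimension to a continuous action vector."""
    return [bin_centers(lo, hi, num_bins)[k]
            for k, lo, hi in zip(indices, lows, highs)]

print(bin_centers(-1.0, 1.0, 5))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(to_continuous([0, 4], [-1.0, -2.0], [1.0, 2.0], 5))  # [-1.0, 2.0]
```

A finer grid approximates the continuous action more closely but enlarges the per-dimension choice set that the discrete policy must handle.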

Abstract

Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient-based deep RL algorithms efficiently. To operate effectively in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation. We use rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate…
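The abstract's opening point, that complexity grows exponentially with the action-space dimension, can be made concrete with a small sketch. It assumes a factorized policy, a common modeling choice rather than something the abstract states, in which each of K dimensions has its own categorical distribution over C choices: the joint action space has C**K elements, yet the policy needs only K*C logits and its log-probability is a sum of per-dimension terms. The function names are hypothetical.

```python
import math
import random

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def sample_factorized(logits_per_dim, rng):
    """Draw one category per dimension, independently across dimensions."""
    action = []
    for logits in logits_per_dim:
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        action.append(rng.choices(range(len(logits)), weights=probs)[0])
    return action

def log_prob_factorized(logits_per_dim, action):
    """log pi(a) = sum over dimensions k of log pi_k(a_k)."""
    return sum(log_softmax(logits)[a_k]
               for logits, a_k in zip(logits_per_dim, action))

num_dims, num_choices = 6, 11
logits_per_dim = [[0.0] * num_choices for _ in range(num_dims)]
print(num_choices ** num_dims)  # joint actions: 1771561
print(num_dims * num_choices)   # logits needed: 66
action = sample_factorized(logits_per_dim, random.Random(0))
print(action, log_prob_factorized(logits_per_dim, action))
```

Factorization keeps the parameter count linear in K, but it does not by itself address the variance of on-policy gradient estimates, which is what the critic-based construction described in the abstract targets.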


Code & Models

Repositories

yuguangyue/CARSM (TensorFlow, official)


Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks