Discrete Action On-Policy Learning with Action-Value Critic
Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

TL;DR
This paper introduces a new on-policy reinforcement learning algorithm for discrete action spaces that uses an action-value critic to improve variance control and enhance performance, especially in high-dimensional settings.
Contribution
It develops a critic-based method for discrete on-policy RL that effectively manages variance and improves learning efficiency in multidimensional action spaces.
Findings
On benchmark tasks, empirically outperforms related on-policy algorithms that rely on variance control techniques.
Demonstrates the benefits of discretizing the action space for exploration and convergence.
Provides a statistically grounded approach to action correlation and gradient sparsification.
Abstract
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate…
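To make the variance-control idea concrete, here is a minimal sketch for a single categorical action dimension. It is not the paper's estimator: it compares a plain single-sample score-function (REINFORCE) gradient with a gradient that combines critic values over every action, a generic technique swapped in for illustration. The softmax policy, the `q_values` vector, and both function names are assumptions introduced here.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_grad(logits, q_values, rng):
    """Single-sample score-function estimate of d E[Q(a)] / d logits."""
    probs = softmax(logits)
    a = rng.choice(len(probs), p=probs)
    score = -probs                     # d log pi(a) / d logits = onehot(a) - probs
    score[a] += 1.0
    return q_values[a] * score

def all_action_grad(logits, q_values):
    """Gradient that sums critic values over every action (no sampling noise)."""
    probs = softmax(logits)
    return probs * (q_values - probs @ q_values)

rng = np.random.default_rng(0)
logits = np.array([0.5, -0.2, 0.1, 0.0])
q_values = np.array([1.0, 3.0, -2.0, 0.5])   # hypothetical critic outputs

samples = np.stack([reinforce_grad(logits, q_values, rng) for _ in range(20000)])
exact = all_action_grad(logits, q_values)

print("critic-combined gradient:", exact)
print("mean of sampled estimates:", samples.mean(axis=0))
print("per-coordinate variance of sampled estimates:", samples.var(axis=0))
```

Both estimators target the same gradient, but the single-sample version is noisy, while the critic-combined version is deterministic given the critic. Enumerating every action is only feasible for one small dimension; with many dimensions the joint action space grows exponentially, which is the setting the abstract says the paper addresses by evaluating the critic on a set of correlated actions instead.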
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
