Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning
Tingting Zhao, Ying Wang, Wei Sun, Yarui Chen, Gang Niu, Masashi Sugiyama

TL;DR
This paper introduces a novel approach to reinforcement learning that leverages representation learning in both state and action spaces to improve efficiency and generalization in continuous action environments, demonstrated through multiple experiments.
Contribution
It extends state representation techniques to action spaces and proposes a two-scale learning framework combining large unsupervised models with small RL policy models.
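The two-scale idea can be illustrated with a minimal sketch. The summary does not specify the architectures, so everything here is an assumption for illustration only: the random linear maps stand in for large models pretrained without rewards, and the dimensions, the names `encode_state`, `decode_action`, `LatentPolicy`, and `act`, and the linear-Gaussian policy are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: raw spaces are large, latent spaces are small.
STATE_DIM, ACTION_DIM = 64, 8
Z_STATE_DIM, Z_ACTION_DIM = 4, 2

# Stand-ins for the large unsupervised models (assumed frozen during RL).
# Real models would be pretrained; random weights are used only for illustration.
W_enc = rng.normal(size=(Z_STATE_DIM, STATE_DIM)) / np.sqrt(STATE_DIM)
W_dec = rng.normal(size=(ACTION_DIM, Z_ACTION_DIM)) / np.sqrt(Z_ACTION_DIM)


def encode_state(state):
    """Map a raw state to a latent state."""
    return np.tanh(W_enc @ state)


def decode_action(z_action):
    """Map a latent action to a raw action bounded in [-1, 1]."""
    return np.tanh(W_dec @ z_action)


class LatentPolicy:
    """Small linear-Gaussian policy; the only component RL would train."""

    def __init__(self, sigma=0.1):
        self.theta = np.zeros((Z_ACTION_DIM, Z_STATE_DIM))
        self.sigma = sigma

    def num_params(self):
        return self.theta.size

    def sample(self, z_state):
        mean = self.theta @ z_state
        return mean + self.sigma * rng.normal(size=Z_ACTION_DIM)


def act(policy, state):
    """Raw state -> latent state -> latent action -> raw action."""
    z_state = encode_state(state)
    z_action = policy.sample(z_state)
    return decode_action(z_action)


policy = LatentPolicy()
action = act(policy, rng.normal(size=STATE_DIM))
```

In this sketch the policy has `Z_ACTION_DIM * Z_STATE_DIM` trainable parameters, whereas a linear policy mapping raw states directly to raw actions would need `ACTION_DIM * STATE_DIM`. That gap is the intuition behind pairing large representation models with a small policy model.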
Findings
Improved policy learning efficiency in continuous action spaces.
Enhanced generalization to large-scale action environments.
Validated effectiveness on MountainCar, CarRacing, and Cheetah tasks.
Abstract
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL) with the help of the perception capability of deep learning and has been widely applied to real-world problems. Model-free RL, a class of efficient DRL methods, learns state representations simultaneously with the policy in an end-to-end manner when facing large-scale continuous state and action spaces. However, training such a large policy model requires a large number of trajectory samples and a long training time. Moreover, the learned policy often fails to generalize to large-scale action spaces, especially continuous ones. To address this issue, we propose an efficient policy learning method in latent state and action spaces. More specifically, we extend the idea of state representations to action representations…
Taxonomy
Topics: Reinforcement Learning in Robotics
