EASpace: Enhanced Action Space for Policy Transfer
Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, and Tianjiang Hu

TL;DR
EASpace introduces a novel macro action formulation that accelerates policy learning by integrating multiple expert policies and an intrinsic reward, improving exploration and data efficiency in complex tasks.
Contribution
The paper proposes EASpace, a new macro action formulation that enhances policy transfer by integrating multiple expert policies with an intrinsic reward mechanism.
Findings
EASpace accelerates learning in grid-based and pursuit tasks.
Theoretical convergence of the learning rule is established.
EASpace is effective in physical system implementations.
Abstract
Formulating expert policies as macro actions promises to alleviate the long-horizon issue via structured exploration and efficient credit assignment. However, traditional option-based multi-policy transfer methods suffer from inefficient exploration of macro action's length and insufficient exploitation of useful long-duration macro actions. In this paper, a novel algorithm named EASpace (Enhanced Action Space) is proposed, which formulates macro actions in an alternative form to accelerate the learning process using multiple available sub-optimal expert policies. Specifically, EASpace formulates each expert policy into multiple macro actions with different execution times. All the macro actions are then integrated into the primitive action space directly. An intrinsic reward, which is proportional to the execution time of macro actions, is introduced to encourage the exploitation of…
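The action-space construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Action`, `build_enhanced_action_space`, `intrinsic_reward`, and `run_action`, the `scale` coefficient, the choice to give primitive actions zero intrinsic reward, and the undiscounted reward sum are all assumptions made for illustration. The sketch only shows the idea that each expert policy yields several macro actions of different execution times, that these sit alongside the primitive actions in one flat action set, and that the intrinsic reward grows in proportion to execution time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

State = int
Policy = Callable[[State], int]  # hypothetical: maps a state to a primitive action index
StepFn = Callable[[State, int], Tuple[State, float, bool]]  # hypothetical environment step


@dataclass(frozen=True)
class Action:
    expert: Optional[int]     # index of the expert policy, None for a primitive action
    primitive: Optional[int]  # primitive action index, None for a macro action
    duration: int             # execution time in environment steps


def build_enhanced_action_space(
    n_primitive: int, n_experts: int, durations: Sequence[int]
) -> List[Action]:
    """Primitive actions plus one macro action per (expert, duration) pair."""
    actions = [Action(None, a, 1) for a in range(n_primitive)]
    for e in range(n_experts):
        for d in durations:
            actions.append(Action(e, None, d))
    return actions


def intrinsic_reward(action: Action, scale: float = 0.01) -> float:
    """Bonus proportional to execution time; zero for primitives (assumption)."""
    return scale * action.duration if action.expert is not None else 0.0


def run_action(
    action: Action, state: State, step: StepFn, experts: Sequence[Policy]
) -> Tuple[State, float]:
    """Execute one enhanced action and return (final state, summed reward).

    Rewards are summed without discounting to keep the sketch short.
    """
    total = 0.0
    for _ in range(action.duration):
        if action.expert is None:
            a = action.primitive
        else:
            a = experts[action.expert](state)  # the expert chooses each step
        state, r, done = step(state, a)
        total += r
        if done:
            break
    return state, total
```

With 4 primitive actions, 2 experts, and durations `[2, 4, 8]`, this sketch produces a flat set of 4 + 2 × 3 = 10 actions, over which a standard value-based learner such as Q-learning could select directly.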
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Data Storage Technologies · Smart Grid Energy Management
Methods: Q-Learning
