Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

Minghuan Liu; Zhengbang Zhu; Menghui Zhu; Yuzheng Zhuang; Weinan Zhang; Jianye Hao

arXiv:2212.09033·cs.AI·December 20, 2022

Open Access

TL;DR

This paper introduces PILoT, a goal planner that enables model-free skill transfer across agents with different state and action spaces by planning immediate landmarks, which avoids re-training from scratch and improves sample efficiency.

Contribution

PILoT is a novel goal-conditioned planning method that distills a universal goal-planner for cross-agent transfer in reinforcement learning.

Findings

01

Effective in few-shot transfer across different action spaces and dynamics.

02

Successful zero-shot transfer from simple to complex tasks.

03

Works with various input types, including vector states and images.

Abstract

In reinforcement learning applications such as robotics, agents usually need to handle varied input/output features when their developers or physical restrictions specify different state/action spaces. This leads to unnecessary re-training from scratch and considerable sample inefficiency, especially when agents follow similar solution steps to achieve tasks. In this paper, we aim to transfer similar high-level goal-transition knowledge to alleviate the challenge. Specifically, we propose PILoT, i.e., Planning Immediate Landmarks of Targets. PILoT uses universal decoupled policy optimization to learn a goal-conditioned state planner; it then distills a goal-planner that plans immediate landmarks in a model-free style and can be shared among different agents. In our experiments, we show the power of PILoT on various transferring challenges, including few-shot transferring…
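
The decoupling described in the abstract (a shared planner that proposes immediate landmarks, plus agent-specific control that reaches them) can be illustrated with a toy sketch. This is not the authors' implementation: the 2D point world, the hand-written `plan_landmark` function (standing in for a learned goal planner), and the two agents `continuous_agent` and `discrete_agent` are all hypothetical names and assumptions introduced here for illustration only.

```python
from typing import Callable, Tuple

State = Tuple[float, float]


def distance(a: State, b: State) -> float:
    """Euclidean distance between two 2D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def plan_landmark(state: State, goal: State, step: float = 1.0) -> State:
    """Hypothetical stand-in for a learned goal planner.

    Returns an immediate landmark at most `step` away from `state`,
    in the direction of `goal`. It never looks at actions, so any
    agent can reuse it regardless of its action space.
    """
    dist = distance(state, goal)
    if dist <= step:
        return goal
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    return (state[0] + step * dx / dist, state[1] + step * dy / dist)


def continuous_agent(state: State, landmark: State) -> State:
    """Assumed agent A: velocity actions clipped to [-0.5, 0.5] per axis."""
    ax = max(-0.5, min(0.5, landmark[0] - state[0]))
    ay = max(-0.5, min(0.5, landmark[1] - state[1]))
    return (state[0] + ax, state[1] + ay)


def discrete_agent(state: State, landmark: State) -> State:
    """Assumed agent B: four axis-aligned moves of size 0.25."""
    dx, dy = landmark[0] - state[0], landmark[1] - state[1]
    if abs(dx) >= abs(dy):
        return (state[0] + (0.25 if dx > 0 else -0.25), state[1])
    return (state[0], state[1] + (0.25 if dy > 0 else -0.25))


def rollout(agent_step: Callable[[State, State], State],
            start: State, goal: State,
            max_steps: int = 200, tol: float = 0.3) -> Tuple[State, int]:
    """Follow landmarks from the shared planner with an agent-specific controller."""
    state = start
    for t in range(max_steps):
        if distance(state, goal) <= tol:
            return state, t
        landmark = plan_landmark(state, goal)   # shared across agents
        state = agent_step(state, landmark)     # specific to each agent
    return state, max_steps


if __name__ == "__main__":
    goal = (3.0, 4.0)
    for name, agent in [("continuous", continuous_agent),
                        ("discrete", discrete_agent)]:
        final, steps = rollout(agent, (0.0, 0.0), goal)
        print(name, "reached", final, "in", steps, "steps")
```

Both agents call the same `plan_landmark` function and differ only in how they move toward the landmark it returns. In this sketch the planner is a fixed rule; in the paper's setting, per the abstract, the planner is learned and distilled.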

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)