Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning
Zheng Wu, Yichen Xie, Wenzhao Lian, Changhao Wang, Yanjiang Guo, Jianyu Chen, Stefan Schaal, Masayoshi Tomizuka

TL;DR
This paper introduces a meta-reinforcement learning method that uses disentangled task representations to enable zero-shot policy transfer across novel compositional tasks, demonstrating success in simulated and real-world scenarios.
Contribution
It proposes a novel meta-RL algorithm with explicit disentangled task encoding to achieve zero-shot generalization to unseen task combinations.
Findings
Successful zero-shot policy transfer in simulated tasks
Effective generalization to a real-world robotic insertion task
Disentangled representations improve compositional task understanding
Abstract
Humans are capable of abstracting various tasks as different combinations of multiple attributes. This perspective of compositionality is vital for rapid human learning and adaptation, since previous experiences from related tasks can be combined to generalize across novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging task compositionality. Our proposed method is a meta-RL algorithm with disentangled task representation, explicitly encoding different aspects of the tasks. Policy generalization is then performed by inferring unseen compositional task representations via the obtained disentanglement without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that our proposed method…
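The idea of inferring an unseen compositional task representation from disentangled ones can be illustrated with a small sketch. This is not the paper's implementation: it assumes, purely for illustration, that a task latent is a flat vector whose first block encodes one attribute and whose second block encodes another, and that latents for two training tasks are already available. The names `split_latent`, `compose_latent`, `DIM_A`, the attribute labels, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

Latent = List[float]


def split_latent(z: Latent, dim_a: int) -> Tuple[Latent, Latent]:
    """Split a task latent into its two attribute blocks (assumed layout)."""
    return z[:dim_a], z[dim_a:]


def compose_latent(z_a: Latent, z_b: Latent) -> Latent:
    """Concatenate two attribute blocks into one task latent."""
    return z_a + z_b


# Hypothetical latents for two training tasks, keyed by (attribute A, attribute B).
# The values are made up; in practice they would come from a learned task encoder.
seen_tasks: Dict[Tuple[str, str], Latent] = {
    ("a1", "b1"): [0.9, 0.1, -0.5, 0.2],
    ("a2", "b2"): [-0.7, 0.4, 0.8, -0.3],
}

DIM_A = 2  # assumed size of the attribute-A block

# Take the attribute-A block from task (a1, b1) and the attribute-B block from
# task (a2, b2) to form a latent for the unseen combination (a1, b2).
block_a1, _ = split_latent(seen_tasks[("a1", "b1")], DIM_A)
_, block_b2 = split_latent(seen_tasks[("a2", "b2")], DIM_A)
z_unseen = compose_latent(block_a1, block_b2)
```

Under these assumptions, a latent-conditioned policy could be given `z_unseen` directly, so no exploration in the new task is needed to obtain its representation. How the disentanglement is actually learned and enforced is described in the paper itself and is not captured by this sketch.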
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
