Multi-task Hierarchical Adversarial Inverse Reinforcement Learning
Jiayu Chen, Dipesh Tamboli, Tian Lan, Vaneet Aggarwal

TL;DR
This paper introduces MH-AIRL, a hierarchical adversarial inverse reinforcement learning method that improves multi-task imitation learning by enhancing data efficiency, handling complex tasks, and enabling learning from unannotated demonstrations.
Contribution
It develops a hierarchical, multi-task policy learning framework that synthesizes context-based learning, AIRL, and hierarchical policies, with the ability to learn from unannotated data.
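To make the idea of a hierarchical, context-conditioned policy concrete, here is a minimal sketch of what such a two-level policy could look like. It is not the authors' implementation. The dimensions, the random linear "networks" (`W_high`, `W_low`), and the function names are all hypothetical, and the assumption that the high-level policy conditions on the previous skill is made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, CONTEXT_DIM, NUM_OPTIONS, NUM_ACTIONS = 4, 2, 3, 5
INPUT_DIM = STATE_DIM + CONTEXT_DIM + NUM_OPTIONS

# Random linear maps standing in for learned policy networks (assumption).
W_high = rng.normal(size=(NUM_OPTIONS, INPUT_DIM))
W_low = rng.normal(size=(NUM_ACTIONS, INPUT_DIM))


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def high_level_probs(state, context, prev_option):
    """Distribution over skills given state, task context, and previous skill."""
    x = np.concatenate([state, context, one_hot(prev_option, NUM_OPTIONS)])
    return softmax(W_high @ x)


def low_level_probs(state, context, option):
    """Distribution over primitive actions given state, task context, and current skill."""
    x = np.concatenate([state, context, one_hot(option, NUM_OPTIONS)])
    return softmax(W_low @ x)


def act(state, context, prev_option):
    """Sample a skill first, then an action conditioned on that skill."""
    option = rng.choice(NUM_OPTIONS, p=high_level_probs(state, context, prev_option))
    action = rng.choice(NUM_ACTIONS, p=low_level_probs(state, context, option))
    return int(option), int(action)
```

In this sketch the task context enters both levels, so the same skill set can be reused across tasks while the skill selection changes with the task.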
Findings
Outperforms state-of-the-art MIL baselines on complex multi-task settings.
Achieves higher data efficiency and transferability of learned policies.
Demonstrates effectiveness on challenging long-horizon tasks.
Abstract
Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically structured multi-task policies, which are better suited to compositional tasks with long horizons and achieve higher expert data efficiency by identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be applied to demonstrations without task or skill annotations (i.e., state-action pairs only) which are…
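The abstract names AIRL as one of the building blocks. For orientation, the sketch below shows the discriminator form used in standard, non-hierarchical AIRL, D = exp(f) / (exp(f) + π(a|s)), and the reward recovered from it. It does not reproduce MH-AIRL's extended objective, which the text above does not specify. The function names and the scalar inputs are illustrative assumptions.

```python
import numpy as np


def airl_discriminator(f_value, log_pi):
    """Standard AIRL discriminator: exp(f) / (exp(f) + pi).

    Algebraically this equals sigmoid(f - log pi), which is the
    form computed here.
    """
    return 1.0 / (1.0 + np.exp(-(f_value - log_pi)))


def airl_reward(f_value, log_pi):
    """Reward signal log D - log(1 - D), which simplifies to f - log pi."""
    d = airl_discriminator(f_value, log_pi)
    return np.log(d) - np.log1p(-d)
```

Here `f_value` stands for the learned function f(s, a) evaluated on a state-action pair and `log_pi` for the policy's log-probability of that action. A hierarchical, multi-task variant would additionally condition these quantities on the task context and the active skill.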
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
