SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition
Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie Wu, Jianye Hao, Dong Li, Pingzhong Tang

TL;DR
SEIHAI is a hierarchical AI that leverages human demonstrations and task structure to solve a complex sparse-reward task with fewer environment interactions, winning the NeurIPS 2020 MineRL competition.
Contribution
The paper introduces SEIHAI, a novel hierarchical approach that combines reinforcement and imitation learning with task decomposition and agent scheduling.
Findings
SEIHAI achieved first place in both the preliminary and final rounds of the NeurIPS 2020 MineRL competition.
The hierarchical approach reduces the number of environment interactions needed.
Effective use of human demonstrations enhances learning efficiency.
Abstract
The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address this challenge, we present SEIHAI, a Sample-efficient Hierarchical AI that takes full advantage of the human demonstrations and the task structure. Specifically, we split the task into several sequentially dependent subtasks and train a suitable agent for each subtask using reinforcement learning and imitation learning. We further design a scheduler to select different agents for different subtasks automatically. SEIHAI took first place in both the preliminary and final rounds of the NeurIPS 2020 MineRL competition.
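The decomposition described in the abstract (sequentially dependent subtasks, one agent per subtask, and a scheduler that picks the agent) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the subtask names, the inventory-based completion tests, the fixed-action "policies" standing in for trained RL/IL agents, and the rule that the scheduler activates the subtask following the furthest completed one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Inventory = Dict[str, int]


@dataclass
class Subtask:
    name: str
    is_done: Callable[[Inventory], bool]  # completion test on the inventory
    policy: Callable[[Inventory], str]    # stand-in for a trained RL/IL agent


def make_subtask(name: str, item: str, needed: int, action: str) -> Subtask:
    """Hypothetical helper: done once `needed` units of `item` are held."""
    return Subtask(
        name=name,
        is_done=lambda inv: inv.get(item, 0) >= needed,
        policy=lambda inv: action,  # a real agent would map observations to actions
    )


# Illustrative chain only; the real subtask split is defined in the paper.
SUBTASKS: List[Subtask] = [
    make_subtask("chop_tree", "log", 3, "attack_tree"),
    make_subtask("craft_wooden_pickaxe", "wooden_pickaxe", 1, "craft_wooden_pickaxe"),
    make_subtask("mine_stone", "cobblestone", 3, "attack_stone"),
]


def schedule(inventory: Inventory,
             subtasks: List[Subtask] = SUBTASKS) -> Optional[Subtask]:
    """Return the subtask after the furthest completed one, or None if all are done.

    Scanning from the end keeps the agent from falling back to an earlier
    subtask when its items were consumed by later crafting.
    """
    for index in range(len(subtasks) - 1, -1, -1):
        if subtasks[index].is_done(inventory):
            next_index = index + 1
            return subtasks[next_index] if next_index < len(subtasks) else None
    return subtasks[0]
```

In an episode loop, the scheduler would be queried at each step with the current inventory, and the action would come from the selected subtask's policy, for example `schedule(inventory).policy(inventory)` while the result is not `None`.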
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
