Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring
Yichi Zhang, Joyce Chai

TL;DR
This paper introduces HiTUT, a hierarchical transformer-based model that decomposes task learning into sub-problems, significantly improving generalization and success rates in unseen environments on the ALFRED benchmark.
Contribution
The paper proposes a novel hierarchical task learning framework with unified transformers that explicitly models task structures, outperforming existing methods on ALFRED.
Findings
Over 160% relative improvement in task success rate in unseen environments compared with the previous state of the art
Explicit task structure representation enhances understanding and generalization
Achieves state-of-the-art performance on ALFRED benchmark
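To make the first finding concrete, the short sketch below shows how a relative gain is computed. It assumes the 160% figure is a relative improvement over the earlier system (the prior state of the art scores below 10%, so an absolute gain of that size is impossible). The function name and the two rates are hypothetical illustrations, not results from the paper.

```python
def relative_gain(baseline: float, new: float) -> float:
    """Return the improvement of `new` over `baseline` as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical success rates in percent, chosen only to illustrate the
# arithmetic; they are NOT numbers reported by the authors.
baseline_rate = 5.0
new_rate = 13.0

# Going from 5% to 13% is a 160% relative gain, i.e. 2.6 times the baseline.
gain = relative_gain(baseline_rate, new_rate)
print(f"relative gain: {gain:.1f}%")
```

Read this way, a gain of over 160% means the new success rate is more than 2.6 times the baseline, even though both rates can remain far below human performance.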
Abstract
Despite recent progress, learning new tasks through language instructions remains an extremely challenging problem. On the ALFRED benchmark for task learning, the published state-of-the-art system achieves a task success rate of less than 10% in an unseen environment, compared to human performance of over 90%. To address this issue, this paper takes a closer look at task learning. In a departure from the widely applied end-to-end architecture, we decomposed task learning into three sub-problems (sub-goal planning, scene navigation, and object manipulation) and developed a model, HiTUT (Hierarchical Tasks via Unified Transformers), that addresses each sub-problem in a unified manner to learn a hierarchical task structure. On the ALFRED benchmark, HiTUT achieves the best performance, with markedly higher generalization ability. In the unseen environment, HiTUT…
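The three-way decomposition described in the abstract can be pictured with a minimal control-flow sketch. This is not the authors' implementation: every name below is hypothetical, and the learned transformer is replaced by a fixed-rule stub so that the example runs. It shows only the idea of one shared model being queried for sub-goal planning first, then for navigation or manipulation within each sub-goal.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the three sub-problems named in the abstract.
SUBGOAL_PLANNING = "sub-goal planning"
SCENE_NAVIGATION = "scene navigation"
OBJECT_MANIPULATION = "object manipulation"


@dataclass(frozen=True)
class SubGoal:
    kind: str    # SCENE_NAVIGATION or OBJECT_MANIPULATION
    target: str  # object the sub-goal refers to


def toy_unified_model(sub_problem: str, query):
    """Stand-in for a single model that serves all three sub-problems.

    A real system would run a learned model here. This stub applies fixed
    rules so the hierarchical control flow can be executed end to end.
    """
    if sub_problem == SUBGOAL_PLANNING:
        # `query` is the instruction text. Each "verb object" clause becomes
        # a navigation sub-goal followed by a manipulation sub-goal.
        plan: List[SubGoal] = []
        for clause in query.split(" then "):
            target = clause.split()[-1]
            plan.append(SubGoal(SCENE_NAVIGATION, target))
            plan.append(SubGoal(OBJECT_MANIPULATION, target))
        return plan
    if sub_problem == SCENE_NAVIGATION:
        return [f"goto({query.target})"]
    if sub_problem == OBJECT_MANIPULATION:
        return [f"interact({query.target})"]
    raise ValueError(f"unknown sub-problem: {sub_problem}")


def run_task(instruction: str, model=toy_unified_model) -> List[str]:
    """Plan sub-goals, then expand each one into low-level actions."""
    actions: List[str] = []
    for sub_goal in model(SUBGOAL_PLANNING, instruction):
        actions.extend(model(sub_goal.kind, sub_goal))
    return actions


print(run_task("pick mug then open fridge"))
```

Routing every query through one callable mirrors the "unified" aspect at the interface level only. How the actual model shares parameters and inputs across sub-problems is not specified in the text above.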
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
