Infer Human's Intentions Before Following Natural Language Instructions
Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao, Natasha Jaques

TL;DR
This paper introduces FISER, a framework that improves natural language instruction following in collaborative embodied tasks by explicitly inferring human goals and intentions through social reasoning. It outperforms end-to-end approaches on the HandMeThat benchmark.
Contribution
FISER treats the human's internal goals as additional partially observable factors in the environment, which supports better reasoning and action planning in collaborative settings and yields state-of-the-art results on the HandMeThat benchmark.
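The two-stage idea described above (first infer the hidden goal, then act on the instruction) can be illustrated with a toy sketch. This is not the paper's Transformer-based implementation: the goal table, the likelihood rule, the `eps` parameter, and the function names `infer_goal_posterior` and `resolve_instruction` are all hypothetical, chosen only to show how modeling the goal as a latent variable can disambiguate an instruction such as "hand me that".

```python
# Toy illustration only: every name and number here is an assumption,
# not taken from the FISER paper or the HandMeThat benchmark.

# Hypothetical goals, each mapped to the set of objects the human needs.
GOALS = {
    "make_salad": {"knife", "bowl", "lettuce"},
    "make_coffee": {"mug", "kettle", "spoon"},
}


def infer_goal_posterior(observed_objects, goals=GOALS, eps=0.1):
    """Stage 1: posterior over hidden goals given objects the human has used.

    Assumes a uniform prior and a simple likelihood: an object relevant to
    the goal contributes a factor of 1.0, an irrelevant one a factor of eps.
    """
    scores = {}
    for goal, needed in goals.items():
        likelihood = 1.0
        for obj in observed_objects:
            likelihood *= 1.0 if obj in needed else eps
        scores[goal] = likelihood
    total = sum(scores.values())
    return {goal: score / total for goal, score in scores.items()}


def resolve_instruction(candidates, observed_objects, goals=GOALS):
    """Stage 2: choose the candidate object most likely still needed."""
    posterior = infer_goal_posterior(observed_objects, goals)

    def expected_need(obj):
        # Objects the human already handled are assumed not to be requested.
        if obj in observed_objects:
            return 0.0
        # Sum the probability of every goal that requires this object.
        return sum(p for goal, p in posterior.items() if obj in goals[goal])

    return max(candidates, key=expected_need)


if __name__ == "__main__":
    # The human has picked up a knife and lettuce, then says "hand me that"
    # while a bowl and a mug are both within reach.
    print(resolve_instruction(["bowl", "mug"], ["knife", "lettuce"]))
```

In this sketch the ambiguous request resolves to the bowl because the observed actions make the salad goal far more probable than the coffee goal. An end-to-end policy would have to learn that mapping implicitly, whereas here the intermediate goal estimate is explicit and inspectable.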
Findings
FISER outperforms end-to-end approaches in instruction following.
Explicit social reasoning improves understanding of human intentions.
FISER achieves state-of-the-art results on the HandMeThat benchmark.
Abstract
For AI agents to be helpful to humans, they should be able to follow natural language instructions to complete everyday cooperative tasks in human environments. However, real human instructions inherently possess ambiguity, because the human speakers assume sufficient prior knowledge about their hidden goals and intentions. Standard language grounding and planning methods fail to address such ambiguities because they do not model human internal goals as additional partially observable factors in the environment. We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks. Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps. We implement a set of Transformer-based models and evaluate them over a challenging…
Taxonomy
Topics: Child and Animal Learning Development · Child Development and Digital Technology
Methods: Sparse Evolutionary Training
