Multi-Level Compositional Reasoning for Interactive Instruction Following
Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi

TL;DR
This paper introduces a multi-level reasoning framework for robotic agents to perform complex domestic tasks by breaking down instructions into subgoals, improving efficiency without relying on rule-based planning.
Contribution
It presents MCR-Agent, a three-level policy that infers subgoals at the top level, controls navigation at the middle level, and executes manipulation at the lowest level, advancing interactive instruction following.
Findings
Achieves a 2.03% improvement in the efficiency metric, path-length-weighted success rate (PLWSR), on unseen tasks.
Generates human-interpretable subgoals for complex tasks.
Does not rely on rule-based planning or semantic spatial memory.
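To make the efficiency metric concrete, here is a minimal sketch of path-length weighting. It assumes the definition commonly used in ALFRED-style benchmarks, where a score is scaled by the ratio of the expert's path length to the longer of the expert's and the agent's path lengths. The function and parameter names are illustrative and do not come from the paper.

```python
def path_length_weighted(score: float, expert_len: int, agent_len: int) -> float:
    """Scale a score by expert_len / max(expert_len, agent_len).

    Assumed definition (ALFRED-style path-length weighting); names are
    hypothetical. An agent that takes more steps than the expert is
    penalised; taking fewer steps earns no bonus.
    """
    if expert_len <= 0 or agent_len < 0:
        raise ValueError("path lengths must be positive (expert) and non-negative (agent)")
    return score * expert_len / max(expert_len, agent_len)


# A successful episode that takes twice as many steps as the expert
# keeps half of its success score.
half_credit = path_length_weighted(1.0, expert_len=20, agent_len=40)
```

Under this weighting, two agents with the same success rate can still differ in PLWSR if one of them reaches its goals along shorter paths.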
Abstract
Robotic agents that perform domestic chores from natural language directives must master the complex job of navigating an environment and interacting with the objects in it. The tasks given to the agents are often composite and therefore challenging, as completing them requires reasoning about multiple subtasks, e.g., bringing a cup of coffee. To address this challenge, we propose a divide-and-conquer approach that breaks the task into multiple subgoals and attends to them individually for better navigation and interaction. We call the resulting agent the Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, a high-level policy composition controller infers a sequence of human-interpretable subgoals to be executed based on the language instructions. At the middle level, we discriminatively control the agent's navigation by a master…
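The control flow implied by a three-level policy can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated, so the dispatch rule (navigation subgoals go to one policy, everything else to an interaction policy), the `Subgoal` type, and all function and action names are hypothetical, and the learned policies are replaced by toy stubs.

```python
from dataclasses import dataclass
from typing import Callable, List

NAVIGATE = "navigate"  # hypothetical label for navigation subgoals


@dataclass
class Subgoal:
    kind: str    # "navigate" or an interaction type such as "pickup"
    target: str  # object the subgoal refers to


def run_episode(
    instruction: str,
    infer_subgoals: Callable[[str], List[Subgoal]],  # highest level
    navigate: Callable[[Subgoal], List[str]],        # navigation level
    interact: Callable[[Subgoal], List[str]],        # manipulation level
) -> List[str]:
    """Infer subgoals, then hand each one to the matching lower-level policy."""
    actions: List[str] = []
    for subgoal in infer_subgoals(instruction):
        if subgoal.kind == NAVIGATE:
            actions.extend(navigate(subgoal))
        else:
            actions.extend(interact(subgoal))
    return actions


# Toy stand-ins for the learned policies (assumptions for illustration only).
def toy_infer(instruction: str) -> List[Subgoal]:
    return [Subgoal(NAVIGATE, "mug"), Subgoal("pickup", "mug")]


def toy_navigate(subgoal: Subgoal) -> List[str]:
    return ["MoveAhead", "RotateRight"]


def toy_interact(subgoal: Subgoal) -> List[str]:
    return [f"{subgoal.kind}:{subgoal.target}"]


plan = run_episode("bring a cup of coffee", toy_infer, toy_navigate, toy_interact)
```

Keeping the subgoal sequence as explicit data is what makes it human-interpretable in this sketch: the list returned by the top-level function can be inspected before any low-level action is taken.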
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
