Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions
Shrestha Mohanty, Negar Arabzadeh, Julia Kiseleva, Artem Zholus, Milagro Teruel, Ahmed Awadallah, Yuxuan Sun, Kavya Srinet, and Arthur Szlam

TL;DR
This paper introduces a new dataset and tools for developing embodied agents capable of understanding and executing natural language instructions in real-world tasks, advancing human-AI collaboration.
Contribution
It presents a crowd-sourcing tool, the largest grounded language instruction dataset, and baseline models for embodied AI agents, facilitating future research.
Findings
Developed a crowd-sourcing platform for grounded language data collection.
Created the largest dataset of grounded language instructions.
Provided baseline models demonstrating current capabilities.
Abstract
Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly. This skill is evident from a young age as we acquire new abilities and solve problems by imitating others or following natural language instructions. The research community is actively pursuing the development of interactive "embodied agents" that can engage in natural conversations with humans and assist them with real-world tasks. These agents must possess the ability to promptly request feedback in case communication breaks down or instructions are unclear. Additionally, they must demonstrate proficiency in learning new vocabulary specific to a given domain. In this paper, we made the following contributions: (1) a crowd-sourcing tool for collecting grounded language instructions; (2) the largest dataset of grounded language instructions; and (3) several…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
