Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples
Taewoong Kim, Byeonghwi Kim, Jonghyun Choi

TL;DR
FLARE enhances robotic task planning by integrating language instructions with environmental perception and visual cues, enabling efficient few-shot learning and correction of ambiguous commands for more accurate execution.
Contribution
The paper introduces FLARE, a framework that combines language instructions, environmental perception, and visual cues to improve task planning and replanning for embodied agents learned from only a few examples.
Findings
Outperforms state-of-the-art planning methods with few language examples.
Effectively corrects ambiguous or incorrect instructions using visual cues.
Enables more accurate and environment-aware task execution in robotic agents.
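One way to picture correcting an instruction with visual cues: if the plan names an object the agent has not observed, swap it for the closest object the agent has actually seen. The sketch below is a minimal illustration, not FLARE's implementation. The plan format (a list of action/object pairs), the function names `similarity` and `replan`, and the use of `difflib` character similarity in place of a semantic similarity measure are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a simple stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def replan(plan: list[tuple[str, str]], observed: list[str]) -> list[tuple[str, str]]:
    """Hypothetical replanning step: swap each unobserved plan object for the closest observed one.

    Steps whose object was observed, or any step when nothing has been observed yet,
    are kept unchanged.
    """
    seen = set(observed)
    corrected = []
    for action, obj in plan:
        if obj in seen or not observed:
            corrected.append((action, obj))
        else:
            closest = max(observed, key=lambda o: similarity(obj, o))
            corrected.append((action, closest))
    return corrected


# Example: the instruction says "table", but the room only contains a DiningTable.
plan = [("PickupObject", "Mug"), ("PutObject", "Table")]
observed = ["Mug", "DiningTable", "CounterTop"]
print(replan(plan, observed))  # [('PickupObject', 'Mug'), ('PutObject', 'DiningTable')]
```

String similarity keeps the sketch dependency-free, but it cannot match synonyms such as "couch" and "Sofa". An embedding-based similarity would be the natural substitute in a real agent.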
Abstract
Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires large amounts of free-form language annotations, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as a planner with little data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the status of the environment at command reception, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally…
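The abstract describes an LLM planner that works from a few examples and is grounded in both the language command and the environment. A minimal sketch of one way to ground the choice of in-context examples: score each stored demonstration by a weighted mix of instruction-word overlap and visible-object overlap, then build a few-shot prompt from the top-k matches. This is an illustration under assumptions, not FLARE's actual method. The `Example` record, the functions `select_examples` and `build_prompt`, the weight `alpha`, and the use of Jaccard overlap in place of learned embeddings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical stored demonstration: instruction, objects seen, and its plan."""
    instruction: str
    objects: frozenset[str]
    plan: list[str]


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(ex: Example, instruction: str, visible: set[str], alpha: float = 0.5) -> float:
    """Weighted mix of language similarity (word overlap) and environment similarity (object overlap)."""
    lang = jaccard(set(ex.instruction.lower().split()), set(instruction.lower().split()))
    env = jaccard(set(ex.objects), set(visible))
    return alpha * lang + (1 - alpha) * env


def select_examples(pool, instruction, visible, k=2, alpha=0.5):
    """Return the k demonstrations that best match both the command and the current scene."""
    return sorted(pool, key=lambda ex: score(ex, instruction, visible, alpha), reverse=True)[:k]


def build_prompt(examples, instruction, visible) -> str:
    """Assemble a few-shot prompt whose final 'Plan:' line is left for the LLM to complete."""
    lines = []
    for ex in examples:
        lines += [f"Task: {ex.instruction}",
                  f"Visible objects: {', '.join(sorted(ex.objects))}",
                  f"Plan: {'; '.join(ex.plan)}", ""]
    lines += [f"Task: {instruction}", f"Visible objects: {', '.join(sorted(visible))}", "Plan:"]
    return "\n".join(lines)


pool = [
    Example("rinse a mug in the sink", frozenset({"Mug", "Sink", "Faucet"}),
            ["PickupObject Mug", "PutObject Sink", "ToggleObjectOn Faucet"]),
    Example("put a mug on the table", frozenset({"Mug", "DiningTable"}),
            ["PickupObject Mug", "PutObject DiningTable"]),
    Example("turn on the lamp", frozenset({"FloorLamp", "Sofa"}),
            ["ToggleObjectOn FloorLamp"]),
]
query, visible = "put a mug in the sink", {"Mug", "Sink", "Faucet"}
chosen = select_examples(pool, query, visible)
print([ex.instruction for ex in chosen])  # ['rinse a mug in the sink', 'put a mug on the table']
print(build_prompt(chosen, query, visible))
```

Weighting the environment term means a demonstration recorded in a similar scene can outrank one whose wording matches more closely. Here, "rinse a mug in the sink" ranks first because its objects match the current scene exactly.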
Taxonomy
Topics: Natural Language Processing Techniques
