Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples

Taewoong Kim; Byeonghwi Kim; Jonghyun Choi

arXiv:2412.17288 · cs.RO · December 24, 2024

TL;DR

FLARE enhances robotic task planning by integrating language instructions with environmental perception and visual cues, enabling efficient few-shot learning and correction of ambiguous commands for more accurate execution.

Contribution

The paper introduces FLARE, a novel framework that combines language, environment perception, and visual cues to improve planning and correction in embodied agents with minimal data.

Findings

1. Outperforms state-of-the-art planning methods with few language examples.
2. Effectively corrects ambiguous or incorrect instructions using visual cues.
3. Enables more accurate and environment-aware task execution in robotic agents.
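
To make the second finding concrete, here is a minimal sketch of correcting a plan against what the agent actually sees. It assumes, hypothetically, that an LLM planner emits a list of (action, object) steps and that the agent has a set of object names detected in its current view; any step naming an object that is not visible is remapped to the most similar visible object. This is an illustration, not FLARE's published algorithm: the plan format, the `ground_plan` helper, and the use of `difflib` string similarity (standing in for a semantic similarity measure) are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical plan format: a list of (action, object) steps from an LLM planner.
Plan = list[tuple[str, str]]


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a crude stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ground_plan(plan: Plan, detected_objects: set[str]) -> Plan:
    """Replace objects the agent has not observed with the closest observed one."""
    grounded: Plan = []
    for action, obj in plan:
        # Keep the step unchanged if the object is visible or nothing was detected.
        if obj in detected_objects or not detected_objects:
            grounded.append((action, obj))
            continue
        # Otherwise pick the visible object whose name best matches the planned one.
        best = max(detected_objects, key=lambda cand: similarity(obj, cand))
        grounded.append((action, best))
    return grounded
```

For example, a step `("PutObject", "Fridge")` in a scene where `Apple`, `Refrigerator`, and `CounterTop` are visible is rewritten to target `Refrigerator`. String similarity is only a rough proxy here; an embedding-based or learned similarity would be the more natural choice in practice.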

Abstract

Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires large amounts of free-form language annotation, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as planners with little data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the status of the environment at the time the command is received, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both the language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally…
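
The idea of planning from both the language command and environmental perception can be illustrated with a hypothetical few-shot example selector. The sketch below scores each stored demonstration by a weighted mix of instruction similarity and overlap between the objects observed in its scene and those in the current scene, then builds an LLM prompt from the top matches. Everything here is an assumption for illustration rather than FLARE's actual method: the `Example` record, the `alpha` weighting, word-level Jaccard similarity (standing in for sentence embeddings), and the prompt layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One stored demonstration (hypothetical format)."""
    instruction: str
    observed_objects: frozenset[str]
    plan: str


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(text: str) -> frozenset[str]:
    """Lower-cased word set; a crude stand-in for a sentence embedding."""
    return frozenset(text.lower().split())


def score(instruction: str, objects: frozenset[str], ex: Example, alpha: float) -> float:
    """Weighted mix of language similarity and perceived-object similarity."""
    lang = jaccard(tokens(instruction), tokens(ex.instruction))
    env = jaccard(objects, ex.observed_objects)
    return alpha * lang + (1.0 - alpha) * env


def select_examples(instruction: str, objects: frozenset[str], pool: list[Example],
                    k: int = 2, alpha: float = 0.5) -> list[Example]:
    """Return the k examples that best match both the command and the scene."""
    ranked = sorted(pool, key=lambda ex: score(instruction, objects, ex, alpha), reverse=True)
    return ranked[:k]


def build_prompt(instruction: str, objects: frozenset[str], examples: list[Example]) -> str:
    """Few-shot prompt: each example lists its task, visible objects, and plan."""
    lines: list[str] = []
    for ex in examples:
        lines.append(f"Task: {ex.instruction}")
        lines.append(f"Visible objects: {', '.join(sorted(ex.observed_objects))}")
        lines.append(f"Plan: {ex.plan}")
        lines.append("")
    lines.append(f"Task: {instruction}")
    lines.append(f"Visible objects: {', '.join(sorted(objects))}")
    lines.append("Plan:")
    return "\n".join(lines)
```

If two stored examples share the same instruction ("heat a cup of water") but one was recorded next to a `Microwave` and the other next to a `StoveBurner`, only the perception term separates them, so a scene containing a `StoveBurner` selects the second. With `alpha=1.0` (language only) the scene is ignored and the first example wins regardless, which mirrors the environment-blind behavior the abstract criticizes.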

Code & Models

Repositories

snumprlab/flare
pytorch · Official

Videos

Multi-Modal Grounded Planning and Efficient Replanning for Learning Embodied Agents with a Few Examples · Underline

Taxonomy

Topics: Natural Language Processing Techniques