EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution

Francesco Argenziano; Michele Brienza; Vincenzo Suriani; Daniele Nardi; Domenico D. Bloisi

arXiv:2408.17379 · cs.RO · October 23, 2024

TL;DR

EMPOWER is a framework for robot task planning that combines open-vocabulary online grounding with a multi-role mechanism and is designed to run within the limited computational resources of robotic hardware. It achieves an average success rate of 0.73 across six real-life scenarios on a TIAGo robot.

Contribution

The paper introduces EMPOWER, an approach that combines efficient pre-trained foundation models with a multi-role mechanism so that embodied agents can plan and execute tasks grounded in their environment.

Findings

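01. Achieves an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.

02. Demonstrates notable improvements in grounded planning and execution by combining foundation models with a multi-role mechanism.

03. Uses efficient pre-trained foundation models for open-vocabulary online grounding, with the aim of keeping computational overhead low on resource-limited robotic hardware.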

Abstract

Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications