EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution
Francesco Argenziano, Michele Brienza, Vincenzo Suriani, Daniele Nardi, Domenico D. Bloisi

TL;DR
EMPOWER is a framework that enhances robot task planning by integrating open-vocabulary grounding and multi-role mechanisms, leading to improved success rates in real-world scenarios with limited computational resources.
Contribution
The paper introduces EMPOWER, a novel approach that combines foundation models with a multi-role mechanism to improve grounded planning and execution for embodied agents.
Findings
Achieves an average success rate of 0.73 across six real-life scenarios.
Demonstrates notable improvements in grounded planning and execution.
Utilizes efficient pre-trained foundation models for online grounding.
Abstract
Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications
