MaP-AVR: A Meta-Action Planner for Agents Leveraging Vision Language Models and Retrieval-Augmented Generation
Zhenglong Guo, Yiming Zhao, Feng Jiang, Heng Jin, Zongbao Feng, Jianbin Zhou, Siyuan Xu

TL;DR
MaP-AVR introduces a novel meta-action abstraction and leverages retrieval-augmented generation to improve robotic task planning with vision-language models, enhancing generalization and adaptability in complex environments.
Contribution
The paper proposes a new meta-action abstraction for robotic planning and integrates RAG for improved task execution, which are novel contributions in the field.
Findings
Demonstrates effective task completion with GPT-4o and OmniGibson.
Shows improved generalization over existing methods.
Validates the approach's promising performance.
Abstract
Embodied robotic AI systems designed to manage complex daily tasks rely on a task planner to understand and decompose high-level tasks. While most research focuses on enhancing the task-understanding abilities of LLMs/VLMs through fine-tuning or chain-of-thought prompting, this paper argues that defining the planned skill set is equally crucial. To handle the complexity of daily environments, the skill set should possess a high degree of generalization ability. Empirically, more abstract expressions tend to be more generalizable. Therefore, we propose to abstract the planned result as a set of meta-actions. Each meta-action comprises three components: {move/rotate, end-effector status change, relationship with the environment}. This abstraction replaces human-centric concepts, such as grasping or pushing, with the robot's intrinsic functionalities. As a result, the planned outcomes…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
