RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You

TL;DR
RAP is a versatile framework that enhances LLM agents' planning by dynamically retrieving and utilizing past experiences, significantly improving performance in both text-only and multimodal tasks.
Contribution
The paper introduces RAP, a novel retrieval-augmented planning framework that effectively incorporates past experiences into decision-making for multimodal LLM agents.
Findings
Achieves state-of-the-art performance in textual tasks.
Significantly improves multimodal LLM agents' performance in embodied tasks.
Demonstrates versatility across different environments.
Abstract
Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied…
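The abstract describes RAP only at a high level: stored past experiences that match the current situation and context are retrieved and used to guide the agent's next plan. The sketch below illustrates that general retrieve-then-prompt idea under stated assumptions; it is not the authors' implementation. The names `Experience`, `ExperienceMemory`, `retrieve`, and `build_prompt`, the equal weighting of task and context similarity, and the prompt layout are all hypothetical, and bag-of-words cosine similarity stands in for whatever text or image embeddings a real system would use.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    """One stored episode (hypothetical schema): the task, the situation
    the agent was in, and the plan it followed there."""
    task: str
    context: str
    plan: str


def bow(text: str) -> Counter:
    # Bag-of-words term counts: a stand-in for a learned text/image embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExperienceMemory:
    def __init__(self) -> None:
        self.items: list[Experience] = []

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def retrieve(self, task: str, context: str, k: int = 2, w_task: float = 0.5) -> list[Experience]:
        # Score each stored experience by a weighted mix of task similarity and
        # current-context similarity (weights are an assumption), keep the top k.
        q_task, q_ctx = bow(task), bow(context)
        scored = [
            (w_task * cosine(q_task, bow(e.task)) + (1 - w_task) * cosine(q_ctx, bow(e.context)), i)
            for i, e in enumerate(self.items)
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by insertion order
        return [self.items[i] for _, i in scored[:k]]


def build_prompt(task: str, context: str, examples: list[Experience]) -> str:
    # Retrieved experiences become in-context examples placed before the
    # current situation, so the LLM can plan by analogy to them.
    shots = "\n\n".join(
        f"Task: {e.task}\nSituation: {e.context}\nPlan: {e.plan}" for e in examples
    )
    return f"{shots}\n\nTask: {task}\nSituation: {context}\nPlan:"


# Usage with toy, made-up experiences.
memory = ExperienceMemory()
memory.add(Experience("put a clean mug on the shelf", "you are in the kitchen near the sink",
                      "go to sink; clean mug; go to shelf; put mug"))
memory.add(Experience("buy a red t-shirt under 20 dollars", "search results page",
                      "click cheapest red t-shirt; check price; buy"))
memory.add(Experience("put a hot apple in the fridge", "you are in the kitchen near the microwave",
                      "take apple; heat with microwave; go to fridge; put apple"))

query_task = "put a clean plate on the shelf"
query_ctx = "you are in the kitchen near the sink"
top = memory.retrieve(query_task, query_ctx, k=1)
prompt = build_prompt(query_task, query_ctx, top)
print(top[0].task)  # put a clean mug on the shelf
```

Scoring task and context separately mirrors the abstract's emphasis on matching both "the current situation and context"; in a multimodal setting the context term would presumably compare image observations rather than text, which this sketch does not attempt.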
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies · AI-based Problem Solving and Planning
