Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang

TL;DR
This paper introduces E-Agent, a novel framework for multimodal retrieval-augmented generation (mRAG) that improves planning efficiency and accuracy through dynamic tool orchestration. It also presents a new benchmark for evaluating mRAG planning.
Contribution
The paper presents E-Agent, a new agent framework with dynamic planning and task execution for mRAG, and introduces the RemPlan benchmark for assessing mRAG planning capabilities.
Findings
E-Agent achieves 13% higher accuracy than existing methods.
Reduces redundant searches by 37%.
Demonstrates superior performance on multiple benchmarks.
Abstract
Multimodal Retrieval-Augmented Generation (mRAG) has emerged as a promising solution to address the temporal limitations of Multimodal Large Language Models (MLLMs) in real-world scenarios like news analysis and trending topics. However, existing approaches often suffer from rigid retrieval strategies and under-utilization of visual information. To bridge this gap, we propose E-Agent, an agent framework featuring two key innovations: an mRAG planner trained to dynamically orchestrate multimodal tools based on contextual reasoning, and a task executor employing tool-aware execution sequencing to implement optimized mRAG workflows. E-Agent adopts a one-time mRAG planning strategy that enables efficient information retrieval while minimizing redundant tool invocations. To rigorously assess the planning capabilities of mRAG systems, we introduce the Real-World mRAG Planning (RemPlan)…
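The abstract describes the planner and executor only at a high level. As a rough illustration of what "one-time planning" with redundancy-aware execution could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration, not E-Agent's actual implementation: the hypothetical `ToolCall` plan format, the placeholder tool names `image_search` and `web_search`, the use of dependency (topological) ordering as a stand-in for "tool-aware execution sequencing", and deduplication of identical `(tool, query)` pairs as one simple way to avoid redundant tool invocations.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolCall:
    """One step of a hypothetical plan produced once, up front, by a planner."""
    step_id: str
    tool: str                         # hypothetical tool name, e.g. "web_search"
    query: str                        # fixed at planning time (one-time planning)
    depends_on: Tuple[str, ...] = ()  # step_ids that must run before this one


def execute_plan(
    plan: List[ToolCall], tools: Dict[str, Callable[[str], str]]
) -> Tuple[Dict[str, str], int]:
    """Run each step in dependency order, reusing results for repeated calls.

    Assumes every id in depends_on refers to a step in the plan.
    Returns the per-step results and the number of real tool invocations.
    """
    steps = {call.step_id: call for call in plan}
    # Map each step to its predecessors; static_order yields a valid run order.
    graph = {sid: set(call.depends_on) for sid, call in steps.items()}
    order = TopologicalSorter(graph).static_order()

    cache: Dict[Tuple[str, str], str] = {}
    results: Dict[str, str] = {}
    invocations = 0
    for sid in order:
        call = steps[sid]
        key = (call.tool, call.query)
        if key not in cache:  # only invoke a tool for a (tool, query) pair not seen yet
            cache[key] = tools[call.tool](call.query)
            invocations += 1
        results[sid] = cache[key]
    return results, invocations
```

For example, a plan with an image search followed by two identical web searches triggers only two tool invocations; the second web-search step reuses the cached result. Because the whole plan is fixed before execution, the executor can spot such duplicates without calling back into the planner, which is the kind of saving the abstract attributes to one-time planning.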
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
