Self-adaptive Multimodal Retrieval-Augmented Generation
Wenjia Zhai

TL;DR
This paper introduces SAM-RAG, a self-adaptive multimodal retrieval-augmented generation method that dynamically filters and verifies relevant documents, significantly improving retrieval accuracy and response quality in complex multimodal tasks.
Contribution
The paper presents SAM-RAG, a novel adaptive framework for multimodal RAG that enhances document relevance filtering and verification, outperforming existing methods in accuracy and task performance.
Findings
SAM-RAG outperforms state-of-the-art methods in retrieval accuracy.
SAM-RAG improves response quality in multimodal tasks.
Ablation studies confirm the effectiveness of dynamic filtering and verification.
Abstract
Traditional Retrieval-Augmented Generation (RAG) methods are limited by their reliance on a fixed number of retrieved documents, often resulting in incomplete or noisy information that undermines task performance. Although recent adaptive approaches have alleviated these problems, their application to intricate, real-world multimodal tasks remains limited. To address these limitations, we propose a new approach called Self-adaptive Multimodal Retrieval-Augmented Generation (SAM-RAG), tailored specifically for multimodal contexts. SAM-RAG not only dynamically filters relevant documents based on the input query, including image captions when needed, but also verifies the quality of both the retrieved documents and the output. Extensive experimental results show that SAM-RAG surpasses existing state-of-the-art methods in both retrieval accuracy and response generation. By further ablation experiments…
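The contrast the abstract draws between a fixed number of retrieved documents and query-dependent filtering can be illustrated with a minimal sketch. This is not the authors' implementation: the `Document` class, the `fixed_top_k` and `adaptive_filter` functions, and the `keyword_judge` relevance check are all hypothetical names introduced here, and the keyword-overlap judge is only a toy stand-in for whatever model-based relevance check SAM-RAG actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str
    score: float  # retriever similarity score (assumed: higher is more similar)


def fixed_top_k(docs: List[Document], k: int) -> List[Document]:
    """Baseline: always return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]


def adaptive_filter(
    query: str,
    docs: List[Document],
    is_relevant: Callable[[str, Document], bool],
    max_docs: int = 10,
) -> List[Document]:
    """Check candidates in descending score order and keep only those the
    judge accepts, so the number of returned documents varies per query."""
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)[:max_docs]
    return [doc for doc in ranked if is_relevant(query, doc)]


def keyword_judge(query: str, doc: Document) -> bool:
    """Toy relevance check (an assumption for illustration only): accept a
    document that shares at least two words with the query."""
    query_words = set(query.lower().split())
    doc_words = set(doc.text.lower().split())
    return len(query_words & doc_words) >= 2


docs = [
    Document("the red bicycle costs 200 dollars", 0.9),
    Document("bicycle repair tips", 0.8),
    Document("weather today is sunny", 0.7),
    Document("price of the red bicycle in the shop", 0.6),
]
query = "red bicycle price"

baseline = fixed_top_k(docs, k=2)
filtered = adaptive_filter(query, docs, keyword_judge)
```

In this toy data, the fixed top-2 baseline keeps a loosely related document because of its high score and misses a relevant one ranked fourth, while the filtered result keeps only the two documents the judge accepts.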
Taxonomy
Topics: Speech and dialogue systems
Methods: Linear Layer · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Adam
