Re-ranking the Context for Multimodal Retrieval Augmented Generation
Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus

TL;DR
This paper improves multi-modal retrieval-augmented generation by using an advanced relevancy measure to select more relevant context entries, thereby enhancing response accuracy and reducing hallucinations.
Contribution
It introduces a relevancy score-based retrieval method that adaptively selects context entries, improving multi-modal RAG performance over traditional embedding-based methods.
Findings
Enhanced relevance in context selection using the proposed measure.
Significant improvement in response accuracy on the COCO dataset.
Reduction in irrelevant context entries leading to better generation quality.
Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, so that responses are generated within a context, with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face unique challenges: (i) the retrieval process may select entries (e.g., images, documents) that are irrelevant to the user query, and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate the RAG output. In this paper, we aim to address the first challenge, i.e., improving the selection of relevant context from the knowledge base in the retrieval phase of multi-modal RAG. Specifically, we leverage the relevancy score (RS) measure designed in our previous work for evaluating RAG performance to select more relevant entries in the retrieval process. The retrieval based on embeddings, say CLIP-based…
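The retrieval idea described in the abstract can be sketched roughly as a two-stage process: shortlist candidates by embedding similarity, then re-rank them with a relevancy score and keep only the entries that clear a threshold, so the number of context entries adapts to the query. This is a minimal illustration and not the authors' implementation. The function names, the threshold value, the toy vectors, and the lookup table standing in for a learned RS model are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Entry = Tuple[str, Sequence[float]]  # (entry id, embedding vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank_and_select(
    query_vec: Sequence[float],
    entries: List[Entry],
    relevancy_fn: Callable[[str], float],  # hypothetical stand-in for an RS model
    n_candidates: int = 3,
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> List[Tuple[str, float]]:
    # Stage 1: shortlist by embedding similarity (stand-in for CLIP-style retrieval).
    shortlist = sorted(
        entries, key=lambda e: cosine(query_vec, e[1]), reverse=True
    )[:n_candidates]
    # Stage 2: score each shortlisted entry and sort by that score, highest first.
    scored = [(entry_id, relevancy_fn(entry_id)) for entry_id, _ in shortlist]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Adaptive selection: keep a variable number of entries above the threshold.
    return [(entry_id, s) for entry_id, s in scored if s >= threshold]


# Toy knowledge base and toy relevancy scores (fabricated for the example only).
entries = [
    ("img_dog", [0.9, 0.1]),
    ("img_cat", [0.8, 0.2]),
    ("img_car", [0.1, 0.9]),
    ("doc_dog", [0.7, 0.3]),
]
toy_scores = {"img_dog": 0.92, "img_cat": 0.15, "img_car": 0.05, "doc_dog": 0.81}

selected = rerank_and_select([1.0, 0.0], entries, toy_scores.__getitem__)
print(selected)  # [('img_dog', 0.92), ('doc_dog', 0.81)]
```

In this toy run, `img_cat` ranks second by embedding similarity but is dropped after re-ranking because its relevancy score falls below the threshold, which is the kind of irrelevant entry the abstract says embedding-only retrieval may select.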
Taxonomy
Topics: Speech and dialogue systems
Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout
