Re-ranking the Context for Multimodal Retrieval Augmented Generation

Matin Mortaheb; Mohammad A. Amir Khojastepour; Srimat T. Chakradhar; Sennur Ulukus

arXiv:2501.04695·cs.LG·January 9, 2025

TL;DR

This paper improves multi-modal retrieval-augmented generation by using an advanced relevancy measure to select more relevant context entries, thereby enhancing response accuracy and reducing hallucinations.

Contribution

It introduces a relevancy score-based retrieval method that adaptively selects context entries, improving multi-modal RAG performance over traditional embedding-based methods.
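
The selection idea described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not describe the RS model, so `relevancy_score` is a hypothetical stand-in callable, and the two-stage layout (embedding similarity for candidates, then re-ranking) along with the `n_candidates`, `threshold`, and `max_keep` parameters are assumptions chosen to show what "adaptively selects" could look like.

```python
import math
from typing import Callable, List, Sequence, Tuple

Entry = Tuple[str, Sequence[float]]  # (entry_id, embedding); layout is an assumption


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_context(
    query_vec: Sequence[float],
    entries: List[Entry],
    relevancy_score: Callable[[str], float],  # hypothetical stand-in for an RS model
    n_candidates: int = 20,   # assumed candidate pool size
    threshold: float = 0.5,   # assumed relevancy cut-off
    max_keep: int = 5,        # assumed upper bound on context size
) -> List[Tuple[str, float]]:
    """Return (entry_id, score) pairs, most relevant first."""
    # Stage 1: embedding-based retrieval (e.g., CLIP-style vectors) picks candidates.
    candidates = sorted(
        entries, key=lambda e: cosine(query_vec, e[1]), reverse=True
    )[:n_candidates]

    # Stage 2: score each candidate with the relevancy measure and drop
    # low-scoring ones, so the number of kept entries varies per query.
    scored = [(entry_id, relevancy_score(entry_id)) for entry_id, _ in candidates]
    kept = [(entry_id, s) for entry_id, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_keep]
```

Because entries below the threshold are discarded rather than padded back in, a query with few relevant candidates yields a shorter context than a fixed top-K retrieval would.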

Findings

01. Enhanced relevance in context selection using the proposed measure.

02. Significant improvement in response accuracy on the COCO dataset.

03. Fewer irrelevant context entries, leading to better generation quality.

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face unique challenges: (i) the retrieval process may select entries (e.g., images, documents) that are irrelevant to the user query, and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output. In this paper, we aim to address the first challenge, i.e., improving the selection of relevant context from the knowledge base in the retrieval phase of multi-modal RAG. Specifically, we leverage the relevancy score (RS) measure designed in our previous work for evaluating RAG performance to select more relevant entries in the retrieval process. The retrieval based on embeddings, say CLIP-based…

Taxonomy

Topics: Speech and dialogue systems

Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout