BRIT: Bidirectional Retrieval over Unified Image-Text Graph
Ainulla Khan, Yamada Moyuru, Srinidhi Akella

TL;DR
BRIT is a novel multi-modal retrieval framework that unifies image-text connections in a graph to improve retrieval for complex cross-modal questions, especially when fine-tuning is ineffective.
Contribution
The paper introduces a retrieval method that builds a unified multi-modal graph over a document and traverses both image-to-text and text-to-image paths in it, improving multi-hop question answering over multi-modal documents.
Findings
Outperforms existing methods on the MM-RAG test set
Effectively retrieves relevant images and texts for complex questions
Handles cross-modal multi-hop reasoning
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a promising technique to enhance the quality and relevance of responses generated by large language models. While recent advancements have mainly focused on improving RAG for text-based queries, RAG on multi-modal documents containing both texts and images has not been fully explored, especially when fine-tuning does not work. This paper proposes BRIT, a novel multi-modal RAG framework that effectively unifies various text-image connections in the document into a multi-modal graph and retrieves the texts and images as a query-specific sub-graph. By traversing both image-to-text and text-to-image paths in the graph, BRIT retrieves not only directly query-relevant images and texts but also further relevant content for answering complex cross-modal multi-hop questions. To evaluate the effectiveness of BRIT, we introduce MM-RAG test set…
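The retrieval idea in the abstract (a graph linking texts and images, from which a query-specific sub-graph is extracted by following links in both directions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy nodes, the edges, the word-overlap `score` function (a stand-in for whatever relevance model the paper uses), and the names `retrieve_subgraph`, `num_seeds`, and `max_hops` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy document: node id -> (modality, text or image caption).
NODES = {
    "t1": ("text", "The Eiffel Tower was completed in 1889."),
    "t2": ("text", "Gustave Eiffel's company designed the tower."),
    "i1": ("image", "photo of the Eiffel Tower at night"),
    "i2": ("image", "portrait of Gustave Eiffel"),
    "t3": ("text", "Unrelated paragraph about rainfall."),
}

# Assumed links between nodes; each one is traversed in both directions,
# so an image can lead to a text and a text can lead to an image.
EDGES = [("t1", "i1"), ("t2", "i2"), ("t1", "t2")]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def score(query, content):
    """Word-overlap relevance; a placeholder for a real similarity model."""
    q = set(query.lower().split())
    c = set(content.lower().replace(".", "").split())
    return len(q & c)


def retrieve_subgraph(query, nodes, edges, num_seeds=1, max_hops=2):
    """Return {node_id: hop_distance} for the query-specific sub-graph."""
    adj = build_adjacency(edges)
    # Rank nodes of both modalities by relevance (ties broken by id).
    ranked = sorted(nodes, key=lambda n: (-score(query, nodes[n][1]), n))
    seeds = [n for n in ranked[:num_seeds] if score(query, nodes[n][1]) > 0]

    # Breadth-first expansion from the seeds, up to max_hops links away.
    visited = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        cur = queue.popleft()
        if visited[cur] == max_hops:
            continue
        for nb in sorted(adj.get(cur, ())):
            if nb not in visited:
                visited[nb] = visited[cur] + 1
                queue.append(nb)
    return visited
```

Calling `retrieve_subgraph("night photo tower", NODES, EDGES)` seeds the search at the night photo and then expands along image-to-text and text-to-text links; raising `max_hops` to 3 additionally reaches the portrait image through a text-to-image link, which is the kind of cross-modal multi-hop path the abstract describes.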
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Softmax · WordPiece · Weight Decay · Dropout · Adam · Linear Layer
