MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL

Mahmoud SalahEldin Kasem; Mohamed Mahmoud; Mostafa Farouk Senussi; Mahmoud Abdalla; Abdelrahman Abdallah; Hyun-Soo Kang

arXiv:2604.07079 · cs.IR · April 9, 2026

TL;DR

MARVEL is a unified multimodal retrieval framework that significantly improves reasoning-intensive retrieval performance by combining query expansion, a reasoning-enhanced retriever, and step-by-step reranking.

Contribution

It introduces a novel integrated pipeline that combines LLM-driven query expansion, a reasoning-enhanced retriever, and chain-of-thought reranking, surpassing existing multimodal retrieval methods.
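The three stages named above (expand, retrieve, rerank) can be sketched as a small text-only toy. This is an illustrative sketch under stated assumptions, not the authors' implementation: `expand_query` and `rerank_score` are hypothetical stand-ins for the LLM-driven expansion and chain-of-thought reranking steps, and a bag-of-words cosine similarity stands in for the dense retriever. Image inputs are not modeled here.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (toy stand-in for a real encoder)."""
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: Dict[str, str], top_k: int) -> List[Tuple[str, float]]:
    """Stage 2 (toy): score every document against the query, keep the top_k."""
    q_vec = Counter(tokenize(query))
    scored = [(doc_id, cosine(q_vec, Counter(tokenize(text))))
              for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def run_pipeline(
    query: str,
    corpus: Dict[str, str],
    expand: Callable[[str], str],
    rerank: Callable[[str, str], float],
    top_k: int = 2,
) -> List[str]:
    """Expand the query, retrieve candidates, then reorder them with the reranker."""
    expanded = expand(query)                         # stage 1: query expansion
    candidates = retrieve(expanded, corpus, top_k)   # stage 2: first-pass retrieval
    reranked = sorted(                               # stage 3: rerank the candidates
        candidates,
        key=lambda pair: rerank(query, corpus[pair[0]]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in reranked]


# Hypothetical stand-ins: a real system would call an LLM in both functions.
def expand_query(query: str) -> str:
    return query + " steel fatigue load"


def rerank_score(query: str, document: str) -> float:
    return 1.0 if "fatigue" in document else 0.0


corpus = {
    "d1": "bridge load limits depend on steel fatigue",
    "d2": "photo of a rusty bridge at sunset",
    "d3": "recipe for sourdough bread",
}
ranking = run_pipeline("why is this bridge unsafe", corpus, expand_query, rerank_score)
print(ranking)
```

Passing the expansion and reranking steps in as callables keeps the retrieval stage independent of whichever model fills those roles, which is one plausible reading of "unified pipeline"; the paper's actual interfaces may differ.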

Findings

01

Achieves 37.9 nDCG@10 on MM-BRIGHT, outperforming the previous best by 10.3 points.

02

Outperforms all baselines in 27 of 29 domains, matching the best in two.

03

Demonstrates the effectiveness of a unified expand-retrieve-rerank framework for multimodal retrieval.
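For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of nDCG@10 for a single query. It assumes the common linear-gain form (gain equal to the relevance grade, discount of log2(rank + 1)); the exact evaluation tooling used for MM-BRIGHT is not stated here, and reported scores such as 37.9 are presumably this value averaged over queries and multiplied by 100.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k relevance grades."""
    # enumerate starts at 0, so the rank-1 item is discounted by log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: Sequence[float], all_relevant: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked       -- relevance grades of the returned documents, in ranked order
    all_relevant -- relevance grades of every relevant document for the query
    """
    ideal = dcg_at_k(sorted(all_relevant, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Toy query with two relevant documents, returned at ranks 2 and 4.
score = ndcg_at_k([0, 1, 0, 1], [1, 1])
print(round(score, 4))
```

Because the discount grows with rank, moving a relevant document from rank 4 to rank 1 raises the score more than moving it from rank 10 to rank 7, which is why reranking the top of the list can shift nDCG@10 substantially.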

Abstract

Multimodal retrieval over text corpora remains a fundamental challenge: the best vision-language encoder achieves only 27.6 nDCG@10 on MM-BRIGHT, a reasoning-intensive multimodal retrieval benchmark, underperforming strong text-only systems. We argue that effective multimodal retrieval requires three tightly integrated capabilities that existing approaches address only in isolation: expanding the query's latent intent, retrieving with a model trained for complex reasoning, and reranking via explicit step-by-step reasoning over candidates. We introduce MARVEL (Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL), a unified pipeline that combines LLM-driven query expansion, MARVEL-Retriever (a reasoning-enhanced dense retriever fine-tuned for complex multimodal queries), and GPT-4o-based…

Code & Models

Repositories

mm-bright/multimodal-reasoning-retrieval (GitHub)
