Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
Alexander Martin, William Walden, Reno Kriz, Dengjia Zhang, Kate Sanders, Eugene Yang, Chihsheng Jin, Benjamin Van Durme

TL;DR
MiRAGE is a new evaluation framework for multimodal retrieval-augmented generation that assesses factual accuracy and source support, addressing limitations of text-centric metrics in audiovisual media contexts.
Contribution
The paper introduces MiRAGE, a claim-centric evaluation framework for multimodal RAG, including new metrics and automatic variants, to improve assessment of factuality and source coverage.
Findings
When applied by humans, MiRAGE aligns strongly with extrinsic judgments of quality.
Automatic variants of MiRAGE correlate with manual assessments.
Existing text-centric metrics have limitations in multimodal settings.
Abstract
We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. As audiovisual media becomes a prevalent source of information online, it is essential for RAG systems to integrate information from these sources into generation. However, existing evaluations for RAG are text-centric, limiting their applicability to multimodal, reasoning-intensive settings because they don't verify information against sources. MiRAGE is a claim-centric approach to multimodal RAG evaluation, consisting of InfoF1, evaluating factuality and information coverage, and CiteF1, measuring citation support and completeness. We show that MiRAGE, when applied by humans, strongly aligns with extrinsic quality judgments. We additionally introduce automatic variants of MiRAGE and three prominent TextRAG metrics -- ACLE, ARGUE, and RAGAS -- demonstrating the limitations of…
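Illustrative sketch
The abstract describes InfoF1 as combining factuality with information coverage. One plausible reading, sketched below, is a claim-level F1: precision is the share of generated claims judged supported, and recall is the share of reference claims the response covers. This is an assumption for illustration only, not the paper's definition. The function names `harmonic_f1` and `claim_f1_sketch` and the boolean-judgment inputs are hypothetical, and how the judgments are obtained (by humans or by a model checking claims against video sources) is left out.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def claim_f1_sketch(generated_supported: list[bool],
                    reference_covered: list[bool]) -> float:
    """Hypothetical claim-level F1 (an assumption, not the paper's formula).

    generated_supported[i]: was the i-th generated claim judged supported?
    reference_covered[j]:   was the j-th reference claim covered by the response?
    """
    # Precision: fraction of generated claims that are supported (factuality).
    precision = (sum(generated_supported) / len(generated_supported)
                 if generated_supported else 0.0)
    # Recall: fraction of reference claims the response covers (coverage).
    recall = (sum(reference_covered) / len(reference_covered)
              if reference_covered else 0.0)
    return harmonic_f1(precision, recall)


# Example: 3 of 4 generated claims supported, 1 of 2 reference claims covered.
score = claim_f1_sketch([True, True, False, True], [True, False])
print(round(score, 3))
```

Under the same assumption, a citation-level score could reuse `harmonic_f1` with judgments about whether each cited source supports its claim, but the abstract does not give enough detail to say how CiteF1 is actually computed.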
