MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Xihang Wang; Zihan Wang; Chengkai Huang; Quan Z. Sheng; Lina Yao

arXiv:2604.24564 · cs.CL · May 1, 2026

TL;DR

This paper introduces MEG-RAG, a semantic-aware metric and framework for improving evidence selection in multimodal retrieval-augmented generation, reducing hallucinations and enhancing factual accuracy.

Contribution

It proposes MEG, a novel semantic grounding metric, and MEG-RAG, a training framework that aligns retrieved evidence with semantic anchors for better multimodal answer quality.

Findings

01. MEG-RAG outperforms strong baselines on the M$^2$RAG benchmark.

02. It demonstrates improved accuracy and multimodal consistency in generated outputs.

03. The approach generalizes well across different teacher models.

Abstract

Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to…
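To make the idea of anchoring on high-IDF tokens concrete, here is a minimal sketch. It is not the paper's formulation: the abstract does not give the MEG formula, so the anchor rule (top-k tokens by IDF), the score (mean log-probability gain on the anchor tokens when the evidence is supplied), and every function name and number below are illustrative assumptions.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def anchor_positions(answer_tokens, idf, top_k=2):
    """Positions of the top_k highest-IDF answer tokens (hypothetical anchor rule)."""
    # Tokens missing from the corpus are treated as maximally rare (an assumption).
    default = max(idf.values(), default=1.0)
    ranked = sorted(range(len(answer_tokens)),
                    key=lambda i: idf.get(answer_tokens[i], default),
                    reverse=True)
    return sorted(ranked[:top_k])


def grounding_score(logp_with, logp_without, anchors):
    """Mean log-prob gain on anchor tokens when the evidence is in the context."""
    if not anchors:
        return 0.0
    return sum(logp_with[i] - logp_without[i] for i in anchors) / len(anchors)


# Toy data: a tiny tokenized corpus and one answer (all values are made up).
corpus = [
    ["the", "tower", "is", "in", "paris"],
    ["the", "bridge", "is", "in", "london"],
    ["the", "eiffel", "tower", "is", "tall"],
]
answer = ["the", "eiffel", "tower", "is", "in", "paris"]

# Per-token answer log-probabilities from a hypothetical teacher model,
# scored once with the candidate evidence and once without it.
logp_with = [-0.1, -0.5, -0.2, -0.1, -0.1, -0.5]
logp_without = [-0.1, -2.5, -0.3, -0.1, -0.1, -1.5]

anchors = anchor_positions(answer, idf_table(corpus), top_k=2)
score = grounding_score(logp_with, logp_without, anchors)
print(anchors, score)
```

In this toy setup the function words ("the", "is") receive the lowest IDF and are excluded from the anchors, so the score reflects only how much the evidence raises the model's certainty on the rare, content-bearing tokens. A score of this kind could in principle serve as a ranking signal for candidate evidence, though how MEG-RAG actually builds its reranker supervision is not specified in the abstract.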
