M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems

Mengzhou Sun; Sendong Zhao; Jianyu Chen; Haochun Wang; Bin Qin

arXiv:2510.23995 · cs.CL · October 29, 2025

TL;DR

M-Eval introduces a heterogeneity-based framework that improves validation of medical RAG responses by detecting factual errors and assessing evidence reliability, enhancing system accuracy and trustworthiness.

Contribution

This paper presents M-Eval, a novel heterogeneity analysis approach for verifying factual correctness and evidence reliability in medical RAG systems, addressing hallucinations and misinformation.

Findings

01. Up to 23.31% accuracy improvement across LLMs.

02. Effective detection of factual errors and evidence inconsistencies.

03. Enhanced reliability of medical RAG responses.

Abstract

Retrieval-augmented Generation (RAG) has demonstrated potential in enhancing medical question-answering systems through the integration of large language models (LLMs) with external medical literature. LLMs can retrieve relevant medical articles to generate more professional responses efficiently. However, current RAG applications still face problems. They generate incorrect information, such as hallucinations, and they fail to use external knowledge correctly. To solve these issues, we propose a new method named M-Eval. This method is inspired by the heterogeneity analysis approach used in Evidence-Based Medicine (EBM). Our approach can check for factual errors in RAG responses using evidence from multiple sources. First, we extract additional medical literature from external knowledge bases. Then, we retrieve the evidence documents generated by the RAG system. We use heterogeneity…
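The abstract is truncated before it describes how heterogeneity is measured, so the exact M-Eval procedure is not shown here. For orientation only, the sketch below computes the two statistics most commonly used for heterogeneity analysis in Evidence-Based Medicine meta-analyses: Cochran's Q and the I² index. The function name `heterogeneity` and the idea of feeding it one effect estimate and one variance per evidence source are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence, Tuple


def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Cochran's Q, I^2 in percent) for a set of evidence sources.

    Hypothetical helper for illustration: `effects` holds one effect
    estimate per source and `variances` holds the matching sampling
    variances. This is the textbook meta-analysis formulation, not
    necessarily what M-Eval computes.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two sources with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance weights: more precise sources count for more.
    weights = [1.0 / v for v in variances]

    # Fixed-effect pooled estimate.
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

Under this formulation, a high I² across the retrieved documents would indicate that the sources disagree with one another more than chance alone would explain, which is the kind of signal a multi-evidence validator could use to flag unreliable evidence.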
