EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models

J. Ben Tamo; Yuxing Lu; Benoit L. Marteau; Micky C. Nnamdi; May D. Wang

arXiv:2603.19532 · cs.CL · March 23, 2026

TL;DR

EvidenceRL is a reinforcement learning framework that enhances language models' adherence to evidence, reducing hallucinations and improving trustworthiness in high-stakes domains like medicine and law.

Contribution

The paper introduces EvidenceRL, a reinforcement learning method that enforces evidence consistency during training, improving faithfulness without sacrificing task accuracy.

Findings

01 Increases F1@3 from 37.0 to 54.5 in cardiac diagnosis on Llama-3.2-3B

02 Raises legal reasoning faithfulness from 32.8% to 67.6%

03 Reduces hallucinations nearly fivefold in high-stakes tasks

Abstract

Large Language Models (LLMs) are fluent but prone to hallucinations, producing answers that appear plausible yet are unsupported by available evidence. This failure is especially problematic in high-stakes domains where decisions must be justified by verifiable information. We introduce EvidenceRL, a reinforcement learning framework that enforces evidence adherence during training. EvidenceRL scores candidate responses for grounding (entailment with retrieved evidence and context) and correctness (agreement with reference answers) and optimizes the generator using Group Relative Policy Optimization (GRPO). We evaluate across two high-stakes domains, cardiac diagnosis and legal reasoning, where EvidenceRL consistently improves evidence grounding and faithfulness without sacrificing task accuracy. On cardiac diagnosis, F1@3 increases from 37.0 to 54.5 on Llama-3.2-3B while…
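
To make the scoring-then-GRPO step in the abstract concrete, here is a minimal sketch of how per-candidate grounding and correctness scores might be turned into group-relative advantages. It is an illustration under stated assumptions, not the authors' implementation: the weighted-sum reward, the weights `w_ground` and `w_correct`, the function names, and the example scores are all hypothetical, and the abstract does not say how the two scores are combined. The only part taken from GRPO itself is the idea of normalizing each reward against the mean and standard deviation of its own sampling group; the use of the population standard deviation here is also an assumption.

```python
from statistics import mean, pstdev


def combined_reward(grounding, correctness, w_ground=0.5, w_correct=0.5):
    """Hypothetical reward: a weighted sum of the two scores.

    The abstract names the two signals but not how they are combined,
    so this weighting is an assumption made for illustration only.
    """
    return w_ground * grounding + w_correct * correctness


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampling group.

    Each candidate is compared with the other candidates sampled for the
    same prompt, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up (grounding, correctness) scores for four candidates sampled
# for a single prompt. These numbers are not from the paper.
candidates = [(0.9, 1.0), (0.2, 1.0), (0.8, 0.0), (0.1, 0.0)]

rewards = [combined_reward(g, c) for g, c in candidates]
advantages = group_relative_advantages(rewards)

for (g, c), r, a in zip(candidates, rewards, advantages):
    print(f"grounding={g:.1f} correctness={c:.1f} reward={r:.2f} advantage={a:+.2f}")
```

In this toy group, the candidate that is both grounded and correct receives the largest positive advantage, while a correct but poorly grounded answer is pushed toward the group mean. That is the kind of pressure toward evidence adherence the abstract describes, though the actual reward design may differ.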

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare