EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models
J. Ben Tamo, Yuxing Lu, Benoit L. Marteau, Micky C. Nnamdi, May D. Wang

TL;DR
EvidenceRL is a reinforcement learning framework that enhances language models' adherence to evidence, reducing hallucinations and improving trustworthiness in high-stakes domains like medicine and law.
Contribution
It introduces EvidenceRL, a novel reinforcement learning method that enforces evidence consistency during training, significantly improving faithfulness without losing accuracy.
Findings
Increases F1@3 from 37.0 to 54.5 in cardiac diagnosis (Llama-3.2-3B)
Raises legal reasoning faithfulness from 32.8% to 67.6%
Reduces hallucinations nearly fivefold in high-stakes tasks
Abstract
Large Language Models (LLMs) are fluent but prone to hallucinations, producing answers that appear plausible yet are unsupported by available evidence. This failure is especially problematic in high-stakes domains where decisions must be justified by verifiable information. We introduce EvidenceRL, a reinforcement learning framework that enforces evidence adherence during training. EvidenceRL scores candidate responses for grounding (entailment with retrieved evidence and context) and correctness (agreement with reference answers) and optimizes the generator using Group Relative Policy Optimization (GRPO). We evaluate across two high-stakes domains, cardiac diagnosis and legal reasoning, where EvidenceRL consistently improves evidence grounding and faithfulness without sacrificing task accuracy. On cardiac diagnosis, F1@3 increases from 37.0 to 54.5 on Llama-3.2-3B while…
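To make the training signal described in the abstract concrete, the following is a minimal sketch of how a grounding score and a correctness score might be combined into one reward and turned into group-relative advantages in the style of GRPO. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear weighting `w_ground`, and the example scores are all hypothetical, and the abstract does not say how the two scores are actually combined.

```python
from statistics import mean, pstdev


def combined_reward(grounding, correctness, w_ground=0.5):
    """Hypothetical reward: a weighted mix of two scores, each assumed to lie in [0, 1].

    The weighting scheme is an assumption for illustration only.
    """
    return w_ground * grounding + (1.0 - w_ground) * correctness


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of samples.

    Each response is compared against the other responses drawn for the same
    prompt, so no separate value network is needed. Population standard
    deviation is used here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up candidate responses to one prompt: (grounding, correctness).
candidates = [(0.9, 1.0), (0.2, 1.0), (0.8, 0.0), (0.1, 0.0)]
rewards = [combined_reward(g, c) for g, c in candidates]
advantages = group_relative_advantages(rewards)

for (g, c), r, a in zip(candidates, rewards, advantages):
    print(f"grounding={g:.1f} correctness={c:.1f} reward={r:.2f} advantage={a:+.2f}")
```

In this toy group, a correct but poorly grounded response receives a smaller advantage than a correct and well-grounded one, which is the kind of pressure toward evidence adherence the abstract describes. In a full GRPO update these advantages would weight the policy-gradient term for each sampled response.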
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
