Retromorphic Testing with Hierarchical Verification for Hallucination Detection in RAG

Boxi Yu; Yuzhong Zhang; Liting Lin; Lionel Briand; Emir Muñoz

arXiv:2603.27752·cs.CL·March 31, 2026

TL;DR

This paper introduces RT4CHART, a hierarchical verification framework for detecting hallucinations in retrieval-augmented generation. It provides fine-grained, evidence-grounded diagnostics and outperforms existing methods on the reported benchmarks.

Contribution

RT4CHART is a novel retromorphic testing framework that decomposes outputs into claims and verifies them hierarchically against context, improving hallucination detection accuracy.

Findings

01

RT4CHART achieves an F1 score of 0.776 on RAGTruth++, surpassing baselines by 83%.

02

On RAGTruth-Enhance, RT4CHART attains a span-level F1 score of 0.475.

03

Re-annotation shows 1.68x more hallucinations than original labels.

Abstract

Large language models (LLMs) continue to hallucinate in retrieval-augmented generation (RAG), producing claims that are unsupported by or conflict with the retrieved context. Detecting such errors remains challenging when faithfulness is evaluated solely with respect to the retrieved context. Existing approaches either provide coarse-grained, answer-level scores or focus on open-domain factuality, often lacking fine-grained, evidence-grounded diagnostics. We present RT4CHART, a retromorphic testing framework for context-faithfulness assessment. RT4CHART decomposes model outputs into independently verifiable claims and performs hierarchical, local-to-global verification against the retrieved context. Each claim is assigned one of three labels: entailed, contradicted, or baseless. Furthermore, RT4CHART maps claim-level decisions back to specific answer spans and retrieves explicit…
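The claim-level pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence-based claim splitter, the `toy_verify` token-overlap checker, the "first decisive verdict wins" rule, and every function name here are assumptions made for illustration. Only the three labels and the local-to-global order of checks come from the abstract. A real system would presumably replace `toy_verify` with an LLM or NLI model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Label(Enum):
    ENTAILED = "entailed"
    CONTRADICTED = "contradicted"
    BASELESS = "baseless"


@dataclass
class Claim:
    text: str
    span: Tuple[int, int]  # character offsets of the claim inside the answer


# Hypothetical verifier signature: (claim text, evidence text) -> Label.
Verifier = Callable[[str, str], Label]


def split_claims(answer: str) -> List[Claim]:
    """Naive stand-in for claim decomposition: one claim per sentence."""
    claims, start = [], 0
    for i, ch in enumerate(answer + "."):  # sentinel closes a trailing fragment
        if ch in ".!?":
            raw = answer[start:i + 1]
            text = raw.strip()
            if text:
                begin = start + len(raw) - len(raw.lstrip())
                claims.append(Claim(text, (begin, begin + len(text))))
            start = i + 1
    return claims


def _tokens(text: str) -> set:
    return {w.strip(".,!?").lower() for w in text.split()}


def toy_verify(claim: str, evidence: str) -> Label:
    """Toy checker (assumption): token overlap plus a crude negation test."""
    c, e = _tokens(claim), _tokens(evidence)
    if (c - {"not"}) <= e:
        negation_mismatch = ("not" in c) != ("not" in e)
        return Label.CONTRADICTED if negation_mismatch else Label.ENTAILED
    return Label.BASELESS


def verify_claim(claim: Claim, chunks: List[str], verify: Verifier) -> Label:
    """Local-to-global check: individual chunks first, then the full context."""
    for chunk in chunks:  # local pass
        label = verify(claim.text, chunk)
        if label is not Label.BASELESS:
            return label  # assumed rule: first decisive verdict wins
    return verify(claim.text, " ".join(chunks))  # global pass


def diagnose(answer: str, chunks: List[str],
             verify: Verifier = toy_verify) -> List[Tuple[Claim, Label]]:
    """Label every claim; each claim keeps its span for mapping back."""
    return [(c, verify_claim(c, chunks, verify)) for c in split_claims(answer)]


if __name__ == "__main__":
    context = ["The tower is 330 metres tall.",
               "The tower is not open on Mondays."]
    answer = ("The tower is 330 metres tall. The tower is open on Mondays. "
              "The tower was painted red.")
    for claim, label in diagnose(answer, context):
        print(claim.span, label.value, "|", claim.text)
```

Storing character offsets on each `Claim` is what makes it possible to map a claim-level verdict back to a specific answer span, which is the kind of span-level diagnostic the abstract describes.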
