Claim-Selective Certification for High-Risk Medical Retrieval-Augmented Generation
Shao Kan

TL;DR
This paper introduces claim-selective certification for high-risk medical retrieval-augmented generation: each response is decomposed into verifiable claims, the claims are scored against retrieved evidence, and the result is mapped to one of four actions (full, partial, conflict, abstain) rather than a single answer-or-abstain decision.
Contribution
It proposes a novel claim-selective certification framework that decomposes responses into verifiable claims and maps them to actions, enhancing evaluation in high-risk medical QA systems.
Findings
UCCR=0.0000 on both dev (n=314) and test (n=319), i.e. zero measured unsupported-claim risk within the certificate definition.
Action accuracy is 0.9204 on dev and 0.8997 on test, so the claim-to-action mapping is correct for roughly nine in ten rows.
Source-missing counterfactuals evaluate abstain behavior under empty evidence.
Abstract
Medical RAG systems in high-risk QA settings are often evaluated through a single answer-or-abstain decision, but mixed evidence may support one claim, require conditions for another, and contradict a third. We study claim-selective certification: each response is decomposed into verifiable claims, scored against retrieved evidence, and mapped by an intent-aware selector to {full, partial, conflict, abstain}. On the primary weak-label certificate protocol, whose real-source-only dev/test rows cover the naturally occurring non-abstain actions, the full system records UCCR=0.0000, PAU=1.0000, PAU Precision=0.9901, and action accuracy=0.9204 on dev (n=314), and UCCR=0.0000, PAU=0.9967, PAU Precision=0.9739, and action accuracy=0.8997 on test (n=319). UCCR measures unsupported-claim risk within the certificate definition, and a source-missing counterfactual slice evaluates abstain under…
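The claim-to-action mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's intent-aware selector: the per-claim verdict labels (`SUPPORTED`, `CONDITIONAL`, `CONTRADICTED`, `UNSUPPORTED`), the `ScoredClaim` type, and the fixed priority rules in `select_action` are all assumptions made for illustration. Only the four action names come from the abstract. The `action_accuracy` helper assumes action accuracy is the fraction of rows whose predicted action matches the labeled one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    # Hypothetical per-claim labels; the paper's scoring scheme may differ.
    SUPPORTED = "supported"
    CONDITIONAL = "conditional"
    CONTRADICTED = "contradicted"
    UNSUPPORTED = "unsupported"


class Action(Enum):
    # The four actions named in the abstract.
    FULL = "full"
    PARTIAL = "partial"
    CONFLICT = "conflict"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class ScoredClaim:
    text: str
    verdict: Verdict


def select_action(claims: Sequence[ScoredClaim]) -> Action:
    """Map scored claims to one action using simple, assumed priority rules."""
    verdicts = [c.verdict for c in claims]
    # No claims, or nothing backed by evidence: decline to answer.
    if not verdicts or all(v is Verdict.UNSUPPORTED for v in verdicts):
        return Action.ABSTAIN
    # Any contradicted claim is surfaced as a conflict.
    if any(v is Verdict.CONTRADICTED for v in verdicts):
        return Action.CONFLICT
    # Every claim fully supported: certify the whole response.
    if all(v is Verdict.SUPPORTED for v in verdicts):
        return Action.FULL
    # Otherwise a mix of supported, conditional, or unsupported claims.
    return Action.PARTIAL


def action_accuracy(predicted: Sequence[Action], gold: Sequence[Action]) -> float:
    """Fraction of rows whose predicted action equals the labeled action."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p is g for p, g in zip(predicted, gold)) / len(gold)
```

In this sketch a response with one supported claim and one conditional claim maps to `Action.PARTIAL`, and an empty claim list maps to `Action.ABSTAIN`, which loosely mirrors the source-missing counterfactual setting.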
