Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar, Caitlyn Heqi Yin, Ramya Korlakai Vinayak

TL;DR
This paper evaluates the robustness and usefulness of conformal factuality filtering in RAG-based LLMs, revealing limitations under distribution shifts and proposing more efficient verification methods with better utility.
Contribution
It introduces novel informativeness-aware metrics, systematically analyzes conformal factuality's limitations, and compares lightweight verifiers to LLM-based scorers for improved reliability and efficiency.
Findings
Conformal filtering has low usefulness at high factuality levels due to vacuous outputs.
Factuality guarantees are fragile under distribution shifts and distractors.
Lightweight entailment-based verifiers outperform LLM-based confidence scorers in efficiency.
Abstract
Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as potential ways to address this limitation. While RAG aims to ground responses in retrieved evidence, it provides no statistical guarantee that the final output is correct. Conformal factuality filtering offers distribution-free statistical reliability by scoring and filtering atomic claims using a threshold calibrated on held-out data; however, the informativeness of the final output is not guaranteed. We systematically analyze the reliability and usefulness of conformal factuality for RAG-based LLMs across generation, scoring, calibration, robustness, and efficiency. We propose novel informativeness-aware metrics that better reflect task utility under conformal filtering. Across three…
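The calibrate-then-filter step described in the abstract can be illustrated with a minimal sketch. It assumes the standard split-conformal recipe and is not necessarily the exact procedure used in the paper. Each calibration response is reduced to the highest score among its false claims, the threshold is a conservative quantile of those values, and a test claim is kept only if its score exceeds the threshold. The function names, the `(score, is_true)` tuple layout, and the parameter `alpha` (target error rate) are hypothetical choices made for illustration.

```python
import math

def calibration_score(claims):
    """claims: list of (score, is_true) pairs for one calibration response.
    Returns the highest score among false claims, i.e. the smallest
    threshold that would remove every false claim from this response.
    (Hypothetical helper; the tuple layout is an assumption.)"""
    false_scores = [score for score, is_true in claims if not is_true]
    return max(false_scores, default=float("-inf"))

def conformal_threshold(calibration_sets, alpha):
    """Conservative (1 - alpha) quantile of the per-response scores."""
    scores = sorted(calibration_score(c) for c in calibration_sets)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration responses for this alpha: the only safe
        # choice is to remove every claim (a vacuous output).
        return float("inf")
    return scores[k - 1]

def filter_claims(claims, threshold):
    """claims: list of (text, score) pairs for a test response.
    Keeps only the claims scored strictly above the threshold."""
    return [text for text, score in claims if score > threshold]
```

Under exchangeability of calibration and test responses, this construction keeps only true claims with probability at least 1 - alpha. The `float("inf")` branch shows one way a strict factuality target can empty the output entirely, which is the kind of vacuous result the findings above describe.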
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
