Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
Yunqi Xu, Tianchi Cai, Jiyan Jiang, Xierui Song

TL;DR
This paper introduces Face4RAG, a comprehensive benchmark for evaluating factual consistency in retrieval-augmented generation across various LLMs, and proposes L-Face4RAG, a new method that significantly improves detection of logical fallacies and factual errors.
Contribution
The paper presents the first LLM-independent benchmark for Factual Consistency Evaluation (FCE) and a novel method, L-Face4RAG, that improves detection of factual inconsistencies in RAG systems.
Findings
Existing FCE methods fail to detect logical fallacies.
L-Face4RAG outperforms previous methods across multiple tasks.
The benchmark and method are publicly available.
Abstract
The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Despite the various FCE methods proposed earlier, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on other LLMs with different error distributions or even unseen error types, as these methods may fail to detect the error types generated by other LLMs. To fill this gap, in this paper, we propose the first comprehensive FCE benchmark Face4RAG for RAG independent of the underlying LLM. Our benchmark consists of a synthetic dataset built upon a carefully designed typology for factuality inconsistency error and a real-world dataset constructed from six commonly used LLMs, enabling evaluation…
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Residual Connection · WordPiece · Softmax · Byte Pair Encoding · Layer Normalization
